1. Let's Have a Chat: How Well Does an Artificial Intelligence Chatbot Answer Clinical Infectious Diseases Pharmacotherapy Questions?
- Author
- Kufel WD, Hanrahan KD, Seabury RW, Parsels KA, Gallagher JC, MacDougall C, Covington EW, Chahine EB, Britt RS, and Steele JM
- Abstract
Background: It is unknown whether ChatGPT provides quality responses to infectious diseases (ID) pharmacotherapy questions. This study surveyed ID pharmacist subject matter experts (SMEs) to assess the quality of ChatGPT version 3.5 (GPT-3.5) responses.
Methods: The primary outcome was the percentage of GPT-3.5 responses considered useful by SME rating. Secondary outcomes were SMEs' ratings of correctness, completeness, and safety. Rating definitions were based on literature review. One hundred ID pharmacotherapy questions were entered into GPT-3.5 without custom instructions or additional prompts, and responses were recorded. A 0-10 rating scale for correctness, completeness, and safety was developed and validated for interrater reliability. Continuous and categorical variables were assessed for interrater reliability via average measures intraclass correlation coefficient and Fleiss multirater kappa, respectively. SMEs' responses were compared by the Kruskal-Wallis test and chi-square test for continuous and categorical variables.
Results: SMEs considered 41.8% of responses useful. Median (IQR) ratings for correctness, completeness, and safety were 7 (4-9), 5 (3-8), and 8 (4-10), respectively. The Fleiss multirater kappa for usefulness was 0.379 (95% CI, .317-.441) indicating fair agreement, and intraclass correlation coefficients were 0.820 (95% CI, .758-.870), 0.745 (95% CI, .656-.816), and 0.833 (95% CI, .775-.880) for correctness, completeness, and safety, indicating at least substantial agreement. No significant difference was observed among SME responses for percentage of responses considered useful.
Conclusions: Fewer than 50% of GPT-3.5 responses were considered useful by SMEs. Responses were mostly considered correct and safe but were often incomplete, suggesting that GPT-3.5 responses may not replace an ID pharmacist's responses.
Competing Interests: Potential conflicts of interest. J. M. S. served on an advisory board for Paratek Pharmaceuticals. E. B. C. served on an advisory board and speakers bureau for Seqirus. W. D. K. received research grants from Melinta, Merck & Co, and Shionogi, Inc and served on an advisory board for Theratechnologies, Inc. All other authors report no potential conflicts.
(© The Author(s) 2024. Published by Oxford University Press on behalf of Infectious Diseases Society of America.)
- Published
- 2024