
AI chatbots show promise but limitations on UK medical exam questions: a comparative performance study.

Authors:
Sadeq, Mohammed Ahmed
Ghorab, Reem Mohamed Farouk
Ashry, Mohamed Hady
Abozaid, Ahmed Mohamed
Banihani, Haneen A.
Salem, Moustafa
Aisheh, Mohammed Tawfiq Abu
Abuzahra, Saad
Mourid, Marina Ramzy
Assker, Mohamad Monif
Ayyad, Mohammed
Moawad, Mostafa Hossam El Din
Source:
Scientific Reports; 8/15/2024, Vol. 14 Issue 1, p1-11, 11p
Publication Year:
2024

Abstract

Large language models (LLMs) such as ChatGPT have potential applications in medical education, for example helping students prepare for their licensing exams by discussing unclear questions with them. However, they require evaluation on these complex tasks. The purpose of this study was to evaluate how well publicly accessible LLMs performed on simulated UK medical board exam questions. 423 board-style questions from 9 UK exams (MRCS, MRCP, etc.) were answered by seven LLMs (ChatGPT-3.5, ChatGPT-4, Bard, Perplexity, Claude, Bing, Claude Instant). There were 406 multiple-choice, 13 true/false, and 4 "choose N" questions covering topics in surgery, pediatrics, and other disciplines. Responses were graded for accuracy, and statistical tests were used to analyze differences among the LLMs. Leaked questions were excluded from the primary analysis. ChatGPT-4 scored highest (78.2%), followed by Bing (67.2%), Claude (64.4%), and Claude Instant (62.9%); Perplexity scored lowest (56.1%). Scores differed significantly between LLMs overall (p < 0.001) and in pairwise comparisons. All LLMs scored higher on multiple-choice than on true/false or "choose N" questions. The LLMs showed limitations in answering certain questions, indicating that refinements are needed before they can be relied on as a primary resource in medical education. However, their expanding capabilities suggest a potential to improve training if thoughtfully implemented. Further research should explore specialty-specific LLMs and optimal integration into medical curricula. [ABSTRACT FROM AUTHOR]
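The abstract says only that statistical tests were used to compare the models. A common approach for comparing correct/incorrect counts across several models is a chi-square test of independence with Bonferroni-corrected pairwise follow-ups; the minimal Python sketch below assumes that approach (the record does not confirm which test the authors used) and uses counts back-calculated from the reported accuracies over 423 questions, so the tallies are illustrative rather than the study's raw data.

import numpy as np
from itertools import combinations
from scipy.stats import chi2_contingency

# Illustrative (correct, incorrect) tallies reconstructed from the
# reported accuracies on 423 questions; not the paper's raw data.
models = {
    "ChatGPT-4":      (331, 92),   # ~78.2%
    "Bing":           (284, 139),  # ~67.2%
    "Claude":         (272, 151),  # ~64.4%
    "Claude Instant": (266, 157),  # ~62.9%
    "Perplexity":     (237, 186),  # ~56.1%
}

table = np.array(list(models.values()))

# Omnibus test: do accuracy rates differ across the models at all?
chi2, p, dof, _ = chi2_contingency(table)
print(f"overall: chi2={chi2:.1f}, dof={dof}, p={p:.2e}")

# Pairwise 2x2 tests with a Bonferroni-corrected alpha.
names = list(models)
pairs = list(combinations(range(len(names)), 2))
alpha = 0.05 / len(pairs)
for i, j in pairs:
    _, p_ij, _, _ = chi2_contingency(table[[i, j]])
    verdict = "significant" if p_ij < alpha else "not significant"
    print(f"{names[i]} vs {names[j]}: p={p_ij:.3g} ({verdict})")

With these reconstructed counts the omnibus test comes out highly significant, consistent with the reported p < 0.001; individual pairwise results would depend on the study's exact counts.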

Details

Language:
English
ISSN:
2045-2322
Volume:
14
Issue:
1
Database:
Complementary Index
Journal:
Scientific Reports
Publication Type:
Academic Journal
Accession Number:
179040638
Full Text:
https://doi.org/10.1038/s41598-024-68996-2