Performance of large language models in oral and maxillofacial surgery examinations.
- Source :
- International Journal of Oral & Maxillofacial Surgery; Oct2024, Vol. 53 Issue 10, p881-886, 6p
- Publication Year :
- 2024
Abstract
- This study aimed to determine the accuracy of large language models (LLMs) in answering oral and maxillofacial surgery (OMS) multiple choice questions. A total of 259 questions from the university's question bank were answered by the LLMs (GPT-3.5, GPT-4, Llama 2, Gemini, and Copilot). The scores per category as well as the total score out of 259 were recorded and evaluated, with the passing score set at 50%. The mean overall score amongst all LLMs was 62.5%. GPT-4 performed the best (76.8%, 95% confidence interval (CI) 71.4–82.2%), followed by Copilot (72.6%, 95% CI 67.2–78.0%), GPT-3.5 (62.2%, 95% CI 56.4–68.0%), Gemini (58.7%, 95% CI 52.9–64.5%), and Llama 2 (42.5%, 95% CI 37.1–48.6%). There was a statistically significant difference between the scores of the five LLMs overall (χ² = 79.9, df = 4, P < 0.001) and within all categories except 'basic sciences' (P = 0.129), 'dentoalveolar and implant surgery' (P = 0.052), and 'oral medicine/pathology/radiology' (P = 0.801). The LLMs performed best in 'basic sciences' (68.9%) and poorest in 'pharmacology' (45.9%). The LLMs can be used as adjuncts in teaching, but should not be used for clinical decision-making until the models are further developed and validated. [ABSTRACT FROM AUTHOR]
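The overall χ² statistic quoted in the abstract can be approximately reproduced from the reported percentages alone. The sketch below is illustrative and is not the authors' analysis code: it assumes the number of correct answers per model can be recovered by rounding each reported percentage of 259, and it assumes a Pearson chi-square test of homogeneity on the resulting 5 × 2 (model × correct/incorrect) table. The function names are hypothetical.

```python
import math

TOTAL_QUESTIONS = 259

# Overall accuracy per model (percent), as reported in the abstract.
reported_pct = {
    "GPT-4": 76.8,
    "Copilot": 72.6,
    "GPT-3.5": 62.2,
    "Gemini": 58.7,
    "Llama 2": 42.5,
}

# Assumption: raw counts are recovered by rounding pct * 259;
# the abstract itself does not list the counts.
correct = {m: round(p / 100 * TOTAL_QUESTIONS) for m, p in reported_pct.items()}


def chi_square_homogeneity(correct_counts, n_per_group):
    """Pearson chi-square for k groups x 2 outcomes (correct / incorrect)."""
    k = len(correct_counts)
    pooled = sum(correct_counts) / (k * n_per_group)
    exp_correct = n_per_group * pooled
    exp_wrong = n_per_group * (1 - pooled)
    stat = sum(
        (c - exp_correct) ** 2 / exp_correct
        + ((n_per_group - c) - exp_wrong) ** 2 / exp_wrong
        for c in correct_counts
    )
    return stat, k - 1


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom.

    Closed form valid only for df = 4: exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2) * (1 + x / 2)


stat, df = chi_square_homogeneity(list(correct.values()), TOTAL_QUESTIONS)
p_value = chi2_sf_df4(stat)
print(f"chi2 = {stat:.1f}, df = {df}, P = {p_value:.2e}")
```

Under these assumptions the statistic rounds to the 79.9 with df = 4 given in the abstract, which suggests the reported test was of this form, although the abstract does not state the exact procedure. The per-category P values cannot be checked this way because the abstract does not give per-category counts.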
Details
- Language :
- English
- ISSN :
- 09015027
- Volume :
- 53
- Issue :
- 10
- Database :
- Supplemental Index
- Journal :
- International Journal of Oral & Maxillofacial Surgery
- Publication Type :
- Academic Journal
- Accession number :
- 179238972
- Full Text :
- https://doi.org/10.1016/j.ijom.2024.06.003