1. Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).
- Author
- Hayes DS, Foster BK, Makar G, Manzar S, Ozdag Y, Shultz M, Klena JC, and Grandizio LC
- Subjects
- Humans; Education, Medical, Graduate/methods; Clinical Competence; Artificial Intelligence; Orthopedics/education; Educational Measurement/methods; Internship and Residency/methods
- Abstract
Objective: Artificial intelligence (AI) is capable of answering complex medical examination questions, offering the potential to revolutionize medical education and healthcare delivery. In this study, we aimed to assess ChatGPT, a model that has demonstrated exceptional performance on standardized exams. Specifically, we evaluated ChatGPT's performance on the complete 2019 Orthopaedic In-Training Examination (OITE), including questions with an image component. We also explored differences in performance between text-only questions and questions with an associated image, including whether the image was described by AI or by a trained orthopaedist. Design and Setting: Questions from the 2019 OITE were input into ChatGPT version 4.0 (GPT-4) using 3 response variants. Because the capacity to input or interpret images was not publicly available in ChatGPT at the time of this study, each image was described in text and the description added to the OITE question; descriptions were generated by Microsoft Azure AI Vision Studio or by the study authors. Results: ChatGPT performed equally on OITE questions with or without imaging components, averaging 49% and 48% correct across all 3 input methods. Performance dropped by 6% when using image descriptions generated by AI. When using single-answer multiple-choice input methods, ChatGPT answered 49% of questions correctly, nearly double the rate of random guessing. ChatGPT performed worse than all resident classes on the 2019 exam, scoring 4% lower than PGY-1 residents. Discussion: ChatGPT performed below all resident classes on the 2019 OITE. Performance on text-only questions and questions with images was nearly equal when the image was described by a trained orthopaedic specialist but decreased when using an AI-generated description. Recognizing the performance abilities of AI software may provide insight into the current and future applications of this technology in medical education. (Copyright © 2024 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.)
- Published
- 2024