1. Step into the era of large multimodal models: a pilot study on ChatGPT-4V(ision)'s ability to interpret radiological images.
- Author
-
Zhu L, Mou W, Lai Y, Chen J, Lin S, Xu L, Lin J, Guo Z, Yang T, Lin A, Qi C, Gan L, Zhang J, and Luo P
- Subjects
- Humans, Pilot Projects, Computer Simulation, Image Interpretation, Computer-Assisted, Artificial Intelligence, Radiography
- Abstract
Background: The introduction of ChatGPT-4V's 'Chat with images' feature represents the beginning of the era of large multimodal models (LMMs), which allows ChatGPT to process and answer questions based on uploaded images. This advancement has the potential to transform how surgical teams utilize radiographic data, as radiological interpretation is crucial for surgical planning and postoperative care. However, a comprehensive evaluation of ChatGPT-4V's capabilities in interpret radiological images and formulating treatment plans remains to be explored., Patients and Methods: Three types of questions were collected: (1) 87 USMLE-style questions, submitting only the question stems and images without providing options to assess ChatGPT's diagnostic capability. For questions involving treatment plan formulations, a five-point Likert scale was used to assess ChatGPT's proposed treatment plan. The 87 questions were then adapted by removing detailed patient history to assess its contribution to diagnosis. The diagnostic performance of ChatGPT-4V was also tested when only medical history was provided. (2) We randomly selected 100 chest radiography from the ChestX-ray8 database to test the ability of ChatGPT-4V to identify abnormal chest radiography. (3) Cases from the 'Diagnose Please' section in the Radiology journal were collected to evaluate the performance of ChatGPT-4V in diagnosing complex cases. Three responses were collected for each question., Results: ChatGPT-4V achieved a diagnostic accuracy of 77.01% for USMLE-style questions. The average score of ChatGPT-4V's treatment plans was 3.97 (Interquartile Range: 3.33-4.67). Removing detailed patient history dropped the diagnostic accuracy to 19.54% (P<0.0001). ChatGPT-4V achieved an AUC of 0.768 (95% CI: 0.684-0.851) in detecting abnormalities in chest radiography, but could not specify the exact disease due to the lack of detailed patient history. For cases from 'Diagnose Please' ChatGPT provided diagnoses consistent with or very similar to the reference answers., Conclusion: ChatGPT-4V demonstrated an impressive ability to combine patient history with radiological images to make diagnoses and directly design treatment plans based on images, suggesting its potential for future application in clinical practice., (Copyright © 2024 The Author(s). Published by Wolters Kluwer Health, Inc.)
- Published
- 2024
- Full Text
- View/download PDF