1. Can ChatGPT4-vision identify radiologic progression of multiple sclerosis on brain MRI?
- Author
Kelly, Brendan S., Duignan, Sophie, Mathur, Prateek, Dillon, Henry, Lee, Edward H., Yeom, Kristen W., Keane, Pearse A., Lawlor, Aonghus, and Killeen, Ronan P.
- Subjects
LANGUAGE models, ELECTRONIC data processing, TRANSFORMER models, COMPUTER vision, ARTIFICIAL intelligence
- Abstract
Background: The large language model ChatGPT can now accept image input with the GPT4-vision (GPT4V) version. We aimed to compare the performance of GPT4V to pretrained U-Net and vision transformer (ViT) models for the identification of the progression of multiple sclerosis (MS) on magnetic resonance imaging (MRI). Methods: Paired coregistered MR images with and without progression were provided as input to ChatGPT4V in a zero-shot experiment to identify radiologic progression. Its performance was compared to pretrained U-Net and ViT models. Accuracy was the primary evaluation metric and 95% confidence intervals (CIs) were calculated by bootstrapping. We included 170 patients with MS (50 males, 120 females), aged 21–74 years (mean 42.3), imaged at a single institution from 2019 to 2021, each with 2–5 MRI studies (496 in total). Results: One hundred seventy patients were included, 110 for training, 30 for tuning, and 30 for testing; 100 unseen paired images were randomly selected from the test set for evaluation. Both U-Net and ViT had 94% (95% CI: 89–98%) accuracy while GPT4V had 85% (77–91%). GPT4V gave cautious nonanswers in six cases. GPT4V had precision (specificity), recall (sensitivity), and F1 score of 89% (75–93%), 92% (82–98%), and 91% (82–97%) compared to 100% (100–100%), 88% (78–96%), and 94% (88–98%) for U-Net and 94% (87–100%), 94% (88–100%), and 94% (89–98%) for ViT. Conclusion: The performance of GPT4V combined with its accessibility suggests that it has the potential to impact AI radiology research. However, misclassified cases and overly cautious nonanswers confirm that it is not yet ready for clinical use. Relevance statement: GPT4V can identify the radiologic progression of MS in a simplified experimental setting. However, GPT4V is not a medical device, and its widespread availability highlights the need for caution and education for lay users, especially those with limited access to expert healthcare.
Key Points: Without fine-tuning or the need for prior coding experience, GPT4V can perform a zero-shot radiologic change detection task with reasonable accuracy. However, in absolute terms, in a simplified "spot the difference" medical imaging task, GPT4V was inferior to state-of-the-art computer vision methods. GPT4V's performance metrics were more similar to the ViT than the U-Net. This is an exploratory experimental study and GPT4V is not intended for use as a medical device. [ABSTRACT FROM AUTHOR]
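The abstract reports accuracy with 95% CIs "calculated by bootstrapping" but does not describe the procedure. The following is a minimal sketch of one common variant, the percentile bootstrap, and is not necessarily the authors' method. The function name `bootstrap_accuracy_ci`, the resample count, the seed, and the labels in the demo are all hypothetical; the demo data are synthetic and are not the study's data.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap.

    Assumption: cases are resampled with replacement at the level of
    individual image pairs; the abstract does not specify this.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    rng = random.Random(seed)
    n = len(y_true)
    # 1 where the prediction matches the reference label, else 0.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    point = sum(correct) / n

    # Recompute accuracy on each resample drawn with replacement.
    stats = []
    for _ in range(n_resamples):
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()

    # Percentile interval: take the alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return point, stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Synthetic demo: 100 image pairs, 94 classified correctly (hypothetical labels).
    y_true = [1] * 50 + [0] * 50
    y_pred = [1] * 47 + [0] * 3 + [0] * 47 + [1] * 3
    acc, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
    print(f"accuracy={acc:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The same resampling loop can be reused for precision, recall, or F1 by recomputing that metric on each resample instead of accuracy.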
- Published
- 2025