
Diagnostic Performance of GPT-4o and Claude 3 Opus in Determining Causes of Death From Medical Histories and Postmortem CT Findings.

Authors :
Ishida M
Gonoi W
Nyunoya K
Abe H
Shirota G
Okimoto N
Fujimoto K
Kurokawa M
Nakai M
Saito K
Ushiku T
Abe O
Source :
Cureus [Cureus] 2024 Aug 20; Vol. 16 (8), pp. e67306. Date of Electronic Publication: 2024 Aug 20 (Print Publication: 2024).
Publication Year :
2024

Abstract

Introduction: This study evaluates the diagnostic performance of the latest large language models (LLMs), GPT-4o (OpenAI, San Francisco, CA, USA) and Claude 3 Opus (Anthropic, San Francisco, CA, USA), in determining causes of death from medical histories and postmortem CT findings.

Methods: We included 100 adult cases whose causes of death were diagnosable on postmortem CT, with autopsy results as the gold standard. Their medical histories and postmortem CT findings were compiled, and the clinical and imaging diagnoses of both the underlying and immediate causes of death, as well as personal information, were carefully removed from the data before it was shown to the LLMs. Both GPT-4o and Claude 3 Opus generated the top three differential diagnoses for each of the underlying and immediate causes of death based on the following three prompts: 1) medical history only; 2) postmortem CT findings only; and 3) both medical history and postmortem CT findings. The diagnostic performance of the LLMs was compared using McNemar's test.

Results: For the underlying cause of death, GPT-4o achieved primary diagnostic accuracy rates of 78%, 72%, and 78%, while Claude 3 Opus achieved 72%, 56%, and 75% for prompts 1, 2, and 3, respectively. Including any of the top three differential diagnoses, GPT-4o's accuracy rates were 92%, 90%, and 92%, while Claude 3 Opus's rates were 93%, 69%, and 93% for prompts 1, 2, and 3, respectively. For the immediate cause of death, GPT-4o's primary diagnostic accuracy rates were 55%, 58%, and 62%, while Claude 3 Opus's rates were 60%, 62%, and 63% for prompts 1, 2, and 3, respectively. For any of the top three differential diagnoses, GPT-4o's accuracy rates were 88% for prompt 1 and 91% for prompts 2 and 3, whereas Claude 3 Opus's rates were 92% for all three prompts. Significant differences between the models were observed for prompt 2 in diagnosing the underlying cause of death (p = 0.03 and <0.01 for the primary and top three differential diagnoses, respectively).

Conclusion: Both GPT-4o and Claude 3 Opus demonstrated relatively high performance in diagnosing both the underlying and immediate causes of death using medical histories and postmortem CT findings.

Competing Interests: Human subjects: Consent was obtained or waived by all participants in this study. Institutional Review Board, The University of Tokyo Hospital issued approval 2076. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant No. 23K07202, gained by Masanori Ishida, and No. 20K07989, gained by Wataru Gonoi. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.

(Copyright © 2024, Ishida et al.)
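Note on the statistical comparison: the abstract states that the two models were compared with McNemar's test, which looks only at the cases where the paired models disagree. The following is a minimal sketch of the exact (binomial) form of that test. It assumes the exact two-sided variant, which the abstract does not specify, and the function name `mcnemar_exact` and the counts in the usage example are hypothetical. The abstract reports only overall accuracy rates, so the study's actual discordant counts are not known.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases where model A was correct and model B was wrong.
    c: cases where model A was wrong and model B was correct.
    Concordant cases (both right or both wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        # No disagreements at all: no evidence of a difference.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant case is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only.
    print(mcnemar_exact(12, 3))
```

Because the test ignores concordant cases, two models with similar overall accuracy can still differ significantly if their disagreements are lopsided, and vice versa.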

Details

Language :
English
ISSN :
2168-8184
Volume :
16
Issue :
8
Database :
MEDLINE
Journal :
Cureus
Publication Type :
Academic Journal
Accession number :
39301343
Full Text :
https://doi.org/10.7759/cureus.67306