
Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings.

Authors :
Woo KC
Simon GW
Akindutire O
Aphinyanaphongs Y
Austrian JS
Kim JG
Genes N
Goldenring JA
Major VJ
Pariente CS
Pineda EG
Kang SK
Source :
Journal of the American Medical Informatics Association : JAMIA [J Am Med Inform Assoc] 2024 Sep 01; Vol. 31 (9), pp. 1983-1993.
Publication Year :
2024

Abstract

Objectives: To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients, and to assess the appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings.

Materials and Methods: Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with a high likelihood of requiring follow-up, sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was graded primarily on accuracy in identifying either DA or PA-CC findings, and secondarily on DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale.

Results: For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and an F-1 score of 84.5%. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and an F-1 score of 85.3%. No findings were "hallucinated" outright; however, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of true-positive AI-generated summaries required no or only minor revision.

Conclusion: GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.

(© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
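The F-1 scores reported in the abstract can be reproduced from the stated precision and recall values. A minimal sketch, assuming the standard definition of F-1 as the harmonic mean of precision and recall (the abstract does not spell out the formula):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F-1 score)."""
    return 2 * precision * recall / (precision + recall)

# Primary outcome (DA or PA-CC): 73.6% precision, 99.3% recall
primary = f1(0.736, 0.993)

# Secondary outcome (DA only): 77.3% precision, 95.2% recall
secondary = f1(0.773, 0.952)

# Both agree with the reported values of 84.5% and 85.3%.
print(round(primary, 3), round(secondary, 3))  # → 0.845 0.853
```

The high recall with lower precision reflects a screening-oriented trade-off: the prompt was tuned so that actionable findings are almost never missed, at the cost of some false positives that a human reviewer must filter out.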

Details

Language :
English
ISSN :
1527-974X
Volume :
31
Issue :
9
Database :
MEDLINE
Journal :
Journal of the American Medical Informatics Association : JAMIA
Publication Type :
Academic Journal
Accession number :
38778578
Full Text :
https://doi.org/10.1093/jamia/ocae117