1. Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing
- Author
Sara M. Dupont, Jared Narvid, Adi Price, Jason F. Talbott, Andrew L. Callen, Bao H. Do, David McCoy, Marc D. Kohli, and Ben Laguna
- Subjects
Research Report, medicine.medical_specialty, Computer science, media_common.quotation_subject, Patient characteristics, computer.software_genre, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, 0302 clinical medicine, medicine, Humans, Radiology, Nuclear Medicine and imaging, Single institution, Set (psychology), Natural Language Processing, media_common, Original Paper, Modalities, Radiological and Ultrasound Technology, business.industry, Uncertainty, Gold standard (test), Ambiguity, Computer Science Applications, Test (assessment), Radiology report, Radiology Information Systems, Artificial intelligence, Radiology, business, computer, 030217 neurology & neurosurgery, Natural language processing
- Abstract
The ideal radiology report reduces diagnostic uncertainty, while avoiding ambiguity whenever possible. The purpose of this study was to characterize the use of uncertainty terms in radiology reports at a single institution and compare the use of these terms across imaging modalities, anatomic sections, patient characteristics, and radiologist characteristics. We hypothesized that there would be variability among radiologists and between subspecialities within radiology regarding the use of uncertainty terms and that the length of the impression of a report would be a predictor of use of uncertainty terms. Finally, we hypothesized that use of uncertainty terms would often be interpreted by human readers as “hedging.” To test these hypotheses, we applied a natural language processing (NLP) algorithm to assess and count the number of uncertainty terms within radiology reports. An algorithm was created to detect usage of a published set of uncertainty terms. All 642,569 radiology report impressions from 171 reporting radiologists were collected from 2011 through 2015. For validation, two radiologists without knowledge of the software algorithm reviewed report impressions and were asked to determine whether the report was “uncertain” or “hedging.” The relationship between the presence of 1 or more uncertainty terms and the human readers’ assessment was compared. There were significant differences in the proportion of reports containing uncertainty terms across patient admission status and across anatomic imaging subsections. Reports with uncertainty were significantly longer than those without, although report length was not significantly different between subspecialities or modalities. There were no significant differences in rates of uncertainty when comparing the experience of the attending radiologist. When compared with reader 1 as a gold standard, accuracy was 0.91, sensitivity was 0.92, specificity was 0.9, and precision was 0.88, with an F1-score of 0.9. When compared with reader 2, accuracy was 0.84, sensitivity was 0.88, specificity was 0.82, and precision was 0.68, with an F1-score of 0.77. Substantial variability exists among radiologists and subspecialities regarding the use of uncertainty terms, and this variability cannot be explained by years of radiologist experience or differences in proportions of specific modalities. Furthermore, detection of uncertainty terms demonstrates good test characteristics for predicting human readers’ assessment of uncertainty.
- Published
2020