Natural Language–based Machine Learning Models for the Annotation of Clinical Radiology Reports
- Source :
- Radiology. 287:570-580
- Publication Year :
- 2018
- Publisher :
- Radiological Society of North America (RSNA), 2018.
Abstract
- Purpose: To compare different methods for generating features from radiology reports and to develop a method to automatically identify findings in these reports.
- Materials and Methods: In this study, 96 303 head computed tomography (CT) reports were obtained. The linguistic complexity of these reports was compared with that of alternative corpora. Head CT reports were preprocessed, and machine-analyzable features were constructed by using bag-of-words (BOW), word embedding, and latent Dirichlet allocation (LDA)-based approaches. Ultimately, 1004 head CT reports were manually labeled for findings of interest by physicians, and a subset of these findings was deemed critical. Lasso logistic regression was used to train models for the physician-assigned labels on 602 of the 1004 head CT reports (60%) using the constructed features, and the performance of these models was validated on the held-out 402 of 1004 reports (40%). Models were scored by area under the receiver operating characteristic curve (AUC), and aggregate AUC statistics were reported for (a) all labels, (b) critical labels, and (c) the presence of any critical finding in a report. Sensitivity, specificity, accuracy, and F1 score were reported for the best-performing model's (a) predictions of all labels and (b) identification of reports containing critical findings.
- Results: The best-performing model (BOW with unigrams, bigrams, and trigrams plus an averaged word-embedding vector) had a held-out AUC of 0.966 for identifying the presence of any critical head CT finding and an average AUC of 0.957 across all head CT findings. Sensitivity and specificity for identifying the presence of any critical finding were 92.59% (175 of 189) and 89.67% (191 of 213), respectively. Average sensitivity and specificity across all findings were 90.25% (1898 of 2103) and 91.72% (18 351 of 20 007), respectively. Simpler BOW methods achieved results competitive with those of more sophisticated approaches, with an AUC for the presence of any critical finding of 0.951 for unigram BOW versus 0.966 for the best-performing model. The Yule's I of the head CT corpus was 34, markedly lower than that of the Reuters corpus (103) or of i2b2 discharge summaries (271), indicating lower linguistic complexity.
- Conclusion: Automated methods can be used to identify findings in radiology reports. The success of this approach benefits from the standardized language of these reports. With this method, a large labeled corpus can be generated for applications such as deep learning.
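As a concrete illustration of the modeling approach the abstract describes, the sketch below builds n-gram bag-of-words features, concatenates an averaged word-embedding vector, fits an L1-penalized (lasso) logistic regression on a 60/40 split, and scores the held-out set by AUC. This is a minimal scikit-learn sketch, not the authors' code: the toy reports and labels are invented, and random vectors stand in for embeddings that would in practice be trained on the report corpus.

```python
# Minimal sketch (not the authors' released code) of the abstract's
# best-performing pipeline: unigram/bigram/trigram BOW features plus an
# averaged word-embedding vector, lasso logistic regression, AUC scoring.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Invented toy reports and labels for a single critical finding.
reports = [
    "no acute intracranial abnormality",
    "acute subdural hematoma with midline shift",
    "chronic microvascular ischemic change no acute hemorrhage",
    "acute infarct in the left mca territory",
]
labels = np.array([0, 1, 0, 1])

# Bag-of-words features over unigrams, bigrams, and trigrams.
bow = CountVectorizer(ngram_range=(1, 3))
x_bow = bow.fit_transform(reports).toarray()

# Averaged word-embedding features. Random vectors stand in here for
# corpus-trained embeddings (an assumption about the embedding source).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for r in reports for w in r.split()}
x_emb = np.array([np.mean([emb[w] for w in r.split()], axis=0) for r in reports])

# Concatenate both feature groups, as in the best-performing model.
x = np.hstack([x_bow, x_emb])

# 60/40 split mirroring the abstract's 602/402 report division.
x_tr, x_te, y_tr, y_te = train_test_split(
    x, labels, test_size=0.4, stratify=labels, random_state=0
)

# Lasso logistic regression: the L1 penalty drives uninformative
# n-gram and embedding weights to zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(x_tr, y_tr)

# Score held-out predictions by AUC, the abstract's primary metric.
print(roc_auc_score(y_te, model.predict_proba(x_te)[:, 1]))
```

In the study itself, one such model was trained per finding label; the sketch shows a single label for brevity.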
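The linguistic-complexity comparison in the Results rests on Yule's I, a standard lexical-diversity statistic: I = M1² / (M2 − M1), where M1 is the total token count and M2 is the sum of squared word frequencies. Lower I means a more repetitive vocabulary, which is the abstract's point about head CT reports. The sketch below uses invented example text, not the paper's corpus.

```python
# Minimal sketch of Yule's I: I = M1^2 / (M2 - M1), where M1 is the total
# number of tokens and M2 is the sum of squared word frequencies. A lower I
# indicates a more repetitive, less lexically diverse corpus.
from collections import Counter

def yules_i(tokens):
    freqs = Counter(tokens)
    m1 = sum(freqs.values())                 # total tokens
    m2 = sum(f * f for f in freqs.values())  # sum of squared frequencies
    if m2 == m1:                             # every token unique
        return float("inf")
    return (m1 * m1) / (m2 - m1)

# Invented example: templated report language scores lower than varied prose.
print(yules_i("no acute hemorrhage no acute infarct no acute fracture".split()))
print(yules_i("the quick brown fox jumps over the lazy dog near a stream".split()))
```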
- Subjects :
- Text corpus
Databases, Factual
Machine learning
Sensitivity and Specificity
Annotation
Electronic Health Records
Humans
Radiology, Nuclear Medicine and Imaging
Natural Language Processing
Deep learning
Area Under Curve
Artificial intelligence
Radiology
Tomography, X-Ray Computed
Natural language
Details
- ISSN :
- 1527-1315 and 0033-8419
- Volume :
- 287
- Database :
- OpenAIRE
- Journal :
- Radiology
- Accession number :
- edsair.doi.dedup.....17da14a104282bc3247f70aee2a19a75
- Full Text :
- https://doi.org/10.1148/radiol.2018171093