1. Patterns of Metastatic Disease in Patients with Cancer Derived from Natural Language Processing of Structured CT Radiology Reports over a 10-year Period
- Author
Michio Taya, Nikolaus Schultz, Amber L. Simpson, Farhana Zulkernine, Lior Gazit, Kevin Nicholas, Karen Batch, Krishna Juluru, Huy Nguyen, Prachi Rahurkar, Natalie Gangai, Richard K. G. Do, Christopher J. Fong, Kaelan Lupton, Pamela Ines Causa Andrieu, Varadan Sevilimedu, Anisha Luthra, and Hedvig Hricak
- Subjects
Male; medicine.medical_specialty; Databases, Factual; MEDLINE; Disease; computer.software_genre; Neoplasms; Electronic Health Records; Humans; Medicine; Radiology, Nuclear Medicine and imaging; In patient; Longitudinal Studies; Neoplasm Metastasis; Original Research; Data Management; Natural Language Processing; Retrospective Studies; business.industry; Reproducibility of Results; food and beverages; Cancer; Middle Aged; medicine.disease; Feasibility Studies; Female; Radiology; Artificial intelligence; Tomography, X-Ray Computed; business; computer; Period (music); Natural language processing
- Abstract
BACKGROUND: Patterns of metastasis in cancer are increasingly relevant to prognostication and treatment planning but have historically been documented by means of autopsy series.
PURPOSE: To show the feasibility of using natural language processing (NLP) to gather accurate data from radiology reports for assessing spatial and temporal patterns of metastatic spread in a large patient cohort.
MATERIALS AND METHODS: In this retrospective longitudinal study, consecutive patients who underwent CT from July 2009 to April 2019 and whose CT reports followed a departmental structured template were included. Three radiologists manually curated a sample of 2219 reports for the presence or absence of metastases across 13 organs; these manually curated reports were used to develop three NLP models with an 80%-20% split for training and test sets. A separate random sample of 448 manually curated reports was used for validation. Model performance was measured by accuracy, precision, and recall for each organ. The best-performing NLP model was used to generate a final database of metastatic disease across all patients. For each cancer type, descriptive statistics were reported by analyzing the frequencies of metastatic disease at the report and patient levels.
RESULTS: In 91 665 patients (mean age ± standard deviation, 61 years ± 15; 46 939 women), 387 359 reports were labeled. The best-performing NLP model achieved accuracies from 90% to 99% across all organs. Metastases were most frequently reported in abdominopelvic (23.6% of all reports) and thoracic (17.6%) nodes, followed by lungs (14.7%), liver (13.7%), and bones (9.9%). Metastatic disease tropism is distinct among common cancers, with the most common first site being bones in prostate and breast cancers and liver among pancreatic and colorectal cancers.
CONCLUSION: Natural language processing may be applied to cancer patients’ CT reports to generate a large database of metastatic phenotypes. Such a database could be combined with genomic studies and used to explore prognostic imaging phenotypes with relevance to treatment planning.
© RSNA, 2021. Online supplemental material is available for this article.
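The abstract reports per-organ accuracy, precision, and recall without giving the computation. As a minimal illustrative sketch (not the authors' code), the following shows how these three metrics could be computed for each organ from binary labels, where 1 means metastasis present. The function name `organ_metrics` and the toy labels are hypothetical and are not data from the study.

```python
from typing import Dict, List


def organ_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for one organ's binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels for two organs (manual curation vs. model output).
truth = {"liver": [1, 0, 1, 0, 0], "bones": [0, 0, 1, 1, 0]}
preds = {"liver": [1, 0, 0, 0, 1], "bones": [0, 0, 1, 1, 0]}

# One metrics dictionary per organ, mirroring the per-organ reporting.
results = {organ: organ_metrics(truth[organ], preds[organ]) for organ in truth}
for organ, metrics in results.items():
    print(organ, metrics)
```

Scoring each organ separately matters because metastases are rare in some organs, so a high accuracy can coexist with low recall; reporting all three metrics per organ exposes that.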
- Published
2021