1. An accessible, efficient, and accurate natural language processing method for extracting diagnostic data from pathology reports
- Authors
Hansen Lam, Freddy Nguyen, Xintong Wang, Aryeh Stock, Volha Lenskaya, Maryam Kooshesh, Peizi Li, Mohammad Qazi, Shenyu Wang, Mitra Dehghan, Xia Qian, Qiusheng Si, and Alexandros D. Polydorides
- Subjects
Unstructured, Free-text, Narrative, Extraction, XML, Carcinoma, Computer applications to medicine. Medical informatics, R858-859.7, Pathology, RB1-214
- Abstract
Context: Analysis of diagnostic information in pathology reports for the purposes of clinical or translational research and quality assessment/control often requires manual data extraction, which can be laborious, time-consuming, and subject to mistakes. Objective: We sought to develop, employ, and evaluate a simple, dictionary- and rule-based natural language processing (NLP) algorithm for generating searchable information on various types of parameters from diverse surgical pathology reports. Design: Data were exported from the pathology laboratory information system (LIS) into extensible markup language (XML) documents, which were parsed by NLP-based Python code into desired data points and delivered to Excel spreadsheets. Accuracy and efficiency were compared to a manual data extraction method with concordance measured by Cohen’s κ coefficient and corresponding P values. Results: The automated method was highly concordant (90%–100%, P
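The abstract describes the pipeline only at a high level: LIS export to XML, dictionary- and rule-based parsing in Python, spreadsheet output, and agreement with manual extraction measured by Cohen's κ. The following is a minimal sketch of that kind of pipeline, not the authors' code. Everything concrete in it is a hypothetical assumption: the XML layout (`<report id>` with a `<diagnosis>` child), the single-term dictionary, the negation rule, the grade pattern, and the manual labels. CSV output stands in for Excel.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical LIS export: one <report> per case with a <diagnosis> child.
# The real XML schema used in the study is not described in the abstract.
SAMPLE_XML = """<reports>
  <report id="A1"><diagnosis>Invasive ductal carcinoma, grade 2.</diagnosis></report>
  <report id="A2"><diagnosis>Negative for carcinoma.</diagnosis></report>
  <report id="A3"><diagnosis>Adenocarcinoma, grade 3.</diagnosis></report>
  <report id="A4"><diagnosis>No evidence of carcinoma. Benign tissue.</diagnosis></report>
</reports>"""

# Illustrative rules (not the authors' dictionary): a negation phrase directly
# before a "...carcinoma" term cancels that mention; grade is "grade 1-3".
NEGATED = re.compile(r"\b(?:negative for|no evidence of)\s+\w*carcinoma", re.IGNORECASE)
POSITIVE = re.compile(r"carcinoma", re.IGNORECASE)
GRADE = re.compile(r"\bgrade\s+([1-3])\b", re.IGNORECASE)


def extract(text):
    """Turn one free-text diagnosis into structured data points."""
    cleaned = NEGATED.sub("", text)  # drop negated mentions before searching
    grade = GRADE.search(text)
    return {
        "carcinoma": "yes" if POSITIVE.search(cleaned) else "no",
        "grade": grade.group(1) if grade else "NA",
    }


def parse_reports(xml_text):
    """Parse the XML export and apply the rules to every report."""
    root = ET.fromstring(xml_text)
    return [
        {"id": report.get("id"), **extract(report.findtext("diagnosis", default=""))}
        for report in root.iter("report")
    ]


def to_csv(rows):
    """Serialize rows as CSV text (a stand-in for the Excel output)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "carcinoma", "grade"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def cohen_kappa(a, b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two raters' labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


rows = parse_reports(SAMPLE_XML)
automated = [r["carcinoma"] for r in rows]  # ["yes", "no", "yes", "no"]
manual = ["yes", "no", "yes", "yes"]  # invented reference labels
print(to_csv(rows))
print(cohen_kappa(automated, manual))  # → 0.5
```

Here the single disagreement (A4) gives observed agreement 0.75 against chance agreement 0.5, so κ = 0.5. The sketch omits the P values the study reports. A real implementation would also need to handle report sections, multiple specimens per case, and broader negation scope. `sklearn.metrics.cohen_kappa_score` computes the same κ statistic if scikit-learn is available.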
- Published
2022