1. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system
- Author
Ross Lazarus, Sergey Goryachev, Shawn N. Murphy, Qing Treitler Zeng, Scott T. Weiss, and Margarita Sordo
- Subjects
Research evaluation; Medical Records Systems, Computerized; Health Informatics; Comorbidity; computer.software_genre; lcsh:Computer applications to medicine. Medical informatics; Health informatics; Sensitivity and Specificity; 03 medical and health sciences; Pulmonary Disease, Chronic Obstructive; 0302 clinical medicine; Text mining; International Classification of Diseases; medicine; Humans; 030212 general & internal medicine; 030304 developmental biology; Asthma; Natural Language Processing; 0303 health sciences; business.industry; Medical record; Health Policy; Smoking; medicine.disease; Patient Discharge; 3. Good health; Computer Science Applications; lcsh:R858-859.7; Co morbidity; Smoking status; Data mining; Medical emergency; Principal diagnosis; business; computer; Research Article
- Abstract
Background: The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease.
Methods: The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard.
Results: The accuracy of HITEx was 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction, when cases labeled "Insufficient Data" by the gold standard were excluded.
Conclusion: We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.
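The evaluation described in the abstract (comparing extracted labels with an expert gold standard and leaving out cases the gold standard marks "Insufficient Data") can be illustrated with a minimal sketch. This is not HITEx code and does not reproduce the study's data: the function name `accuracy_excluding_insufficient` and the example labels are hypothetical, chosen only to show how such an accuracy figure might be computed.

```python
from typing import Sequence

# Label the gold standard uses for cases that cannot be judged (from the abstract).
INSUFFICIENT = "Insufficient Data"


def accuracy_excluding_insufficient(
    gold: Sequence[str], predicted: Sequence[str]
) -> float:
    """Fraction of cases where the extracted label matches the gold label,
    skipping cases the gold standard marks as "Insufficient Data".

    Hypothetical helper for illustration; not part of HITEx.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Keep only the cases the experts were able to label.
    scored = [(g, p) for g, p in zip(gold, predicted) if g != INSUFFICIENT]
    if not scored:
        raise ValueError("no scorable cases after exclusion")

    correct = sum(1 for g, p in scored if g == p)
    return correct / len(scored)


# Made-up smoking-status labels, for illustration only.
gold = ["Current Smoker", "Never Smoked", "Insufficient Data", "Past Smoker"]
predicted = ["Current Smoker", "Past Smoker", "Never Smoked", "Past Smoker"]

# Three cases are scored (one is excluded); two of them match.
print(accuracy_excluding_insufficient(gold, predicted))
```

Excluding the "Insufficient Data" cases shrinks the denominator, so a figure computed this way applies only to the summaries the experts could label.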