Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

Authors :
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Szolovits, Peter
Liao, Katherine P.
Ananthakrishnan, Ashwin N.
Kumar, Vishesh
Xia, Zongqi
Cagan, Andrew
Gainer, Vivian S.
Goryachev, Sergey
Chen, Pei
Savova, Guergana K.
Agniel, Denis
Churchill, Susanne
Lee, Jaeyoung
Murphy, Shawn N.
Plenge, Robert M.
Kohane, Isaac
Shaw, Stanley Y.
Karlson, Elizabeth W.
Cai, Tianxi
Source :
Public Library of Science
Publication Year :
2015

Abstract

Background: Algorithms that classify phenotypes using electronic medical record (EMR) data have typically been developed to perform well in a specific patient population. There is increasing interest in analyses that study a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations and (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study.

Methods and Results: We studied 3 established EMR-based patient cohorts from two large academic centers: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453). We developed a CAD algorithm using NLP in addition to structured data (e.g., ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training set (RA) and the validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) subjects with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors.

Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP to the CAD algorithm improved its sensitivity, particularly in cohorts where the prevalence of C

National Institutes of Health (U.S.). Informatics for Integrating Biology and the Bedside Project (U54LM008748)

Details

Database :
OAIster
Journal :
Public Library of Science
Notes :
application/pdf, en_US
Publication Type :
Electronic Resource
Accession number :
edsoai.on1286404673
Document Type :
Electronic Resource