Using Random Forest Models to Identify Correlates of a Diabetic Peripheral Neuropathy Diagnosis from Electronic Health Record Data.

Authors :
DuBrava, Sarah
Mardekian, Jack
Sadosky, Alesia
Bienen, E. Jay
Parsons, Bruce
Hopps, Markay
Markman, John
Source :
Pain Medicine; Jan2017, Vol. 18 Issue 1, p107-115, 9p, 2 Charts, 4 Graphs
Publication Year :
2017

Abstract

Objective. To identify variables correlated with a diagnosis of diabetic peripheral neuropathy (DPN) using random forest modeling applied to electronic health records. Design. Retrospective analysis. Setting. Humedica de-identified electronic health records database. Subjects. Subjects ≥18 years old with type 2 diabetes from January 1, 2008, to September 30, 2013, having continuous data for 1 year pre- and post-index, with DPN (n = 35,050) and without DPN (n = 288,328), were identified. Methods. Demographic, clinical, and health care resource utilization variables (e.g., inpatient and outpatient encounters, medications, and procedures) were input into a random forest model to identify the most important correlates of a DPN diagnosis. Random forest modeling is a computationally extensive, robust data mining technique that accommodates large sets of variables to identify associated factors using an ensemble of classification trees. Accuracy of the model was evaluated using receiver operating characteristic (ROC) curves. Results. The final random forest model consisted of the following variables (importance) associated with a DPN diagnosis: Charlson Comorbidity Index score (100%), age (37.1%), number of pre-index procedures and services (29.7%), number of pre-index outpatient prescriptions (24.2%), number of pre-index outpatient visits (18.3%), number of pre-index laboratory visits (16.9%), number of pre-index outpatient office visits (12.1%), number of inpatient prescriptions (5.9%), and number of pain-related medication prescriptions (4.4%). ROC analysis confirmed model performance, with an area under the curve of 0.824 and accuracy of 89.6% (95% confidence interval 89.4%, 89.8%). Conclusions. Random forest modeling can determine likelihood of a DPN diagnosis. Further validation of the random forest model may help facilitate earlier diagnosis and enhance management strategies. [ABSTRACT FROM AUTHOR]
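
The abstract reports two kinds of summary figures: variable importance as a percentage and an area under the ROC curve. The sketch below shows one way such figures can be computed. It is an illustration under stated assumptions, not the authors' method. The abstract does not say how importance was scaled, so the sketch assumes each score is expressed relative to the top-ranked variable, which would place that variable at 100%. The function names, variable names, and toy numbers are hypothetical and are not taken from the study.

```python
def scale_importance(raw):
    """Rescale raw importance scores so the largest equals 100.

    Assumption: importance is reported relative to the top variable.
    """
    top = max(raw.values())
    return {name: 100.0 * score / top for name, score in raw.items()}


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical raw importance scores (not the study's values).
raw = {"comorbidity_index": 8.0, "age": 2.0, "outpatient_visits": 1.0}
scaled = scale_importance(raw)

# Hypothetical labels (1 = DPN, 0 = no DPN) and predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)

for name, value in scaled.items():
    print(f"{name}: {value:.1f}%")
print(f"AUC: {auc:.3f}")
```

The pairwise AUC calculation is quadratic in the number of cases, so it suits small examples only; for a cohort of the size described in the abstract, a rank-based implementation from a statistics library would be the practical choice.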

Details

Language :
English
ISSN :
15262375
Volume :
18
Issue :
1
Database :
Complementary Index
Journal :
Pain Medicine
Publication Type :
Academic Journal
Accession number :
121261599
Full Text :
https://doi.org/10.1093/pm/pnw096