Contribution of Data Categories to Readmission Prediction Accuracy

Authors :: Ge, Wendong
Kim, Hee Yeun
Desai, Sonali
Perlovsky, Leonid
Turchin, Alexander
Publication Year :: 2018
Abstract: Identification of patients at high risk for readmission could help reduce morbidity and mortality as well as healthcare costs. Most of the existing studies on readmission prediction did not compare the contribution of data categories. In this study we analyzed relative contribution of 90,101 variables across 398,884 admission records corresponding to 163,468 patients, including patient demographics, historical hospitalization information, discharge disposition, diagnoses, procedures, medications and laboratory test results. We established an interpretable readmission prediction model based on Logistic Regression in scikit-learn, and added the available variables to the model one by one in order to analyze the influences of individual data categories on readmission prediction accuracy. Diagnosis related groups (c-statistic increment of 0.0933) and discharge disposition (c-statistic increment of 0.0269) were the strongest contributors to model accuracy. Additionally, we also identified the top ten contributing variables in every data category.