1. Concurrent Imputation and Prediction on EHR data using Bi-Directional GANs: Bi-GANs for EHR imputation and prediction
- Author
-
Mehak Gupta, Thao-Ly T. Phan, Rahmatollah Beheshti, and H. Timothy Bunnell
- Subjects
Task (computing) ,Discriminator ,Recurrent neural network ,Computer science ,Window (computing) ,Imputation (statistics) ,Data mining ,computer.software_genre ,Missing data ,computer ,Field (computer science) ,Article ,Generator (mathematics) - Abstract
Working with electronic health records (EHRs) is known to be challenging due to several reasons. These reasons include not having: 1) similar lengths (per visit), 2) the same number of observations (per patient), and 3) complete entries in the available records. These issues hinder the performance of the predictive models created using EHRs. In this paper, we approach these issues by presenting a model for the combined task of imputing and predicting values for the irregularly observed and varying length EHR data with missing entries. Our proposed model (dubbed as Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. In this architecture, the generator is a bidirectional recurrent network that receives the EHR data and imputes the existing missing values. The discriminator attempts to discriminate between the actual and the imputed values generated by the generator. Using the input data in its entirety, Bi-GAN learns how to impute missing elements in-between (imputation) or outside of the input time steps (prediction). Our method has three advantages to the state-of-the-art methods in the field: (a) one single model performs both the imputation and prediction tasks; (b) the model can perform predictions using time-series of varying length with missing data; (c) it does not require to know the observation and prediction time window during training and can be used for the predictions with different observation and prediction window lengths, for short- and long-term predictions. We evaluate our model on two large EHR datasets to impute and predict body mass index (BMI) values and show its superior performance in both settings.
- Published
- 2021