Accuracy of random-forest-based imputation of missing data in the presence of non-normality, non-linearity, and interaction
- Source :
- BMC Medical Research Methodology, Vol 20, Iss 1, Pp 1-12 (2020)
- Publication Year :
- 2020
- Publisher :
- BMC, 2020.
Abstract
- Background: Missing data are common in statistical analyses, and imputation methods based on random forests (RF) are becoming popular for handling missing data, especially in biomedical research. Unlike standard imputation approaches, RF-based imputation methods do not assume normality or require the specification of parametric models. However, it remains unclear how they perform when data are non-normally distributed or when there are non-linear relationships or interactions.
- Methods: To examine the effects of these three factors, a variety of datasets were simulated with outcome-dependent missing at random (MAR) covariates, and the performance of the RF-based imputation methods missForest and CALIBERrfimpute was compared with that of predictive mean matching (PMM).
- Results: Both missForest and CALIBERrfimpute have high predictive accuracy, but missForest can produce severely biased regression coefficient estimates and confidence interval coverage below the nominal level, especially for highly skewed variables in nonlinear models. CALIBERrfimpute typically outperforms missForest when estimating regression coefficients, although its biases are still substantial and can be worse than those of PMM for logistic regression relationships with interactions.
- Conclusions: RF-based imputation, in particular missForest, should not be indiscriminately recommended as a panacea for imputing missing data, especially when data are highly skewed and/or the missingness is outcome-dependent MAR. A correct analysis requires a careful critique of the missing data mechanism and of the inter-relationships between the variables in the data.
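The Methods summary refers to covariates that are missing at random in an outcome-dependent way: whether a covariate is observed depends on the fully observed outcome, not on the covariate's own unseen value. The Python sketch below illustrates only that mechanism, under assumptions that are not taken from the paper: a right-skewed lognormal covariate x, a linear outcome y = 1 + 0.5x + noise, and a logistic missingness probability in y with hypothetical coefficients. It does not implement missForest, CALIBERrfimpute, or PMM, and all function names are hypothetical.

```python
import math
import random


def simulate_outcome_dependent_mar(n=10_000, seed=1):
    """Hypothetical sketch: x is made missing with a probability that
    depends only on the fully observed outcome y (outcome-dependent MAR)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.lognormvariate(0.0, 1.0)            # right-skewed covariate (assumed)
        y = 1.0 + 0.5 * x + rng.gauss(0.0, 1.0)     # linear outcome model (assumed)
        p_miss = 1.0 / (1.0 + math.exp(-(y - 2.0)))  # logistic in y only (assumed coefficients)
        x_obs = None if rng.random() < p_miss else x
        rows.append((x, x_obs, y))                   # (true x, observed x or None, y)
    return rows


def summarize(rows):
    """Mean of x in the full data vs. among complete cases, plus overall missing rate."""
    full_mean = sum(x for x, _, _ in rows) / len(rows)
    observed = [x_obs for _, x_obs, _ in rows if x_obs is not None]
    cc_mean = sum(observed) / len(observed)
    miss_rate = 1.0 - len(observed) / len(rows)
    return full_mean, cc_mean, miss_rate


def missing_rate_by_y_half(rows):
    """Share of missing x in the lower and upper halves of the outcome y."""
    ordered = sorted(rows, key=lambda r: r[2])
    half = len(ordered) // 2

    def rate(part):
        return sum(r[1] is None for r in part) / len(part)

    return rate(ordered[:half]), rate(ordered[half:])


rows = simulate_outcome_dependent_mar()
full_mean, cc_mean, miss_rate = summarize(rows)
low, high = missing_rate_by_y_half(rows)
print(f"missing rate {miss_rate:.2f}; mean x: full {full_mean:.2f}, complete cases {cc_mean:.2f}")
print(f"missing rate by y half: lower {low:.2f}, upper {high:.2f}")
```

Because large x values tend to produce large y values, and large y values make x more likely to be missing, the complete-case mean of x falls below the full-data mean and the missing rate is higher in the upper half of y. An imputation method applied to such data has to use y to recover the distribution of x rather than treating the observed x values as representative.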
- Subjects :
- Missing data imputation; Imputation accuracy; Random forest; Medicine (General); R5-920
Details
- Language :
- English
- ISSN :
- 1471-2288
- Volume :
- 20
- Issue :
- 1
- Database :
- Directory of Open Access Journals
- Journal :
- BMC Medical Research Methodology
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.0fbcb76e9c224a688c0ec27ff50fe295
- Document Type :
- article
- Full Text :
- https://doi.org/10.1186/s12874-020-01080-1