Two-stage imputation method to handle missing data for categorical response variable.
- Source :
- Communications for Statistical Applications & Methods; Nov2023, Vol. 30 Issue 6, p577-587, 11p
- Publication Year :
- 2023
- Abstract :
- Conventional imputation techniques for categorical data, such as mode imputation, often suffer from overestimation. When a variable has too many categories, imputation by multinomial logistic regression may be computationally infeasible. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify the variables that are important for the target categorical variable. In the second stage, we use only these important variables as predictors: logistic regression imputes missing data in binary variables, polytomous regression imputes missing data in categorical variables, and predictive mean matching imputes missing data in quantitative variables. Through analysis of simulated and real data that are both asymmetric and non-normal, we demonstrate that the two-stage imputation method outperforms imputation methods without variable selection in terms of accuracy measures. In an analysis of real survey data, we also demonstrate that the proposed two-stage method is more accurate than the current imputation approach. [ABSTRACT FROM AUTHOR]
- Subjects :
- MISSING data (Statistics)
- LOGISTIC regression analysis
- Language :
- English
- ISSN :
- 22877843
- Volume :
- 30
- Issue :
- 6
- Database :
- Complementary Index
- Journal :
- Communications for Statistical Applications & Methods
- Publication Type :
- Academic Journal
- Accession number :
- 173902642
- Full Text :
- https://doi.org/10.29220/CSAM.2023.30.6.577
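
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes NumPy and scikit-learn are available, replaces the full Boruta procedure with a simplified shadow-feature hit count (no binomial test, no iterative removal), and covers only the categorical-target case, using a multinomial logistic regression as a stand-in for polytomous regression. The function names `select_with_shadows` and `impute_categorical`, the `-1` missing marker, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def select_with_shadows(X, y, n_rounds=10, seed=0):
    """Stage 1 (simplified Boruta-style screening, an assumption of this sketch):
    keep a column if its random-forest importance beats the best shuffled
    ("shadow") column in more than half of the rounds."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for r in range(n_rounds):
        # Shadow columns: each original column permuted independently,
        # which destroys any relationship with the target.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(n_estimators=50, random_state=seed + r)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        best_shadow = importances[n_features:].max()
        hits += (importances[:n_features] > best_shadow).astype(int)
    return np.flatnonzero(hits > n_rounds / 2)


def impute_categorical(X, y, missing, selected):
    """Stage 2: fit a logistic model on the observed rows, using only the
    selected columns, and fill the missing labels with its predictions.
    This is a deterministic single imputation; drawing from the predicted
    probabilities would be a stochastic alternative."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X[~missing][:, selected], y[~missing])
    y_filled = y.copy()
    y_filled[missing] = model.predict(X[missing][:, selected])
    return y_filled


# Synthetic data (hypothetical): columns 0 and 1 determine a 4-class target,
# columns 2 to 5 are pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y_true = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0).astype(int)

# Hide roughly 20% of the labels; -1 marks a missing value in this sketch.
missing = rng.random(400) < 0.2
y_obs = np.where(missing, -1, y_true)

# Stage 1 runs on the rows where the target is observed.
selected = select_with_shadows(X[~missing], y_obs[~missing])
# Stage 2 imputes using only the selected predictors.
y_filled = impute_categorical(X, y_obs, missing, selected)

accuracy = (y_filled[missing] == y_true[missing]).mean()
print("selected columns:", selected.tolist())
print("imputation accuracy on hidden labels:", round(float(accuracy), 3))
```

Restricting the second-stage model to the screened columns is the point of the design: the imputation model has fewer parameters to estimate, which matters most when the target has many categories. Binary and quantitative variables would need their own second-stage models (logistic regression and predictive mean matching in the abstract), which this sketch does not cover.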