
Multiple imputation of semi-continuous exposure variables that are categorized for analysis.

Authors :
Nguyen CD
Moreno-Betancur M
Rodwell L
Romaniuk H
Carlin JB
Lee KJ
Source :
Statistics in medicine [Stat Med] 2021 Nov 30; Vol. 40 (27), pp. 6093-6106. Date of Electronic Publication: 2021 Aug 23.
Publication Year :
2021

Abstract

Semi-continuous variables are characterized by a point mass at one value and a continuous range of values for remaining observations. An example is alcohol consumption quantity, with a spike of zeros representing non-drinkers and positive values for drinkers. If multiple imputation is used to handle missing values for semi-continuous variables, it is unclear how this should be implemented within the standard approaches of fully conditional specification (FCS) and multivariate normal imputation (MVNI). This question is brought into focus by the use of categorized versions of semi-continuous exposure variables in analyses (eg, no drinking, drinking below binge level, binge drinking, heavy binge drinking), raising the question of how best to achieve congeniality between imputation and analysis models. We performed a simulation study comparing nine approaches for imputing semi-continuous exposures requiring categorization for analysis. Three methods imputed the categories directly: ordinal logistic regression, and imputation of binary indicator variables representing the categories using MVNI (with two variants). Six methods (predictive mean matching, zero-inflated binomial imputation, and two-part imputation methods with variants in FCS and MVNI) imputed the semi-continuous variable, with categories derived after imputation. The ordinal and zero-inflated binomial methods had good performance across most scenarios, while MVNI methods requiring rounding after imputation did not perform well. There were mixed results for predictive mean matching and the two-part methods, depending on whether the estimands were proportions or regression coefficients. The results highlight the need to consider the parameter of interest when selecting an imputation procedure. (© 2021 John Wiley & Sons Ltd.)
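
Illustrative sketch (not from the paper): the idea of imputing the semi-continuous variable first and deriving categories afterwards can be shown with a deliberately simplified two-part draw. The cut-points `BINGE` and `HEAVY_BINGE`, the function names, and the log-normal assumption for positive values are all hypothetical choices made for this example. Unlike the methods compared in the study, this sketch uses no covariates and does not propagate parameter uncertainty, so it should not be read as a proper multiple imputation procedure.

```python
import numpy as np

# Hypothetical cut-points (drinks per occasion); not taken from the paper.
BINGE, HEAVY_BINGE = 5.0, 10.0


def categorize(x):
    """Map a semi-continuous quantity to 4 ordered categories.

    0 = zero (point mass), 1 = positive but below BINGE,
    2 = BINGE up to (not including) HEAVY_BINGE, 3 = HEAVY_BINGE or more.
    """
    x = np.asarray(x, dtype=float)
    cats = np.digitize(x, [BINGE, HEAVY_BINGE]) + 1
    cats[x == 0] = 0
    return cats


def two_part_impute_once(x, rng):
    """Fill NaNs with one simplified two-part draw (intercept-only).

    Part 1: draw zero vs positive using the observed proportion positive.
    Part 2: draw positive amounts from a log-normal fitted to observed positives.
    Assumption: no covariates and no posterior draws of the parameters.
    """
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    observed = x[~missing]
    positive = observed[observed > 0]

    p_positive = positive.size / observed.size
    log_mean = np.log(positive).mean()
    log_sd = np.log(positive).std(ddof=1)

    n_missing = int(missing.sum())
    is_positive = rng.random(n_missing) < p_positive
    amounts = np.exp(rng.normal(log_mean, log_sd, size=n_missing))
    x[missing] = np.where(is_positive, amounts, 0.0)
    return x


rng = np.random.default_rng(0)
consumption = np.array([0.0, 0.0, 3.0, 7.0, np.nan, 12.0, np.nan])
completed = two_part_impute_once(consumption, rng)
categories = categorize(completed)  # categories derived after imputation
```

In a real analysis this step would be repeated to create several completed datasets, with the imputation model conditioning on the other analysis variables.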

Details

Language :
English
ISSN :
1097-0258
Volume :
40
Issue :
27
Database :
MEDLINE
Journal :
Statistics in medicine
Publication Type :
Academic Journal
Accession number :
34423450
Full Text :
https://doi.org/10.1002/sim.9172