Tackling Challenges in Data Pooling: Missing Data Handling in Latent Variable Models with Continuous and Categorical Indicators.

Authors :: Chen, Lihan
Miočević, Milica
Falk, Carl F.
Source :: Structural Equation Modeling; Jul/Aug 2024, Vol. 31, Issue 4, pp. 651-666 (16 pages)
Publication Year :: 2024
Abstract: Data pooling is a powerful strategy in empirical research. However, combining multiple datasets often results in a large amount of missing data, as variables that are not present in some datasets effectively contain missing values for all participants in those datasets. Furthermore, data pooling typically leads to a mix of continuous and categorical items with nonnormal multivariate distributions. We investigated two popular approaches to handle missing data in this context: (1) applying direct maximum likelihood by treating data as continuous (con-ML), and (2) applying categorical least squares using a polychoric correlation matrix computed from pairwise deletion (cat-LS). These approaches are available for free and relatively straightforward for empirical researchers to implement. Through simulation studies with confirmatory factor analysis and latent mediation analysis, we found cat-LS to be unsuitable for pooled data analysis, whereas con-ML yielded acceptable performance for the estimation of latent path coefficients barring severe nonnormality. [ABSTRACT FROM AUTHOR]
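
Illustration (not part of the article): the abstract's point that pooling creates missing data by design can be made concrete with a small sketch. The two toy studies, their variable names (x1 to x4), and the helper functions below are all hypothetical and chosen only for illustration; this is not the authors' simulation design or analysis code. It stacks two datasets that measured different variables, then reports the resulting per-variable missing rate and the number of cases jointly observed for a pair of variables, which is the sample that any pairwise-deletion statistic would have to work from.

```python
# Hypothetical toy data: study A measured x1, x2, x3; study B measured x2, x3, x4.
study_a = [
    {"x1": 3, "x2": 1.2, "x3": 0},
    {"x1": 4, "x2": 0.7, "x3": 1},
]
study_b = [
    {"x2": 0.9, "x3": 1, "x4": 2},
    {"x2": 1.5, "x3": 0, "x4": 5},
    {"x2": 1.1, "x3": 1, "x4": 4},
]


def pool(*datasets):
    """Stack datasets row-wise; variables a study did not measure become None."""
    variables = sorted({name for data in datasets for row in data for name in row})
    pooled = [
        {name: row.get(name) for name in variables}
        for data in datasets
        for row in data
    ]
    return variables, pooled


def missing_rate(pooled, variables):
    """Proportion of pooled rows in which each variable is missing."""
    n = len(pooled)
    return {name: sum(row[name] is None for row in pooled) / n for name in variables}


def pairwise_complete(pooled, a, b):
    """Number of pooled rows in which both variables a and b are observed."""
    return sum(row[a] is not None and row[b] is not None for row in pooled)


variables, pooled = pool(study_a, study_b)
rates = missing_rate(pooled, variables)
for name in variables:
    print(name, rates[name])
print("rows with x2 and x3:", pairwise_complete(pooled, "x2", "x3"))
print("rows with x1 and x4:", pairwise_complete(pooled, "x1", "x4"))
```

In this toy setup, x1 and x4 are each missing for every participant of the study that did not collect them, and no row contains both, so the amount of information available differs sharply from one pair of variables to the next.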

Subjects :: LATENT variables
CONFIRMATORY factor analysis
MISSING data (Statistics)
DISTRIBUTION (Probability theory)
RESEARCH personnel
