Variable selection for latent class analysis in the presence of missing data with application to record linkage.

Authors :
Xu, Huiping
Li, Xiaochun
Zhang, Zuoyi
Grannis, Shaun
Source :
Statistical Methods in Medical Research. Jun 2024, Vol. 33 Issue 6, p966-980. 15p.
Publication Year :
2024

Abstract

The Fellegi-Sunter model is a latent class model widely used in probabilistic linkage to identify records that belong to the same entity. Record linkage practitioners typically employ all available matching fields in the model with the premise that more fields convey greater information about the true match status and hence result in improved match performance. In the context of model-based clustering, it is well known that such a premise is incorrect and the inclusion of noisy variables could compromise the clustering. Variable selection procedures have therefore been developed to remove noisy variables. Although these procedures have the potential to improve record matching, they cannot be applied directly due to the ubiquity of the missing data in record linkage applications. In this paper, we modify the stepwise variable selection procedure proposed by Fop, Smart, and Murphy and extend it to account for missing data common in record linkage. Through simulation studies, our proposed method is shown to select the correct set of matching fields across various settings, leading to better-performing algorithms. The improved match performance is also seen in a real-world application. We therefore recommend the use of our proposed selection procedure to identify informative matching fields for probabilistic record linkage algorithms. [ABSTRACT FROM AUTHOR]
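For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of a two-class latent class (Fellegi-Sunter style) model fitted by EM on binary field comparisons. It is not the authors' selection procedure, which the abstract does not spell out. Several things here are assumptions made for illustration only: the function names (`posterior`, `fit_em`, `match_weight`, `simulate`), the coding of comparisons as 1 (agree), 0 (disagree), and `None` (missing), the conditional independence of fields given match status, and the treatment of missing comparisons as ignorable, so that a missing field is simply skipped in the likelihood.

```python
import math
import random

EPS = 1e-6  # keeps estimated probabilities away from 0 and 1


def posterior(row, p, m, u):
    """P(match | observed fields); fields coded None are skipped."""
    lm, lu = p, 1.0 - p
    for k, g in enumerate(row):
        if g is None:  # assumption: a missing comparison carries no evidence
            continue
        lm *= m[k] if g == 1 else 1.0 - m[k]
        lu *= u[k] if g == 1 else 1.0 - u[k]
    return lm / (lm + lu)


def fit_em(rows, n_iter=100, p=0.1):
    """EM for a two-class model with conditionally independent binary
    agreement fields (1 agree, 0 disagree, None missing)."""
    n_fields = len(rows[0])
    m = [0.9] * n_fields  # starting values for P(agree | match)
    u = [0.1] * n_fields  # starting values for P(agree | non-match)
    for _ in range(n_iter):
        post = [posterior(r, p, m, u) for r in rows]  # E-step
        p = min(max(sum(post) / len(post), EPS), 1.0 - EPS)  # M-step
        for k in range(n_fields):
            obs = [(r[k], w) for r, w in zip(rows, post) if r[k] is not None]
            if not obs:
                continue  # field never observed: keep current values
            wm = sum(w for _, w in obs)
            wu = sum(1.0 - w for _, w in obs)
            am = sum(w for g, w in obs if g == 1)
            au = sum(1.0 - w for g, w in obs if g == 1)
            m[k] = min(max(am / max(wm, EPS), EPS), 1.0 - EPS)
            u[k] = min(max(au / max(wu, EPS), EPS), 1.0 - EPS)
    return p, m, u


def match_weight(row, m, u):
    """Log2 likelihood ratio of match vs non-match; missing fields add zero."""
    total = 0.0
    for k, g in enumerate(row):
        if g is None:
            continue
        if g == 1:
            total += math.log2(m[k] / u[k])
        else:
            total += math.log2((1.0 - m[k]) / (1.0 - u[k]))
    return total


def simulate(n, p, m, u, miss, seed=0):
    """Hypothetical data generator: n record pairs, match rate p,
    each field missing independently with probability miss."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        rows.append([None if rng.random() < miss
                     else int(rng.random() < q) for q in probs])
    return rows
```

A call such as `p, m, u = fit_em(simulate(2000, 0.1, [0.95, 0.9, 0.85, 0.5], [0.05, 0.1, 0.15, 0.5], 0.2))` fits the model to simulated pairs in which the fourth field is pure noise. In this setup a field whose agreement probability is the same for matches and non-matches contributes a weight of zero in `match_weight`, which gives some intuition for why the abstract argues that noisy fields add no information and are candidates for removal.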

Details

Language :
English
ISSN :
0962-2802
Volume :
33
Issue :
6
Database :
Academic Search Index
Journal :
Statistical Methods in Medical Research
Publication Type :
Academic Journal
Accession number :
177758658
Full Text :
https://doi.org/10.1177/09622802241242317