Methods for analyzing data from probabilistic linkage strategies based on partially identifying variables
- Source :
- Statistics in medicine, 31(30), 4231-4242. John Wiley and Sons Ltd
- Publication Year :
- 2012
- Publisher :
- Wiley, 2012.
Abstract
- In record linkage studies, unique identifiers are often not available, and therefore, the linkage procedure depends on combinations of partially identifying variables with low discriminating power. As a consequence, wrongly linked covariate and outcome pairs will be created and bias further analysis of the linked data. In this article, we investigated two estimators that correct for linkage error in regression analysis. We extended the estimators developed by Lahiri and Larsen and also suggested a weighted least squares approach to deal with linkage error. We considered both linear and logistic regression problems and evaluated the performance of both methods with simulations. Our results show that all wrong covariate and outcome pairs need to be removed from the analysis in order to calculate unbiased regression coefficients in both approaches. This removal requires strong assumptions on the structure of the data. In addition, the bias significantly increases when the assumptions do not hold and wrongly linked records influence the coefficient estimation. Our simulations showed that both methods had similar performance in linear regression problems. With logistic regression problems, the weighted least squares method showed less bias. Because the specific structure of the data in record linkage problems often leads to different assumptions, it is necessary that the analyst has prior knowledge on the nature of the data. These assumptions are more easily introduced in the weighted least squares approach than in the Lahiri and Larsen estimator. Copyright (c) 2012 John Wiley & Sons, Ltd
- Subjects :
- Statistics and Probability
Linkage (software)
Epidemiology
Linear model
Estimator
Regression analysis
Logistic regression
Robust regression
Logistic Models
Bias
Research Design
Data Interpretation, Statistical
Linear regression
Covariate
Statistics
Linear Models
Econometrics
Humans
Regression Analysis
Computer Simulation
Least-Squares Analysis
Mathematics
Details
- ISSN :
- 02776715
- Volume :
- 31
- Database :
- OpenAIRE
- Journal :
- Statistics in Medicine
- Accession number :
- edsair.doi.dedup.....04d5967a421fefa005d9ca515e225499
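
The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the article's simulation design or its extended estimators: it assumes a simple exchangeable linkage-error model (each record is linked correctly with probability `p_correct`, otherwise to the outcome of a uniformly chosen other record) and applies the basic Lahiri and Larsen idea of regressing the linked outcome on `Q @ X`, where `Q` holds the assumed linkage probabilities. The function name `simulate_linked_data` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_linked_data(n=2000, beta=(1.0, 2.0), p_correct=0.7, seed=0):
    """Simulate a linear model whose outcomes are partly mislinked.

    Assumption (for illustration only): exchangeable linkage error, i.e.
    a wrong link picks the outcome of a uniformly random *other* record.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])          # intercept + one covariate
    y = X @ np.asarray(beta) + rng.normal(scale=0.5, size=n)

    # Linked outcome z: equals y_i when the link is correct, otherwise the
    # outcome of a different record.
    idx = np.arange(n)
    donors = (idx + rng.integers(1, n, size=n)) % n   # never equal to idx
    wrong = rng.random(n) > p_correct
    z = y.copy()
    z[wrong] = y[donors[wrong]]

    # Linkage probability matrix: q_ii = p_correct, q_ij = (1 - p)/(n - 1).
    Q = np.full((n, n), (1.0 - p_correct) / (n - 1))
    np.fill_diagonal(Q, p_correct)
    return X, z, Q


X, z, Q = simulate_linked_data()

# Naive OLS ignores linkage error and regresses z on X directly.
naive, *_ = np.linalg.lstsq(X, z, rcond=None)

# Lahiri-Larsen-style correction: since E[z] = Q X beta under the assumed
# model, regress z on W = Q X instead of X.
W = Q @ X
ll, *_ = np.linalg.lstsq(W, z, rcond=None)

print("true slope: 2.0")
print("naive slope:", naive[1])
print("corrected slope:", ll[1])
```

Under these assumptions the naive slope is pulled toward zero by roughly the factor `p_correct`, while the corrected slope stays close to the true value. The correction relies on `Q` being specified correctly, which mirrors the abstract's point that the analyst needs prior knowledge of the data structure.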