Bayesian Graphical Entity Resolution using Exchangeable Random Partition Priors.
- Source :
- Journal of Survey Statistics & Methodology; Jun 2023, Vol. 11 Issue 3, p569-596, 28p
- Publication Year :
- 2023
Abstract
- Entity resolution (record linkage or deduplication) is the process of identifying and linking duplicate records in databases. In this paper, we propose a Bayesian graphical approach for entity resolution that links records to latent entities, where the prior representation on the linkage structure is exchangeable. First, we adopt a flexible and tractable set of priors for the linkage structure, which corresponds to a special class of random partition models. Second, we propose a more realistic distortion model for categorical/discrete record attributes, which corrects a logical inconsistency with the standard hit-miss model. Third, we incorporate hyperpriors to improve flexibility. Fourth, we employ a partially collapsed Gibbs sampler for inferential speedups. Using a selection of private and nonprivate data sets, we investigate the impact of our modeling contributions and compare our model with two alternative Bayesian models. In addition, we conduct a simulation study for household survey data, where we vary distortion, duplication rates and data set size. We find that our model performs more consistently than the alternatives across a variety of scenarios and typically achieves the highest entity resolution accuracy (F1 score). Open source software is available for our proposed methodology, and we provide a discussion regarding our work and future directions. [ABSTRACT FROM AUTHOR]
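The abstract reports entity resolution accuracy as an F1 score but does not say how it is computed. A common choice for linkage tasks is the pairwise F1 score, where every pair of records placed in the same entity counts as a predicted link. The following is a minimal sketch under that assumption; the function names `linked_pairs` and `pairwise_f1` are illustrative and are not taken from the authors' software.

```python
from itertools import combinations


def linked_pairs(labels):
    """Return the set of record-index pairs (i, j), i < j, sharing an entity label."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    pairs = set()
    for members in clusters.values():
        # members are appended in increasing index order, so i < j in each pair
        pairs.update(combinations(members, 2))
    return pairs


def pairwise_f1(true_labels, predicted_labels):
    """Pairwise precision, recall and F1 for a predicted partition of records.

    Assumption: accuracy is measured over record pairs, which may differ
    from the exact evaluation protocol used in the paper.
    """
    true_pairs = linked_pairs(true_labels)
    pred_pairs = linked_pairs(predicted_labels)
    true_positives = len(true_pairs & pred_pairs)
    # Convention: with no predicted (or no true) links, the ratio is set to 1.0
    precision = true_positives / len(pred_pairs) if pred_pairs else 1.0
    recall = true_positives / len(true_pairs) if true_pairs else 1.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Five records: the truth has two duplicate pairs, the prediction over-merges.
truth = ["a", "a", "b", "b", "c"]
predicted = [0, 0, 0, 1, 2]
precision, recall, f1 = pairwise_f1(truth, predicted)
```

In this toy example the prediction links three pairs, only one of which is a true duplicate pair, and it misses one of the two true pairs, so precision is 1/3 and recall is 1/2.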
- Subjects :
- GIBBS sampling
OPEN source software
RATE setting
HOUSEHOLD surveys
Details
- Language :
- English
- ISSN :
- 2325-0984
- Volume :
- 11
- Issue :
- 3
- Database :
- Complementary Index
- Journal :
- Journal of Survey Statistics & Methodology
- Publication Type :
- Academic Journal
- Accession number :
- 164690062
- Full Text :
- https://doi.org/10.1093/jssam/smac030