reclin2: a Toolkit for Record Linkage and Deduplication.
- Source :
- R Journal. Jun 2022, Vol. 14 Issue 2, p320-328. 9p.
- Publication Year :
- 2022
Abstract
- The goal of record linkage and deduplication is to detect which records belong to the same object in data sets where the identifiers of the objects contain errors and missing values. The main design considerations of reclin2 are modularity/flexibility, speed, and the ability to handle large data sets. The first point makes it easy for users to extend the package with custom process steps. This flexibility is obtained by using simple data structures and by following common interfaces in R as closely as possible. For large problems it is possible to distribute the work over multiple worker nodes. A benchmark comparison to other record linkage packages for R shows that, for this specific benchmark, the fastLink package performs best. However, fastLink only performs one specific type of record linkage model. The performance of reclin2 is not far behind that of fastLink while allowing for much greater flexibility. [ABSTRACT FROM AUTHOR]
- Subjects :
- *BIG data
Details
- Language :
- English
- ISSN :
- 20734859
- Volume :
- 14
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- R Journal
- Publication Type :
- Academic Journal
- Accession number :
- 162463005
- Full Text :
- https://doi.org/10.32614/rj-2022-038