Back to Search Start Over

An efficient learning based approach for automatic record deduplication with benchmark datasets.

Authors :
Ravikanth, M
Korra, Sampath
Mamidisetti, Gowtham
Goutham, Maganti
Bhaskar, T.
Source :
Scientific Reports; 7/15/2024, Vol. 14 Issue 1, p1-19, 19p
Publication Year :
2024

Abstract

With technological innovations, enterprises in the real world are managing every iota of data as it can be mined to derive business intelligence (BI). However, when data comes from multiple sources, it may result in duplicate records. As data is given paramount importance, it is also significant to eliminate duplicate entities towards data integration, performance and resource optimization. To realize reliable systems for record deduplication, late, deep learning could offer exciting provisions with a learning-based approach. Deep ER is one of the deep learning-based methods used recently for dealing with the elimination of duplicates in structured data. Using it as a reference model, in this paper, we propose a framework known as Enhanced Deep Learning-based Record Deduplication (EDL-RD) for improving performance further. Towards this end, we exploited a variant of Long Short Term Memory (LSTM) along with various attribute compositions, similarity metrics, and numerical and null value resolution. We proposed an algorithm known as Efficient Learning based Record Deduplication (ELbRD). The algorithm extends the reference model with the aforementioned enhancements. An empirical study has revealed that the proposed framework with extensions outperforms existing methods. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
20452322
Volume :
14
Issue :
1
Database :
Complementary Index
Journal :
Scientific Reports
Publication Type :
Academic Journal
Accession number :
178618326
Full Text :
https://doi.org/10.1038/s41598-024-63242-1