An efficient learning based approach for automatic record deduplication with benchmark datasets.

Authors :: Ravikanth, M
Korra, Sampath
Mamidisetti, Gowtham
Goutham, Maganti
Bhaskar, T.
Source :: Scientific Reports; 7/15/2024, Vol. 14 Issue 1, p1-19, 19p
Publication Year :: 2024
Abstract: With technological innovations, enterprises in the real world are managing every iota of data, as it can be mined to derive business intelligence (BI). However, when data comes from multiple sources, it may contain duplicate records. Because data is given paramount importance, eliminating duplicate entities is also essential for data integration, performance, and resource optimization. Of late, deep learning can offer promising learning-based approaches to realizing reliable record deduplication systems. Deep ER is one of the deep learning-based methods recently used to eliminate duplicates in structured data. Using it as a reference model, in this paper we propose a framework called Enhanced Deep Learning-based Record Deduplication (EDL-RD) to further improve performance. To this end, we exploit a variant of Long Short-Term Memory (LSTM) along with various attribute compositions, similarity metrics, and numerical and null value resolution. We propose an algorithm called Efficient Learning based Record Deduplication (ELbRD), which extends the reference model with the aforementioned enhancements. An empirical study reveals that the proposed framework with these extensions outperforms existing methods. [ABSTRACT FROM AUTHOR]
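The abstract mentions attribute-wise similarity metrics together with numerical and null value resolution. The sketch below shows, under stated assumptions, one way such per-attribute similarity features could be computed for a pair of records before being handed to a learned classifier. It is not the authors' ELbRD algorithm and does not reproduce their LSTM model. The helper names (`jaccard`, `attribute_similarity`, `similarity_vector`), the choice to average token-level Jaccard with `difflib.SequenceMatcher`, and the neutral score of 0.5 for missing values are all hypothetical illustrations.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Union

Value = Optional[Union[str, int, float]]

# Hypothetical choice: a missing value is "unknown", scored halfway between match and mismatch.
NULL_SIMILARITY = 0.5


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two strings (case-insensitive)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def attribute_similarity(a: Value, b: Value) -> float:
    """Similarity in [0, 1] for one attribute, with null and numeric handling (illustrative)."""
    if a is None or b is None:
        # Null value resolution: do not treat a missing value as a mismatch.
        return NULL_SIMILARITY
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        # Numeric attributes: relative difference, clamped to [0, 1].
        denom = max(abs(a), abs(b))
        if denom == 0:
            return 1.0
        return max(0.0, 1.0 - abs(a - b) / denom)
    sa, sb = str(a).lower(), str(b).lower()
    # String attributes: average of a token-based and a character-based metric.
    return (jaccard(sa, sb) + SequenceMatcher(None, sa, sb).ratio()) / 2


def similarity_vector(r1: dict, r2: dict, attributes: List[str]) -> List[float]:
    """One similarity score per attribute; absent keys are treated as nulls."""
    return [attribute_similarity(r1.get(k), r2.get(k)) for k in attributes]


if __name__ == "__main__":
    r1 = {"name": "John Smith", "city": "New York", "age": 42}
    r2 = {"name": "Smith John", "city": None, "age": 42}
    print(similarity_vector(r1, r2, ["name", "city", "age"]))
```

In a learning-based setting, vectors like these (or learned attribute embeddings, as in LSTM-based approaches) would be fed to a trained matcher rather than compared against a fixed threshold; the heuristics here only illustrate the kinds of inputs the abstract refers to.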

Details

Language :: English
ISSN :: 2045-2322
Volume :: 14
Issue :: 1
Database :: Complementary Index
Journal :: Scientific Reports
Publication Type :: Academic Journal
Accession number :: 178618326
Full Text :: https://doi.org/10.1038/s41598-024-63242-1
