
Ensemble-Based Relationship Discovery in Relational Databases

Authors :
Mathias Kern
John McCall
Benjamin Lacroix
Gilbert Owusu
David Corsar
Akinola Ogunsemi
Source :
Lecture Notes in Computer Science ISBN: 9783030637989, SGAI Conf.
Publication Year :
2020
Publisher :
Springer International Publishing, 2020.

Abstract

We investigated how several data relationship discovery algorithms can be combined to improve performance. We studied eight relationship discovery algorithms, including cosine similarity, Soundex similarity, name similarity and value range similarity, which identify potential links between database tables in different ways, using different categories of database information. We proposed two ensemble methods, a voting system and hierarchical clustering, to reduce the generalization error of the individual algorithms. The voting scheme uses a given weighting metric to combine the predictions of the individual algorithms. Hierarchical clustering groups the predictions into clusters based on their similarity and then combines one member from each cluster. We ran experiments to validate the performance of each algorithm and compared it with our ensemble methods and with the state-of-the-art algorithms FastFK, Randomness and HoPF, using precision, recall and F-measure as evaluation metrics over the TPC-H and AdvWork datasets. The results show that the performance of each individual algorithm is limited, indicating the importance of combining the algorithms to consolidate their strengths.
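The weighted voting idea and the precision, recall and F-measure evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which weighting metric or acceptance rule is used, so the per-algorithm weights, the 50% threshold, and the helper names `weighted_vote` and `evaluate` are hypothetical. The TPC-H-style links below are hand-made toy data, not results from the paper.

```python
from typing import Dict, Set, Tuple

# A candidate relationship: (referencing column, referenced column),
# e.g. a foreign-key column pointing at a primary-key column.
Link = Tuple[str, str]


def weighted_vote(predictions: Dict[str, Set[Link]],
                  weights: Dict[str, float],
                  threshold: float = 0.5) -> Set[Link]:
    """Hypothetical voting rule: accept a link when the summed weight of the
    algorithms that predict it is at least `threshold` times the total weight
    of all algorithms. Weights and threshold are illustrative assumptions."""
    total = sum(weights[name] for name in predictions)
    scores: Dict[Link, float] = {}
    for name, links in predictions.items():
        for link in links:
            scores[link] = scores.get(link, 0.0) + weights[name]
    return {link for link, score in scores.items() if score >= threshold * total}


def evaluate(predicted: Set[Link], truth: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) of predicted links against ground truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    return precision, recall, f_measure


# Toy, hand-made example with TPC-H-style column names (not data from the paper).
CUST = ("orders.o_custkey", "customer.c_custkey")
ORDER = ("lineitem.l_orderkey", "orders.o_orderkey")
SPURIOUS = ("part.p_size", "supplier.s_suppkey")
REGION = ("nation.n_regionkey", "region.r_regionkey")
NATION = ("supplier.s_nationkey", "nation.n_nationkey")

predictions = {
    "cosine": {CUST, ORDER},
    "name": {CUST, REGION},
    "value_range": {ORDER, SPURIOUS, REGION},
}
weights = {"cosine": 3.0, "name": 2.0, "value_range": 1.0}  # hypothetical weights
truth = {CUST, ORDER, REGION, NATION}

# Score each single algorithm, then the weighted vote over all of them.
for name, links in predictions.items():
    print(name, evaluate(links, truth))
print("vote", evaluate(weighted_vote(predictions, weights), truth))
```

In this contrived example the vote discards the link proposed only by the lowest-weighted algorithm and keeps the three links with enough combined support, reaching recall 0.75 where each single algorithm reaches 0.5. Real weights would come from whatever metric the ensemble is configured with, for example each algorithm's precision on a held-out set.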

Details

ISBN :
978-3-030-63798-9
ISBNs :
9783030637989
Database :
OpenAIRE
Journal :
Lecture Notes in Computer Science ISBN: 9783030637989, SGAI Conf.
Accession number :
edsair.doi...........add3a666b2edb0957b88cf2f26db883d
Full Text :
https://doi.org/10.1007/978-3-030-63799-6_22