LS-Join: Local Similarity Join on String Collections.

Authors :
Wang, Jiaying
Yang, Xiaochun
Wang, Bin
Liu, Chengfei
Source :
IEEE Transactions on Knowledge & Data Engineering. Sep 2017, Vol. 29 Issue 9, p1928-1942. 15p.
Publication Year :
2017

Abstract

String similarity join, as an essential operation in applications including data integration and data cleaning, has attracted significant attention in the research community. Previous studies focus on global similarity join. In this paper, we study local similarity join with edit distance constraints, which finds string pairs from two string collections that have similar substrings. We study two kinds of local similarity join problems: checking local similar pairs and locating local similar pairs. We first consider the case where if two strings are locally similar to each other, they must share a common gram of a certain length. We show how to do efficient local similarity verification based on a matching gram pair. We propose two pruning techniques and an incremental method to further improve the efficiency of finding matching gram pairs. Then, we devise a method to locate the longest similar substring pair for two local similar strings. We conducted a comprehensive experimental study to evaluate the efficiency of these techniques. [ABSTRACT FROM PUBLISHER]
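The abstract's starting point, that locally similar strings must share a common gram of a certain length, can be illustrated with a small candidate-generation sketch. This is not the authors' algorithm: the gram length `q`, the function names, and the inverted-index layout are assumptions made for illustration. The edit-distance verification, the two pruning techniques, and the incremental method mentioned in the abstract are not reproduced here.

```python
from collections import defaultdict


def grams(s, q):
    """Return (position, gram) pairs for every length-q substring of s."""
    return [(i, s[i:i + q]) for i in range(len(s) - q + 1)]


def matching_gram_pairs(r, s, q):
    """Return all (i, j) with r[i:i+q] == s[j:j+q] (hypothetical helper)."""
    index = defaultdict(list)
    for j, g in grams(s, q):
        index[g].append(j)
    return [(i, j) for i, g in grams(r, q) for j in index.get(g, [])]


def candidate_pairs(R, S, q):
    """Return index pairs (a, b) where R[a] and S[b] share at least one q-gram.

    Pairs that share no gram are skipped; surviving pairs would still need
    an edit-distance check on their substrings, which is omitted here.
    """
    index = defaultdict(set)
    for b, s in enumerate(S):
        for _, g in grams(s, q):
            index[g].add(b)
    out = set()
    for a, r in enumerate(R):
        for _, g in grams(r, q):
            for b in index.get(g, ()):
                out.add((a, b))
    return sorted(out)
```

For example, `candidate_pairs(["abcde", "zzzzz"], ["xxcdex", "qqqq"], 3)` keeps only the pair of strings that share the gram `cde`, and `matching_gram_pairs("abcde", "xxcdex", 3)` reports where that gram occurs in each string.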

Details

Language :
English
ISSN :
1041-4347
Volume :
29
Issue :
9
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
124539498
Full Text :
https://doi.org/10.1109/TKDE.2017.2687460