LS-Join: Local Similarity Join on String Collections.
- Source :
- IEEE Transactions on Knowledge & Data Engineering. Sep 2017, Vol. 29 Issue 9, p1928-1942. 15p.
- Publication Year :
- 2017
Abstract
- String similarity join, as an essential operation in applications including data integration and data cleaning, has attracted significant attention in the research community. Previous studies focus on global similarity join. In this paper, we study local similarity join with edit distance constraints, which finds string pairs from two string collections that have similar substrings. We study two kinds of local similarity join problems: checking local similar pairs and locating local similar pairs. We first consider the case where if two strings are locally similar to each other, they must share a common gram of a certain length. We show how to do efficient local similarity verification based on a matching gram pair. We propose two pruning techniques and an incremental method to further improve the efficiency of finding matching gram pairs. Then, we devise a method to locate the longest similar substring pair for two local similar strings. We conducted a comprehensive experimental study to evaluate the efficiency of these techniques. [ABSTRACT FROM PUBLISHER]
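The gram-based filtering idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes that "locally similar" means two strings contain substrings of length at least `min_len` whose edit distance is at most `tau`, and it uses the standard pigeonhole bound (gram length `min_len // (tau + 1)`) as the "certain length"; the abstract does not state either detail. All function names and parameters are hypothetical, and verification is done by brute force rather than by the paper's efficient method.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def matching_gram_pairs(s, r, q):
    """Return all (i, j) with s[i:i+q] == r[j:j+q] (hypothetical helper)."""
    index = defaultdict(list)
    for j in range(len(r) - q + 1):
        index[r[j:j + q]].append(j)
    return [(i, j)
            for i in range(len(s) - q + 1)
            for j in index.get(s[i:i + q], [])]


def substrings(s, min_len):
    """All substrings of s with length >= min_len."""
    return [s[i:j]
            for i in range(len(s))
            for j in range(i + min_len, len(s) + 1)]


def locally_similar(s, r, min_len, tau):
    """Assumed definition: some substring pair, each of length >= min_len,
    has edit distance <= tau. Brute force, for illustration only."""
    # Pigeonhole filter (an assumption, not taken from the abstract):
    # tau edits cannot touch all tau + 1 segments of a substring, so a
    # qualifying pair must share a gram of length min_len // (tau + 1).
    q = min_len // (tau + 1)
    if q > 0 and not matching_gram_pairs(s, r, q):
        return False
    return any(edit_distance(a, b) <= tau
               for a in substrings(s, min_len)
               for b in substrings(r, min_len))
```

For instance, `locally_similar("xxabcdefyy", "zzabcxefww", 6, 1)` passes the gram filter through the shared gram `abc` and then finds the substring pair `abcdef` / `abcxef`, which differ by one substitution. The brute-force verification is only practical for short strings; the paper's contribution is precisely to avoid it.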
Details
- Language :
- English
- ISSN :
- 10414347
- Volume :
- 29
- Issue :
- 9
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Knowledge & Data Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 124539498
- Full Text :
- https://doi.org/10.1109/TKDE.2017.2687460