Back to Search
Start Over
Parallelizing filter-and-verification based exact set similarity joins on multicores.
- Source :
-
Information Systems . Sep2022, Vol. 108, pN.PAG-N.PAG. 1p. - Publication Year :
- 2022
-
Abstract
- Set similarity join (SSJ) is a well studied problem with many algorithms proposed to speed up its performance. However, its scalability and performance are rarely discussed in modern multicore environments. Existing algorithms assume a single-threaded execution that leaves the abundant parallelism provided by modern machines unused, or use distributed setups that may not yield efficient runtimes and speedups that are proportional to the amount of hardware resources (e.g., CPU cores). In this paper, we focus on a widely-used family of SSJ algorithms that are based on the filter-and-verification paradigm, and study the potential of speeding them up in the context of multicore machines. We adapt state-of-the-art SSJ algorithms including PPJoin and AllPairs. Our experiments using 12 real-world datasets highlight important findings: (1) Using the exact number of hardware-provided hyperthreads leads to optimal runtimes for most experiments, (2) hand-crafted data structures do not always lead to better performance, and (3) PPJoin's position filter is more effective in the multithreaded case compared to the single-threaded execution. • Multi-threading has not yet been considered to speed up set similarity joins. • We propose a novel data-parallel set similarity join algorithm. • Multi-threading speeds up the set similarity join 2 to 10 times. • Implementation optimizations are not benefitial for the runtime. [ABSTRACT FROM AUTHOR]
- Subjects :
- *ALGORITHMS
*DATA structures
*SCALABILITY
*MULTICORE processors
*SPEED
Subjects
Details
- Language :
- English
- ISSN :
- 03064379
- Volume :
- 108
- Database :
- Academic Search Index
- Journal :
- Information Systems
- Publication Type :
- Academic Journal
- Accession number :
- 156852871
- Full Text :
- https://doi.org/10.1016/j.is.2021.101912