Back to Search Start Over

Parallelizing filter-and-verification based exact set similarity joins on multicores.

Authors :
Fier, Fabian
Freytag, Johann-Christoph
Source :
Information Systems. Sep2022, Vol. 108, pN.PAG-N.PAG. 1p.
Publication Year :
2022

Abstract

Set similarity join (SSJ) is a well studied problem with many algorithms proposed to speed up its performance. However, its scalability and performance are rarely discussed in modern multicore environments. Existing algorithms assume a single-threaded execution that leaves the abundant parallelism provided by modern machines unused, or use distributed setups that may not yield efficient runtimes and speedups that are proportional to the amount of hardware resources (e.g., CPU cores). In this paper, we focus on a widely-used family of SSJ algorithms that are based on the filter-and-verification paradigm, and study the potential of speeding them up in the context of multicore machines. We adapt state-of-the-art SSJ algorithms including PPJoin and AllPairs. Our experiments using 12 real-world datasets highlight important findings: (1) Using the exact number of hardware-provided hyperthreads leads to optimal runtimes for most experiments, (2) hand-crafted data structures do not always lead to better performance, and (3) PPJoin's position filter is more effective in the multithreaded case compared to the single-threaded execution. • Multi-threading has not yet been considered to speed up set similarity joins. • We propose a novel data-parallel set similarity join algorithm. • Multi-threading speeds up the set similarity join 2 to 10 times. • Implementation optimizations are not benefitial for the runtime. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
03064379
Volume :
108
Database :
Academic Search Index
Journal :
Information Systems
Publication Type :
Academic Journal
Accession number :
156852871
Full Text :
https://doi.org/10.1016/j.is.2021.101912