Back to Search Start Over

Origami

Authors :
Arif Arman
Dmitri Loguinov
Source :
Proceedings of the VLDB Endowment. 15:259-271
Publication Year :
2021
Publisher :
Association for Computing Machinery (ACM), 2021.

Abstract

Mergesort is a popular algorithm for sorting real-world workloads as it is immune to data skewness, suitable for parallelization using vectorized intrinsics, and relatively simple to multi-thread. In this paper, we introduce Origami , an in-memory merge-sort framework that is optimized for scalar, as well as all current SIMD (single-instruction multiple-data) CPU architectures. For each vector-extension set (e.g., SSE, AVX2, AVX-512), we present an in-register sorter for small sequences that is up to 8× faster than prior methods and a branchless streaming merger that achieves up to a 1.5× speed-up over the naive merge. In addition, we introduce a cache-residing quad-merge tree to avoid bottlenecking on memory bandwidth and a parallel partitioning scheme to maximize thread-level concurrency. We develop an end-to-end sort with these components and produce a highly utilized mergesort pipeline by reducing the synchronization overhead between threads. Single-threaded Origami performs up to 2× faster than the closest competitor and achieves a nearly perfect speed-up in multi-core environments.

Subjects

Subjects :
General Engineering

Details

ISSN :
21508097
Volume :
15
Database :
OpenAIRE
Journal :
Proceedings of the VLDB Endowment
Accession number :
edsair.doi...........a70f3ac6ef1e8c61234c6b2a6a3ef2e5
Full Text :
https://doi.org/10.14778/3489496.3489507