
Suffix rank

Authors :
Jonathan Gabor
Marina Barsky
Mariano P. Consens
Alex Thomo
Source :
Proceedings of the VLDB Endowment. 13:2787-2800
Publication Year :
2020
Publisher :
Association for Computing Machinery (ACM), 2020.

Abstract

We investigate the problem of building a suffix array substring index for inputs significantly larger than main memory. This problem is especially important in the context of biological sequence analysis, where biological polymers can be thought of as very large contiguous strings. The objective is to index every substring of these long strings to facilitate efficient queries. We propose a new, simple, scalable, and inherently parallelizable algorithm for building a suffix array for out-of-core strings. Our new algorithm, Suffix Rank, scales to arbitrarily large inputs, using disk as a memory extension. It solves the problem in just O(log n) scans over the disk-resident data. We evaluate the practical performance of our new algorithm and show that, for inputs significantly larger than the available amount of RAM, it scales better than other state-of-the-art solutions, such as eSAIS, SAscan, and eGSA.
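The abstract does not describe the internals of Suffix Rank. For readers unfamiliar with suffix arrays, the sketch below shows one well-known way to build one in O(log n) rounds: prefix doubling (Manber–Myers style), where each round re-ranks every suffix by the pair (rank of its first k characters, rank of the next k characters) and k doubles until all ranks are distinct. This is an in-memory illustration under that assumption only. It is not the authors' out-of-core algorithm, and the function name suffix_array_prefix_doubling is hypothetical.

```python
def suffix_array_prefix_doubling(text: str) -> list[int]:
    """Return the suffix array of `text` (start offsets of suffixes in sorted order).

    Generic in-memory prefix doubling; illustrative only, not Suffix Rank.
    """
    n = len(text)
    if n == 0:
        return []
    # Round 0: rank each suffix by its first character.
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Key = (rank of first k chars, rank of next k chars); -1 marks "past the end",
        # which sorts before every real rank (all ranks are >= 0).
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Assign new dense ranks: equal keys share a rank, otherwise increment.
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            new_rank[cur] = new_rank[prev] + (key(prev) != key(cur))
        rank = new_rank
        # Stop once every suffix has a distinct rank; this takes O(log n) rounds.
        if rank[sa[-1]] == n - 1:
            return sa
        k *= 2


if __name__ == "__main__":
    print(suffix_array_prefix_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Because the number of rounds is logarithmic in n, a disk-based variant of this idea needs only a logarithmic number of passes, which is the kind of bound the abstract reports; how Suffix Rank actually organizes its scans is described in the paper itself.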

Details

ISSN :
21508097
Volume :
13
Database :
OpenAIRE
Journal :
Proceedings of the VLDB Endowment
Accession number :
edsair.doi...........27d8f67fd151331bcdb608217be80977