Suffix rank
- Source :
- Proceedings of the VLDB Endowment. 13:2787-2800
- Publication Year :
- 2020
- Publisher :
- Association for Computing Machinery (ACM), 2020.
Abstract
- We investigate the problem of building a suffix array substring index for inputs significantly larger than main memory. This problem is especially important in the context of biological sequence analysis, where biological polymers can be thought of as very large contiguous strings. The objective is to index every substring of these long strings to facilitate efficient queries. We propose a new simple, scalable, and inherently parallelizable algorithm for building a suffix array for out-of-core strings. Our new algorithm, Suffix Rank, scales to arbitrarily large inputs, using disk as a memory extension. It solves the problem in just O(log n) scans over the disk-resident data. We evaluate the practical performance of our new algorithm, and show that for inputs significantly larger than the available amount of RAM, it scales better than other state-of-the-art solutions, such as eSAIS, SAscan, and eGSA.
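For orientation, the sketch below shows one well-known way to build a suffix array in a logarithmic number of passes: classic in-memory prefix doubling. It is an illustration only and is not the Suffix Rank algorithm from the paper, which works on disk-resident data. The function name `suffix_array_by_doubling` is hypothetical, and the whole input is assumed to fit in RAM.

```python
def suffix_array_by_doubling(text):
    """Return the suffix array of `text` using in-memory prefix doubling.

    Illustrative sketch only: NOT the paper's out-of-core Suffix Rank
    algorithm. Each round doubles the compared prefix length, so at most
    about log2(n) rounds are needed.
    """
    n = len(text)
    if n == 0:
        return []

    # Round 0: rank every suffix by its first character.
    rank = [ord(c) for c in text]
    order = list(range(n))
    k = 1  # current prefix length already reflected in `rank`

    while True:
        # Sort key for suffix i: (rank of its first k characters,
        # rank of the following k characters), with -1 past the end.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        order.sort(key=key)

        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = order[j - 1], order[j]
            new_rank[cur] = new_rank[prev] + (key(cur) != key(prev))
        rank = new_rank

        # All ranks distinct: the order is final.
        if rank[order[-1]] == n - 1:
            return order
        k *= 2


print(suffix_array_by_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Each round here sorts the whole array in memory. An out-of-core method such as the one described in the abstract would instead have to organize the work as sequential scans over data on disk.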
- Subjects :
- 0303 health sciences
Theoretical computer science
Computer science
030302 biochemistry & molecular biology
String (computer science)
Search engine indexing
Rank (computer programming)
General Engineering
Suffix array
Context (language use)
Substring
law.invention
03 medical and health sciences
law
Data_FILES
Suffix
Substring index
030304 developmental biology
Details
- ISSN :
- 2150-8097
- Volume :
- 13
- Database :
- OpenAIRE
- Journal :
- Proceedings of the VLDB Endowment
- Accession number :
- edsair.doi...........27d8f67fd151331bcdb608217be80977