Back to Search Start Over

Run Compressed Rank/Select for Large Alphabets

Authors :
Dmitry Kosolobov
Juha Kärkkäinen
José Fuentes-Sepúlveda
Simon J. Puglisi
Bilgin, Ali
Marcellin, Michael W.
Serra-Sagrista, Joan
Storer, James A.
Department of Computer Science
Helsinki Institute for Information Technology
Bioinformatics
Algorithmic Bioinformatics
Source :
2018 Data Compression Conference, DCC
Publisher :
IEEE

Abstract

Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma{-}1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string in $r\log\frac{n\sigma}{r} + o(r\log\frac{n\sigma}{r})$ bits and supports select and access queries in $O(\log\frac{\log(n/r)}{\log\log n})$ time and rank queries in $O(\log\frac{\log(n\sigma/r)}{\log\log n})$ time. We show that $r\log\frac{n(\sigma-1)}{r} - O(\log\frac{n}{r})$ bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses $(1 + \epsilon)r\log\frac{n\sigma}{r} + O(r)$ bits, where $\epsilon > 0$ is an arbitrary constant, with the same query times but without the restriction $r \le n / \log^{\omega(1)} n$. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case $r \ge 2^{\log^\delta n}$, for an arbitrary constant $\delta > 0$. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space.<br />Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. 10 pages, 1 figure, 4 tables; published in DCC'2018

Details

Language :
English
ISBN :
978-1-5386-4883-4
ISBNs :
9781538648834
Database :
OpenAIRE
Journal :
2018 Data Compression Conference, DCC
Accession number :
edsair.doi.dedup.....94b0a93c7672f5669ae119a403db3ac5
Full Text :
https://doi.org/10.1109/dcc.2018.00040