Back to Search
Start Over
Run Compressed Rank/Select for Large Alphabets
- Source :
- 2018 Data Compression Conference, DCC
- Publisher :
- IEEE
-
Abstract
- Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma{-}1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string in $r\log\frac{n\sigma}{r} + o(r\log\frac{n\sigma}{r})$ bits and supports select and access queries in $O(\log\frac{\log(n/r)}{\log\log n})$ time and rank queries in $O(\log\frac{\log(n\sigma/r)}{\log\log n})$ time. We show that $r\log\frac{n(\sigma-1)}{r} - O(\log\frac{n}{r})$ bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses $(1 + \epsilon)r\log\frac{n\sigma}{r} + O(r)$ bits, where $\epsilon > 0$ is an arbitrary constant, with the same query times but without the restriction $r \le n / \log^{\omega(1)} n$. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case $r \ge 2^{\log^\delta n}$, for an arbitrary constant $\delta > 0$. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space.<br />Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. 10 pages, 1 figure, 4 tables; published in DCC'2018
- Subjects :
- FOS: Computer and information sciences
0102 computer and information sciences
02 engineering and technology
01 natural sciences
State of the art
Data structures, Arbitrary constants
Large alphabets
Combinatorics
Log-log plot
020204 information systems
TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY
Computer Science - Data Structures and Algorithms
0202 electrical engineering, electronic engineering, information engineering
Rank (graph theory)
Data Structures and Algorithms (cs.DS)
Run length
succinct, Data compression
String (computer science)
State (functional analysis)
rank select
Predecessor problems
Data structure
Binary logarithm
113 Computer and information sciences
010201 computation theory & mathematics
Optimal time
Alphabet
Constant (mathematics)
MathematicsofComputing_DISCRETEMATHEMATICS
Subjects
Details
- Language :
- English
- ISBN :
- 978-1-5386-4883-4
- ISBNs :
- 9781538648834
- Database :
- OpenAIRE
- Journal :
- 2018 Data Compression Conference, DCC
- Accession number :
- edsair.doi.dedup.....94b0a93c7672f5669ae119a403db3ac5
- Full Text :
- https://doi.org/10.1109/dcc.2018.00040