1. Adaptive learning of compressible strings
- Author
-
Rossano Venturini, Gabriele Fici, Nicola Prezza, Fici G., Prezza N., and Venturini R.
- Subjects
FOS: Computer and information sciences ,Centroid decomposition ,General Computer Science ,String compression ,Adaptive learning ,Kolmogorov complexity ,Context (language use) ,Data_CODINGANDINFORMATIONTHEORY ,String reconstruction ,Theoretical Computer Science ,Combinatorics ,String learning ,Lempel-Ziv ,Suffix tree ,Integer ,Computer Science - Data Structures and Algorithms ,Order (group theory) ,Data Structures and Algorithms (cs.DS) ,Time complexity ,Computer Science::Databases ,Mathematics ,Settore INF/01 - Informatica ,Linear space ,String (computer science) ,Substring ,Bounded function - Abstract
Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 -O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper-bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compressor that compresses the string to $\tau$ bits, performs $q=O(\tau)$ substring queries; this algorithm, however, runs in exponential time. For this reason, the second part of the paper focuses on more time-efficient algorithms whose number of queries is bounded by specific compressibility measures. We first show that any string of length $n$ over an integer alphabet of size $\sigma$ with $rle$ runs can be reconstructed with $q=O(rle (\sigma + \log \frac{n}{rle}))$ substring queries in linear time and space. We then present an algorithm that spends $q \in O(\sigma g\log n)$ substring queries and runs in $O(n(\log n + \log \sigma)+ q)$ time using linear space, where $g$ is the size of a smallest straight-line program generating the string., Comment: Accepted for publication in Theoretical Computer Science
- Published
- 2021