11 results on '"Pissis S.P."'
Search Results
2. Frequency covers for strings
- Author
-
Mhaskar, N., Smyth, W.F., Charalampopoulos, P., Crochemore, M., Pissis, S.P., Mhaskar, N., Smyth, W.F., Charalampopoulos, P., Crochemore, M., and Pissis, S.P.
- Abstract
We study a central problem of string processing: the compact representation of a string by its frequently-occurring substrings. In this paper we propose an effective, easily-computed form of quasi-periodicity in strings, the frequency cover; that is, the longest of those repeating substrings u of w, |u| > 1, that occurs the maximum number of times in w. The advantage of this generalization is that it is not only applicable to all strings but also that it is the only generalized notion of cover yet proposed, which can be computed efficiently in linear time and space. We describe a simple data structure called the repeating substring frequency array (ℛ
- Published
- 2018
3. Enhanced string covering
- Author
-
Flouri, T., Iliopoulos, C.S., Kociumaka, T., Pissis, S.P., Puglisi, S.J., Smyth, W.F., Tyczyński, W., Flouri, T., Iliopoulos, C.S., Kociumaka, T., Pissis, S.P., Puglisi, S.J., Smyth, W.F., and Tyczyński, W.
- Abstract
A factor u of a string y is a cover of y if every letter of y lies within some occurrence of u in y; thus every cover u is also a border—both prefix and suffix—of y. If u is a cover of a superstring of y then u is a seed of y. Covers and seeds are two formalisations of quasiperiodicity, and there exist linear-time algorithms for computing all the covers and seeds of y. A string y covered by u thus generalises the idea of a repetition; that is, a string composed of exact concatenations of u. Even though a string is coverable somewhat more frequently than it is a repetition, still a string that can be covered by a single u is rare. As a result, seeking to find a more generally applicable and descriptive notion of cover, many articles were written on the computation of a minimum k-cover of y; that is, the minimum cardinality set of strings of length k that collectively cover y. Unfortunately, this computation turns out to be NP-hard. Therefore, in this article, we propose new, simple, easily-computed, and widely applicable notions of string covering that provide an intuitive and useful characterisation of a string: the enhanced cover; the enhanced left cover; and the enhanced left seed.
- Published
- 2013
4. Tree indexing by pushdown automata and subtree repeats.
- Author
-
Flouri, T., Iliopoulos, C.S., Janousek, J., Melichar, B., and Pissis, S.P.
- Published
- 2011
5. Sub-nyquist audio fingerprinting for music recognition.
- Author
-
Chang, K.K., Pissis, S.P., Jang, J.-S.R., and Iliopoulos, C.S.
- Published
- 2010
- Full Text
- View/download PDF
6. Mapping short reads to a genomic sequence with circular structure.
- Author
-
Iliopoulos, C.S., Okanlawon, T.A., Pissis, S.P., and Tischler, G.
- Published
- 2010
- Full Text
- View/download PDF
7. Mapping uniquely occurring short sequences derived from high throughput technologies to a reference genome.
- Author
-
Antoniou, P., Daykin, J.W., Iliopoulos, C.S., Kourie, D., Mouchard, L., and Pissis, S.P.
- Published
- 2009
- Full Text
- View/download PDF
8. Practical and Efficient Algorithms for Degenerate and Weighted Sequences Derived from High Throughput Sequencing Technologies.
- Author
-
Antoniou, P., Iliopoulos, C.S., Mouchard, L., and Pissis, S.P.
- Published
- 2009
- Full Text
- View/download PDF
9. Reverse-Safe Text Indexing
- Author
-
Solon P. Pissis, Huiping Chen, Grigorios Loukides, Gabriele Fici, Giulia Bernardini, Bernardini, G., Chen, H., Fici, G., Loukides, G., Pissis, S. P., Bernardini G., Chen H., Fici G., Loukides G., and Pissis S.P.
- Subjects
Data structures ,Computer science ,Suffix tree ,suffix tree ,0102 computer and information sciences ,02 engineering and technology ,text indexing ,01 natural sciences ,Theoretical Computer Science ,law.invention ,Set (abstract data type) ,law ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Pattern matching ,data privacy ,Settore INF/01 - Informatica ,Search engine indexing ,pattern matching ,Data structure ,Matrix multiplication ,010201 computation theory & mathematics ,Algorithm ,Adversary model ,Integer (computer science) - Abstract
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z - reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D . The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z , we propose an algorithm that constructs a z -reverse-safe data structure ( z -RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z -RSDS. The construction algorithm takes O(nɷ log d) time, where ɷ is the matrix multiplication exponent. We show that, despite the nɷ factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z -RSDS for decision pattern matching queries, whose size can be sublinear in n . A preliminary version of this article appeared in ALENEX 2020.
- Published
- 2021
- Full Text
- View/download PDF
10. Constructing Antidictionaries of Long Texts in Output-Sensitive Space
- Author
-
Solon P. Pissis, Golnaz Badkobeh, Alice Héliou, Gabriele Fici, Lorraine A.K. Ayad, Department of Informatics [King's College London], King‘s College London, Goldsmiths, University of London (Goldsmiths College), University of London [London], Dipartimento di Matematica e Informatica [Palermo], Università degli studi di Palermo - University of Palermo, Centrum Wiskunde & Informatica (CWI), Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Bioinformatics, AIMMS, Bio Informatics (IBIVU), Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands, Ayad L.A.K., Badkobeh G., Fici G., Heliou A., and Pissis S.P.
- Subjects
0301 basic medicine ,Antidictionary ,Settore INF/01 - Informatica ,Output sensitive algorithm ,0102 computer and information sciences ,Space (mathematics) ,01 natural sciences ,Theoretical Computer Science ,String algorithm ,Prefix ,Set (abstract data type) ,Combinatorics ,03 medical and health sciences ,030104 developmental biology ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,Data compression ,Output-sensitive algorithm ,[INFO]Computer Science [cs] ,Suffix ,Alphabet ,Absent word ,Word (group theory) ,Mathematics - Abstract
A wordxthat is absent from a wordyis calledminimalif all its proper factors occur iny. Given a collection ofkwordsy1, … ,ykover an alphabetΣ, we are asked to compute the set$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓof minimal absent words of length at mostℓof the collection {y1, … ,yk}. The set$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓcontains all the wordsxsuch thatxis absent from all the words of the collection while there existi,j, such that the maximal proper suffix ofxis a factor ofyiand the maximal proper prefix ofxis a factor ofyj. In data compression, this corresponds to computing the antidictionary ofkdocuments. In bioinformatics, it corresponds to computing words that are absent from a genome ofkchromosomes. Indeed, the set$\mathrm {M}^{\ell }_{y}$Myℓof minimal absent words of a wordyis equal to$\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$M{y1,…,yk}ℓfor any decomposition ofyinto a collection of wordsy1, … ,yksuch that there is an overlap of length at leastℓ− 1 between any two consecutive words in the collection. This computation generally requiresΩ(n) space forn= |y| using any of the plenty available$\mathcal {O}(n)$O(n)-time algorithms. This is because anΩ(n)-sized text index is constructed overywhich can be impractical for largen. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when$\| \mathrm {M}^{\ell }_{\{y_1,\ldots ,y_N\}}\| =o(n)$∥M{y1,…,yN}ℓ∥=o(n), for allN∈ [1,k], where ∥S∥ denotes the sum of the lengths of words in setS. For instance, in the human genome,n≈ 3 × 109but$\| \mathrm {M}^{12}_{\{y_1,\ldots ,y_k\}}\| \approx 10^{6}$∥M{y1,…,yk}12∥≈106. We consider a constant-sized alphabet for stating our results. We show thatall$\mathrm {M}^{\ell }_{y_{1}},\ldots ,\mathrm {M}^{\ell }_{\{y_1,\ldots ,y_k\}}$My1ℓ,…,M{y1,…,yk}ℓcan be computed in$\mathcal {O}(kn+{\sum }^{k}_{N=1}\| \mathrm {M}^{\ell }_{\{y_1,\ldots ,y_N\}}\| )$O(kn+∑N=1k∥M{y1,…,yN}ℓ∥)total time using$\mathcal {O}(\textsc {MaxIn}+\textsc {MaxOut})$O(MaxIn+MaxOut)space, where MaxIn is the length of the longest word in {y1, … ,yk} and$\textsc {MaxOut}=\max \limits \{\| \mathrm {M}^{\ell }_{\{y_1,\ldots ,y_N\}}\| :N\in [1,k]\}$MaxOut=max{∥M{y1,…,yN}ℓ∥:N∈[1,k]}. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution.
- Published
- 2021
- Full Text
- View/download PDF
11. Constructing Antidictionaries in Output-Sensitive Space
- Author
-
Golnaz Badkobeh, Alice Héliou, Gabriele Fici, Solon P. Pissis, Lorraine A.K. Ayad, Department of Informatics [King's College London], King‘s College London, Goldsmiths, University of London (Goldsmiths College), University of London [London], Dipartimento di Matematica e Informatica [Palermo], Università degli studi di Palermo - University of Palermo, Centrum Wiskunde & Informatica (CWI), Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Computing, Goldsmiths, University of London, Dipartimento di Matematica e Informatica, Università degli Studi di Palermo, Palermo, Italy, Storer, James A., Bilgin, Ali, Serra-Sagrista, Joan, Marcellin, Michael W., Ayad L.A.K., Badkobeh G., Fici G., Heliou A., Pissis S.P., and Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
- Subjects
FOS: Computer and information sciences ,Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni ,Output sensitive algorithms ,String algorithms ,Physics ,Antidictionarie ,Settore INF/01 - Informatica ,Output sensitive algorithm ,0102 computer and information sciences ,Absent words ,Space (mathematics) ,01 natural sciences ,Antidictionaries ,Combinatorics ,010201 computation theory & mathematics ,TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY ,Data compression ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Computer Science::Symbolic Computation ,[INFO]Computer Science [cs] ,Absent word ,Alphabet ,Word (group theory) - Abstract
A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1,y_2,\ldots,y_k$ over an alphabet $\Sigma$, we are asked to compute the set $\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ of minimal absent words of length at most $\ell$ of word $y=y_1\#y_2\#\ldots\#y_k$, $\#\notin\Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n=|y|$ using any of the plenty available $\mathcal{O}(n)$-time algorithms. This is because an $\Omega(n)$-sized text index is constructed over $y$ which can be impractical for large $n$. We do the identical computation incrementally using output-sensitive space. This goal is reasonable when $||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||=o(n)$, for all $N\in[1,k]$. For instance, in the human genome, $n \approx 3\times 10^9$ but $||\mathrm{M}^{12}_{y_{1}\#\ldots\#y_{k}}|| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all $\mathrm{M}^{\ell}_{y_{1}},\ldots,\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{k}}$ can be computed in $\mathcal{O}(kn+\sum^{k}_{N=1}||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||)$ total time using $\mathcal{O}(\mathrm{MaxIn}+\mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1,\ldots,y_{k}\}$ and $\mathrm{MaxOut}=\max\{||\mathrm{M}^{\ell}_{y_{1}\#\ldots\#y_{N}}||:N\in[1,k]\}$. Proof-of-concept experimental results are also provided confirming our theoretical findings and justifying our contribution., Comment: Version accepted to DCC 2019
- Published
- 2019
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.