Back to Search Start Over

Lien entre Transformée de Burrows-Wheeler (BWT) et BWT étendue (XBW) via l'automate d'Aho-Corasick : applications en compression par encodage des longueurs

Authors :
Cazaux, Bastien
Rivals, Eric
Pisanti, Nadia
Pissis, Solon P.
Department of Computer Science
Méthodes et Algorithmes pour la Bioinformatique (MAB)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Institut de Biologie Computationnelle (IBC)
Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Support from the Institut de Biologie Computationnelle (ANR-11-BINF-0002).Eric Rivals: thanks the GEM Flagship project funded from Labex NUMEV (ANR-10-LABX-0020).
University of Pisa
Nadia Pisanti
Solon P. Pissis
ANR-11-BINF-0002,IBC,Institut de Biologie Computationnelle de Montpellier(2011)
ANR-10-LABX-0020/10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010)
ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011)
ANR-10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)
Source :
Leibniz International Proceedings in Informatics (LIPIcs), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩
Publication Year :
2019
Publisher :
Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

Abstract

URN: urn:nbn:de:0030-drops-104955URL: http://drops.dagstuhl.de/opus/volltexte/2019/10495/ISBN ={978-3-95977-103-0},ISSN ={1868-8969},; International audience; The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.

Details

Language :
English
ISSN :
18688969
Database :
OpenAIRE
Journal :
Leibniz International Proceedings in Informatics (LIPIcs), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩
Accession number :
edsair.doi.dedup.....c335a150fb18d8dbd27fd1d7f9198c57