Author: "Cazaux, Bastien" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cazaux, Bastien"' showing total 132 results

Start Over Author "Cazaux, Bastien"

132 results on '"Cazaux, Bastien"'

1. Algorithms and Complexity on Indexing Founder Graphs

Author: Equi, Massimo, Norri, Tuukka, Alanko, Jarno, Cazaux, Bastien, Tomescu, Alexandru I., and Mäkinen, Veli
Published: 2023
Full Text: View/download PDF

2. A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

Author: Park, Sangsoo, Park, Sung Gwan, Cazaux, Bastien, Park, Kunsoo, and Rivals, Eric
Subjects: Computer Science - Data Structures and Algorithms
Abstract: The hierarchical overlap graph (HOG) is a graph that encodes overlaps from a given set P of n strings, as the overlap graph does. A best known algorithm constructs HOG in O(||P|| log n) time and O(||P||) space, where ||P|| is the sum of lengths of the strings in P. In this paper we present a new algorithm to construct HOG in O(||P||) time and space. Hence, the construction time and space of HOG are better than those of the overlap graph, which are O(||P|| + n^2)., Comment: 8 pages, 2 figures, submitted to CPM 2021
Published: 2021

3. Algorithms and Complexity on Indexing Founder Graphs

Author: Equi, Massimo, Norri, Tuukka, Alanko, Jarno, Cazaux, Bastien, Tomescu, Alexandru I., and Mäkinen, Veli
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Computational Complexity, E.1, E.4, F.1.3, F.2.2
Abstract: We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower-bounds cover even deterministic graphs with binary alphabet, but there naturally exist also graph classes that are easy to index: E.g. Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform -based indexing scheme. However, it is NP-complete to recognize if a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments (MSAs). Elastic degenerate strings (Bernadini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses, repeat-free and semi-repeat-free graphs, that are easy to index. We give a linear time algorithm to construct a repeat-free non-elastic founder graph from a gapless MSA, and (parameterized) near-linear time algorithms to construct semi-repeat-free (repeat-free, respectively) elastic founder graphs from general MSAs. Finally, we show that repeat-free elastic founder graphs admit a reduction to Wheeler graphs in polynomial time., Comment: This is an extended full version of WABI 2020 paper (https://doi.org/10.4230/LIPIcs.WABI.2020.7), whose preprint is in arXiv:2005.09342, and of ISAAC 2021 paper (to appear)
Published: 2021

4. Greedy-reduction from Shortest Linear Superstring to Shortest Circular Superstring

Author: Cazaux, Bastien and Rivals, Eric
Subjects: Computer Science - Data Structures and Algorithms, 68W32, F.2.2
Abstract: A superstring of a set of strings correspond to a string which contains all the other strings as substrings. The problem of finding the Shortest Linear Superstring is a well-know and well-studied problem in stringology. We present here a variant of this problem, the Shortest Circular Superstring problem where the sought superstring is a circular string. We show a strong link between these two problems and prove that the Shortest Circular Superstring problem is NP-complete. Moreover, we propose a new conjecture on the approximation ratio of the Shortest Circular Superstring problem.
Published: 2020

5. Linear Time Construction of Indexable Founder Block Graphs

Author: Mäkinen, Veli, Cazaux, Bastien, Equi, Massimo, Norri, Tuukka, and Tomescu, Alexandru I.
Subjects: Computer Science - Data Structures and Algorithms, E.1, F.2.2, J.3
Abstract: We introduce a compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA). Such founder sequences have the feature that each row of the MSA is a recombination of the founders. Several linear time dynamic programming algorithms have been previously devised to optimize segmentations that induce founder blocks that then can be concatenated into a set of founder sequences. All possible concatenation orders can be expressed as a founder block graph. We observe a key property of such graphs: if the node labels (founder segments) do not repeat in the paths of the graph, such graphs can be indexed for efficient string matching. We call such graphs segment repeat-free founder block graphs. We give a linear time algorithm to construct a segment repeat-free founder block graph given an MSA. The algorithm combines techniques from the founder segmentation algorithms (Cazaux et al. SPIRE 2019) and fully-functional bidirectional Burrows-Wheeler index (Belazzougui and Cunial, CPM 2019). We derive a succinct index structure to support queries of arbitrary length in the paths of the graph. Experiments on an MSA of SAR-CoV-2 strains are reported. An MSA of size $410\times 29811$ is compacted in one minute into a segment repeat-free founder block graph of 3900 nodes and 4440 edges. The maximum length and total length of node labels is 12 and 34968, respectively. The index on the graph takes only $3\%$ of the size of the MSA.
Published: 2020

6. Strong link between BWT and XBW via Aho-Corasick automaton and applications to Run-Length Encoding

Author: Cazaux, Bastien and Rivals, Eric
Subjects: Computer Science - Data Structures and Algorithms
Abstract: The boom of genomic sequencing makes compression of set of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
Published: 2018
Full Text: View/download PDF

7. Minimum Segmentation for Pan-genomic Founder Reconstruction in Linear Time

Author: Norri, Tuukka, Cazaux, Bastien, Kosolobov, Dmitry, and Mäkinen, Veli
Subjects: Computer Science - Data Structures and Algorithms
Abstract: Given a threshold $L$ and a set $\mathcal{R} = \{R_1, \ldots, R_m\}$ of $m$ haplotype sequences, each having length $n$, the minimum segmentation problem for founder reconstruction is to partition the sequences into disjoint segments $\mathcal{R}[i_1{+}1,i_2], \mathcal{R}[i_2{+}1, i_3], \ldots, \mathcal{R}[i_{r-1}{+}1, i_r]$, where $0 = i_1 < \cdots < i_r = n$ and $\mathcal{R}[i_{j-1}{+}1, i_j]$ is the set $\{R_1[i_{j-1}{+}1, i_j], \ldots, R_m[i_{j-1}{+}1, i_j]\}$, such that the length of each segment, $i_j - i_{j-1}$, is at least $L$ and $K = \max_j\{ |\mathcal{R}[i_{j-1}{+}1, i_j]| \}$ is minimized. The distinct substrings in the segments $\mathcal{R}[i_{j-1}{+}1, i_j]$ represent founder blocks that can be concatenated to form $K$ founder sequences representing the original $\mathcal{R}$ such that crossovers happen only at segment boundaries. We give an optimal $O(mn)$ time algorithm to solve the problem, improving over earlier $O(mn^2)$. This improvement enables to exploit the algorithm on a pan-genomic setting of haplotypes being complete human chromosomes, with a goal of finding a representative set of references that can be indexed for read alignment and variant calling.
Published: 2018
Full Text: View/download PDF

8. Hierarchical Overlap Graph

Author: Cazaux, Bastien and Rivals, Eric
Subjects: Computer Science - Data Structures and Algorithms, 68W32, 68W40, F.2.2
Abstract: Given a set of finite words, the Overlap Graph (OG) is a complete weighted digraph where each word is a node and where the weight of an arc equals the length of the longest overlap of one word onto the other (Overlap is an asymmetric notion). The OG serves to assemble DNA fragments or to compute shortest superstrings which are a compressed representation of the input. The OG requires a space is quadratic in the number of words, which limits its scalability. The Hierarchical Overlap Graph (HOG) is an alternative graph that also encodes all maximal overlaps, but uses a space that is linear in the sum of the lengths of the input words. We propose the first algorithm to build the HOG in linear space for words of equal length., Comment: 11 pages, 3 figures, appendix, 6 graphics, 2 algorithms, 13 references
Published: 2018
Full Text: View/download PDF

9. The Compressed Overlap Index

Author: Canovas, Rodrigo, Cazaux, Bastien, and Rivals, Eric
Subjects: Computer Science - Data Structures and Algorithms
Abstract: For analysing text algorithms, for computing superstrings, or for testing random number generators, one needs to compute all overlaps between any pairs of words in a given set. The positions of overlaps of a word onto itself, or of two words, are needed to compute the absence probability of a word in a random text, or the numbers of common words shared by two random texts. In all these contexts, one needs to compute or to query overlaps between pairs of words in a given set. For this sake, we designed COvI, a compressed overlap index that supports multiple queries on overlaps: like computing the correlation of two words, or listing pairs of words whose longest overlap is maximal among all possible pairs. COvI stores overlaps in a hierarchical and non-redundant manner. We propose an implementation that can handle datasets of millions of words and still answer queries efficiently. Comparison with a baseline solution - called FullAC - relying on the Aho-Corasick automaton shows that COvI provides significant advantages. For similar construction times, COvI requires half the memory FullAC, and still solves complex queries much faster.
Published: 2017

10. Efficient Construction of Hierarchical Overlap Graphs

Author: Park, Sung Gwan, Cazaux, Bastien, Park, Kunsoo, Rivals, Eric, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Boucher, Christina, editor, and Thankachan, Sharma V., editor
Published: 2020
Full Text: View/download PDF

11. Read Mapping on de Bruijn graph

Author: Limasset, Antoine, Cazaux, Bastien, Rivals, Eric, and Peterlongo, Pierre
Subjects: Computer Science - Data Structures and Algorithms, Quantitative Biology - Genomics
Abstract: Background Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph, however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, one lacks practical tools to perform mapping on such graphs. Results Here, we propose a formal definition of mapping on a de Bruijn graph, analyse the problem complexity which turns out to be NP-complete, and provide a practical solution.We propose a pipeline called GGMAP (Greedy Graph MAPping). Its novelty is a procedure to map reads on branching paths of the graph, for which we designed a heuristic algorithm called BGREAT (de Bruijn Graph REAd mapping Tool). For the sake of efficiency, BGREAT rewrites a read sequence as a succession of unitigs sequences. GGMAP can map millions of reads per CPU hour on a de Bruijn graph built from a large set of human genomic reads. Surprisingly, results show that up to 22% more reads can be mapped on the graph but not on the contig set. Conclusions Although mapping reads on a de Bruijn graph is complex task, our proposal offers a practical solution combining efficiency with an improved mapping capacity compared to assembly-based mapping even for complex eukaryotic data. Availability: github.com/Malfoy/BGREAT Keywords: Read mapping; De bruijn graphs; NGS; NP-completeness, Comment: BMC Bioinformatics
Published: 2015
Full Text: View/download PDF

12. Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets.

Author: Martayan, Igor, Cazaux, Bastien, Limasset, Antoine, and Marchet, Camille
Subjects: *ROTATIONAL motion, *KNIVES, *LIBRARIES, *VOCABULARY
Abstract: Summary In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k -mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k -mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend of construction efficiency, cache locality, and compression. Our findings suggest that CBL outperforms existing dynamic k -mer set methods. Unique to this work, CBL stands out as the only known exact k -mer structure offering in-place set operations. Its different combined abilities position it as a flexible Swiss knife structure for k -mer set management. Availability and implementation https://github.com/imartayan/CBL. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Linear Time Maximum Segmentation Problems in Column Stream Model

Author: Cazaux, Bastien, Kosolobov, Dmitry, Mäkinen, Veli, Norri, Tuukka, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Brisaboa, Nieves R., editor, and Puglisi, Simon J., editor
Published: 2019
Full Text: View/download PDF

14. Tinted de Bruijn Graphs for efficient read extraction from sequencing datasets

Author: Vandamme, Lea, primary, Cazaux, Bastien, additional, and Limasset, Antoine, additional
Published: 2024
Full Text: View/download PDF

15. Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets

Author: Martayan, Igor, primary, Cazaux, Bastien, additional, Limasset, Antoine, additional, and Marchet, Camille, additional
Published: 2024
Full Text: View/download PDF

16. Efficient Construction of Hierarchical Overlap Graphs

Author: Park, Sung Gwan, primary, Cazaux, Bastien, additional, Park, Kunsoo, additional, and Rivals, Eric, additional
Published: 2020
Full Text: View/download PDF

17. Relationship between superstring and compression measures: New insights on the greedy conjecture

Author: Cazaux, Bastien and Rivals, Eric
Published: 2018
Full Text: View/download PDF

18. A survey of mapping algorithms in the long-reads era

Author: Sahlin, Kristoffer, primary, Baudeau, Thomas, additional, Cazaux, Bastien, additional, and Marchet, Camille, additional
Published: 2023
Full Text: View/download PDF

19. Superstring Graph: A New Approach for Genome Assembly

Author: Cazaux, Bastien, Sacomoto, Gustavo, Rivals, Eric, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Dondi, Riccardo, editor, Fertin, Guillaume, editor, and Mauri, Giancarlo, editor
Published: 2016
Full Text: View/download PDF

20. Finding all maximal perfect haplotype blocks in linear time

Author: Alanko, Jarno, Bannai, Hideo, Cazaux, Bastien, Peterlongo, Pierre, and Stoye, Jens
Published: 2020
Full Text: View/download PDF

21. Linear Time Maximum Segmentation Problems in Column Stream Model

Author: Cazaux, Bastien, primary, Kosolobov, Dmitry, additional, Mäkinen, Veli, additional, and Norri, Tuukka, additional
Published: 2019
Full Text: View/download PDF

22. Construction of a de Bruijn Graph for Assembly from a Truncated Suffix Tree

Author: Cazaux, Bastien, Lecroq, Thierry, Rivals, Eric, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Dediu, Adrian-Horia, editor, Formenti, Enrico, editor, Martín-Vide, Carlos, editor, and Truthe, Bianca, editor
Published: 2015
Full Text: View/download PDF

23. dipwmsearch: a Python package for searching di-PWM motifs

Author: Mille, Marie, primary, Ripoll, Julie, additional, Cazaux, Bastien, additional, and Rivals, Eric, additional
Published: 2023
Full Text: View/download PDF

24. A survey of mapping algorithms in the long-reads era

Author: Sahlin, Kristoffer, Baudeau, Thomas, Cazaux, Bastien, Marchet, Camille, Sahlin, Kristoffer, Baudeau, Thomas, Cazaux, Bastien, and Marchet, Camille
Abstract: It has been over a decade since the first publication of a method dedicated entirely to mapping long-reads. The distinctive characteristics of long reads resulted in methods moving from the seed-and-extend framework used for short reads to a seed-and-chain framework due to the seed abundance in each read. The main novelties are based on alternative seed constructs or chaining formulations. Dozens of tools now exist, whose heuristics have evolved considerably. We provide an overview of the methods used in long-read mappers. Since they are driven by implementation-specific parameters, we develop an original visualization tool to understand the parameter settings (http:// bcazaux.polytech-lille.net/Minimap2/).
Published: 2023
Full Text: View/download PDF

25. The power of greedy algorithms for approximating Max-ATSP, Cyclic Cover, and superstrings

Author: Cazaux, Bastien and Rivals, Eric
Published: 2016
Full Text: View/download PDF

26. From Indexing Data Structures to de Bruijn Graphs

Author: Cazaux, Bastien, Lecroq, Thierry, Rivals, Eric, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Kulikov, Alexander S., editor, Kuznetsov, Sergei O., editor, and Pevzner, Pavel, editor
Published: 2014
Full Text: View/download PDF

27. Linear time minimum segmentation enables scalable founder reconstruction

Author: Norri, Tuukka, Cazaux, Bastien, Kosolobov, Dmitry, and Mäkinen, Veli
Published: 2019
Full Text: View/download PDF

28. HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets

Author: Kirsch-Gerweck, Benedikt, primary, Bohnenkämper, Leonard, additional, Henrichs, Michel T, additional, Alanko, Jarno N, additional, Bannai, Hideo, additional, Cazaux, Bastien, additional, Peterlongo, Pierre, additional, Burger, Joachim, additional, Stoye, Jens, additional, and Diekmann, Yoan, additional
Published: 2023
Full Text: View/download PDF

29. dipwmsearch: a python package for searching di-PWM motifs

Author: Mille, Marie, primary, Ripoll, Julie, additional, Cazaux, Bastien, additional, and Rivals, Eric, additional
Published: 2022
Full Text: View/download PDF

30. Algorithms and Complexity on Indexing Founder Graphs

Author: Equi, Massimo, primary, Norri, Tuukka, additional, Alanko, Jarno, additional, Cazaux, Bastien, additional, Tomescu, Alexandru I., additional, and Mäkinen, Veli, additional
Published: 2022
Full Text: View/download PDF

31. Toward Optimal Fingerprint Indexing for Large Scale Genomics

Author: Clément Agret and Bastien Cazaux and Antoine Limasset, Agret, Clément, Cazaux, Bastien, Limasset, Antoine, Clément Agret and Bastien Cazaux and Antoine Limasset, Agret, Clément, Cazaux, Bastien, and Limasset, Antoine
Abstract: Motivation. To keep up with the scale of genomic databases, several methods rely on local sensitive hashing methods to efficiently find potential matches within large genome collections. Existing solutions rely on Minhash or Hyperloglog fingerprints and require reading the whole index to perform a query. Such solutions can not be considered scalable with the growing amount of documents to index. Results. We present NIQKI, a novel structure with well-designed fingerprints that lead to theoretical and practical query time improvements, outperforming state-of-the-art by orders of magnitude. Our contribution is threefold. First, we generalize the concept of Hyperminhash fingerprints in (h,m)-HMH fingerprints that can be tuned to present the lowest false positive rate given the expected sub-sampling applied. Second, we provide a structure able to index any kind of fingerprints based on inverted indexes that provide optimal queries, namely linear with the size of the output. Third, we implemented these approaches in a tool dubbed NIQKI that can index and calculate pairwise distances for over one million bacterial genomes from GenBank in a few days on a small cluster. We show that our approach can be orders of magnitude faster than state-of-the-art with comparable precision. We believe this approach can lead to tremendous improvements, allowing fast queries and scaling on extensive genomic databases.
Published: 2022
Full Text: View/download PDF

32. A survey of mapping algorithms in the long-reads era

Author: Sahlin, Kristoffer, primary, Baudeau, Thomas, additional, Cazaux, Bastien, additional, and Marchet, Camille, additional
Published: 2022
Full Text: View/download PDF

33. Superstring Graph: A New Approach for Genome Assembly

Author: Cazaux, Bastien, primary, Sacomoto, Gustavo, additional, and Rivals, Eric, additional
Published: 2016
Full Text: View/download PDF

34. Algorithms and Complexity on Indexing Elastic Founder Graphs

Author: Equi, Massimo, Norri, Tuukka, Alanko, Jarno, Cazaux, Bastien, Tomescu, Alexandru I., M��kinen, Veli, Ahn, Hee-Kap, Sadakane, Kunihiko, Department of Computer Science, Algorithmic Bioinformatics, Genome-scale Algorithmics research group / Veli Mäkinen, Bioinformatics, and Helsinki Institute for Information Technology
Subjects: Theory of computation �� Sorting and searching, Theory of computation �� Graph algorithms analysis, segmentation, Theory of computation → Problems, reductions and completeness, 113 Computer and information sciences, Theory of computation �� Pattern matching, Applied computing �� Genomics, Applied computing → Genomics, Theory of computation → Graph algorithms analysis, Theory of computation → Sorting and searching, Theory of computation → Dynamic programming, Theory of computation �� Problems, reductions and completeness, Theory of computation �� Dynamic programming, multiple sequence alignment, orthogonal vectors hypothesis, Theory of computation → Pattern matching
Abstract: We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower-bounds cover even deterministic graphs with binary alphabet, but there naturally exist also graph classes that are easy to index: E.g. Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform -based indexing scheme. However, it is NP-complete to recognize if a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments. Elastic degenerate strings (Bernadini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses that are easy to index. Moreover, we give a near-linear time algorithm to construct indexable elastic founder graphs. This algorithm is based on an earlier segmentation algorithm for gapless multiple sequence alignments inducing non-elastic founder graphs (M��kinen et al., WABI 2020), but uses more involved techniques to cope with repetitive string collections synchronized with gaps. Finally, we show that one of the subclasses admits a reduction to Wheeler graphs in polynomial time., LIPIcs, Vol. 212, 32nd International Symposium on Algorithms and Computation (ISAAC 2021), pages 20:1-20:18
Published: 2021

35. Construction of a de Bruijn Graph for Assembly from a Truncated Suffix Tree

Author: Cazaux, Bastien, primary, Lecroq, Thierry, additional, and Rivals, Eric, additional
Published: 2015
Full Text: View/download PDF

36. Search algorithms for dinucleotide Position Weight Matrices

Author: Mille, Marie, Ripoll, Julie, Cazaux, Bastien, Rivals, Eric, Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Mémoires - Université de Montpellier - Faculté des sciences (UM FS), Université de Montpellier (UM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), project FluoRib (INCa grant n°2018-131), Institut Pasteur, Paris, Société Française de Bioinformatique (S.F.B.I.), and ANR-10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010)
Subjects: index, PSSM, online search, pattern matching, binding site, motif, genomics, RNA, PWM, DNA, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], offline search, transcription factor
Abstract: National audience; Transcription regulation is an important cellular process. Specialized proteins, called Transcription Factors (TF), bind on short, specific, DNA sequences to regulate the expression of nearby genes. The sequences recognized by a TF in the vicinity of different genes are not identical, but similar. One captures the similarity of those binding site in different representations, which are generally called /motifs/. The most widely used sort of motifs are Position Weight Matrix (PWM) (also known as a position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM)). A PWM is built from a multiple alignment of "true" binding sequences and capture the observed variation of nucleotides at the different positions. Several databases (JASPAR, TRANSFAC, etc.) collect PWMs for known TFs. Those PWMs are used to scan new DNA sequences to find putative binding sites and possibly to annotate them. In the case of complete genomes, the scanning procedure for many PWM may last a long time \cite{Pizzi2011}.PWM assume that the distinct positions of the bound sequence are independent of each other. However, several works have observed that a mutation at given position influence the probability of mutation at neighboring positions. To overcome this limitation of PWMs, Kulakovskiy et al. have proposed a more complex sort of motif, called di-PWMs, which model the frequency of occurrence of dinucleotides in the binding sites (instead of mononucleotides for PWMs) \cite{KULAKOVSKIY2013}. Their studies show that di-PWMs improve in sensitivity compared to PWMs, and thus produce less false positives when scanning a sequence.Our aim is to design new search algorithms for di-PWMs, either online or offline, and to implement them. Our online scanning algorithm computes a partial score for some positions in the current window, and estimates the maximum achievable score for the whole window. If this score does not match requested threshold, the window can be discarded. Our offline approach works in two steps. As for read mapping, the genome is first preprocessed to produce an index data structure. Then, for any given motif and a threshold score, we enumerate potential matching words (that word whose score lies above the threshold) and search their occurrences in the index in optimal time. The difficulty is to design an efficient enumeration algorithm. Such an approach was developed for PWMs in the MOTIF software \cite{MOTIF}.If time suffices, we plan to compare experimentally our implementations to that of an existing search tool for di-PWMs, called SPRY-SARUS \cite{Pizzi2011}.
Published: 2021

37. Toward optimal fingerprint indexing for large scale genomics

Author: Agret, Clement, primary, Cazaux, Bastien, additional, and Limasset, Antoine, additional
Published: 2021
Full Text: View/download PDF

38. From Indexing Data Structures to de Bruijn Graphs

Author: Cazaux, Bastien, primary, Lecroq, Thierry, additional, and Rivals, Eric, additional
Published: 2014
Full Text: View/download PDF

39. Founder reconstruction enables scalable and seamless pangenomic analysis

Author: Norri, Tuukka, primary, Cazaux, Bastien, additional, Dönges, Saska, additional, Valenzuela, Daniel, additional, and Mäkinen, Veli, additional
Published: 2021
Full Text: View/download PDF

40. Algorithms and Complexity on Indexing Elastic Founder Graphs

Author: Massimo Equi and Tuukka Norri and Jarno Alanko and Bastien Cazaux and Alexandru I. Tomescu and Veli Mäkinen, Equi, Massimo, Norri, Tuukka, Alanko, Jarno, Cazaux, Bastien, Tomescu, Alexandru I., Mäkinen, Veli, Massimo Equi and Tuukka Norri and Jarno Alanko and Bastien Cazaux and Alexandru I. Tomescu and Veli Mäkinen, Equi, Massimo, Norri, Tuukka, Alanko, Jarno, Cazaux, Bastien, Tomescu, Alexandru I., and Mäkinen, Veli
Abstract: We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower-bounds cover even deterministic graphs with binary alphabet, but there naturally exist also graph classes that are easy to index: E.g. Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform -based indexing scheme. However, it is NP-complete to recognize if a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments. Elastic degenerate strings (Bernadini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses that are easy to index. Moreover, we give a near-linear time algorithm to construct indexable elastic founder graphs. This algorithm is based on an earlier segmentation algorithm for gapless multiple sequence alignments inducing non-elastic founder graphs (Mäkinen et al., WABI 2020), but uses more involved techniques to cope with repetitive string collections synchronized with gaps. Finally, we show that one of the subclasses admits a reduction to Wheeler graphs in polynomial time.
Published: 2021
Full Text: View/download PDF

41. A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

Author: Sangsoo Park and Sung Gwan Park and Bastien Cazaux and Kunsoo Park and Eric Rivals, Park, Sangsoo, Park, Sung Gwan, Cazaux, Bastien, Park, Kunsoo, Rivals, Eric, Sangsoo Park and Sung Gwan Park and Bastien Cazaux and Kunsoo Park and Eric Rivals, Park, Sangsoo, Park, Sung Gwan, Cazaux, Bastien, Park, Kunsoo, and Rivals, Eric
Abstract: The hierarchical overlap graph (HOG) is a graph that encodes overlaps from a given set P of n strings, as the overlap graph does. A best known algorithm constructs HOG in O(||P|| log n) time and O(||P||) space, where ||P|| is the sum of lengths of the strings in P. In this paper we present a new algorithm to construct HOG in O(||P||) time and space. Hence, the construction time and space of HOG are better than those of the overlap graph, which are O(||P|| + n²).
Published: 2021
Full Text: View/download PDF

42. Hierarchical Overlap Graph

Author: Cazaux, Bastien and Rivals, Eric
Published: 2020
Full Text: View/download PDF

43. Linear Time Construction of Indexable Founder Block Graphs

Author: Cazaux, Bastien, Equi, Massimo, Norri, Tuukka, Tomescu, Alexandru I., Cazaux, Bastien, Equi, Massimo, Norri, Tuukka, and Tomescu, Alexandru I.
Published: 2020
Full Text: View/download PDF

44. Linear Time Construction of Indexable Founder Block Graphs

Author: Veli Mäkinen and Bastien Cazaux and Massimo Equi and Tuukka Norri and Alexandru I. Tomescu, Mäkinen, Veli, Cazaux, Bastien, Equi, Massimo, Norri, Tuukka, Tomescu, Alexandru I., Veli Mäkinen and Bastien Cazaux and Massimo Equi and Tuukka Norri and Alexandru I. Tomescu, Mäkinen, Veli, Cazaux, Bastien, Equi, Massimo, Norri, Tuukka, and Tomescu, Alexandru I.
Abstract: We introduce a compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA). Such founder sequences have the feature that each row of the MSA is a recombination of the founders. Several linear time dynamic programming algorithms have been previously devised to optimize segmentations that induce founder blocks that then can be concatenated into a set of founder sequences. All possible concatenation orders can be expressed as a founder block graph. We observe a key property of such graphs: if the node labels (founder segments) do not repeat in the paths of the graph, such graphs can be indexed for efficient string matching. We call such graphs segment repeat-free founder block graphs. We give a linear time algorithm to construct a segment repeat-free founder block graph given an MSA. The algorithm combines techniques from the founder segmentation algorithms (Cazaux et al. SPIRE 2019) and fully-functional bidirectional Burrows-Wheeler index (Belazzougui and Cunial, CPM 2019). We derive a succinct index structure to support queries of arbitrary length in the paths of the graph. Experiments on an MSA of SARS-CoV-2 strains are reported. An MSA of size 410× 29811 is compacted in one minute into a segment repeat-free founder block graph of 3900 nodes and 4440 edges. The maximum length and total length of node labels is 12 and 34968, respectively. The index on the graph takes only 3% of the size of the MSA.
Published: 2020
Full Text: View/download PDF

45. Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding

Author: Cazaux, Bastien, Rivals, Eric, Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Institut de Biologie Computationnelle (IBC), Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Support from the Institut de Biologie Computationnelle (ANR-11-BINF-0002).Eric Rivals: thanks the GEM Flagship project funded from Labex NUMEV (ANR-10-LABX-0020)., University of Pisa, Nadia Pisanti, Solon P. Pissis, ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011), ANR-10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), ANR-11-BINF-0002,IBC,Institut de Biologie Computationnelle de Montpellier(2011), and ANR-10-LABX-0020/10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010)
Subjects: Run Length Encoding (RLE), Aho-Corasick automaton, Aho-Corasick Tree, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Burrows Wheeler Transform, Compression, XBWT, [INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM], Data Structures, Algorithm, 2012 ACM Subject Classification Mathematics of computing → Discrete mathematics, Theory of computation → Randomness, geometry and discrete structures, Theory of computation → Data structures and algorithms for data management, Indexing, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], Stringology
Abstract: URN: urn:nbn:de:0030-drops-104955URL: http://drops.dagstuhl.de/opus/volltexte/2019/10495/ISBN ={978-3-95977-103-0},ISSN ={1868-8969},; International audience; The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
Published: 2019

46. Lien entre Transformée de Burrows-Wheeler (BWT) et BWT étendue (XBW) via l'automate d'Aho-Corasick : applications en compression par encodage des longueurs

Author: Cazaux, Bastien, Rivals, Eric, Pisanti, Nadia, Pissis, Solon P., Department of Computer Science, Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Institut de Biologie Computationnelle (IBC), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Support from the Institut de Biologie Computationnelle (ANR-11-BINF-0002).Eric Rivals: thanks the GEM Flagship project funded from Labex NUMEV (ANR-10-LABX-0020)., University of Pisa, Nadia Pisanti, Solon P. Pissis, ANR-11-BINF-0002,IBC,Institut de Biologie Computationnelle de Montpellier(2011), ANR-10-LABX-0020/10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010), ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011), ANR-10-LABX-0020,NUMEV,Digital and Hardware Solutions and Modeling for the Environement and Life Sciences(2010), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), and Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Run Length Encoding (RLE), 000 Computer science, knowledge, general works, Aho-Corasick automaton, Aho-Corasick Tree, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], education, Burrows Wheeler Transform, Compression, Data_CODINGANDINFORMATIONTHEORY, 0102 computer and information sciences, 02 engineering and technology, XBWT, [INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM], Data Structures, 113 Computer and information sciences, 01 natural sciences, Algorithm, 2012 ACM Subject Classification Mathematics of computing → Discrete mathematics, Theory of computation → Randomness, geometry and discrete structures, Theory of computation → Data structures and algorithms for data management, 010201 computation theory & mathematics, Computer Science, 0202 electrical engineering, electronic engineering, information engineering, Indexing, 020201 artificial intelligence & image processing, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], Stringology
Abstract: URN: urn:nbn:de:0030-drops-104955URL: http://drops.dagstuhl.de/opus/volltexte/2019/10495/ISBN ={978-3-95977-103-0},ISSN ={1868-8969},; International audience; The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
Published: 2019

47. Finding All Maximal Perfect Haplotype Blocks in Linear Time

Author: Alanko, Jarno, Bannai, Hideo, Cazaux, Bastien, Peterlongo, Pierre, Stoye, Jens, Alanko, Jarno, Bannai, Hideo, Cazaux, Bastien, Peterlongo, Pierre, and Stoye, Jens
Abstract: Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.
Published: 2019
Full Text: View/download PDF

48. Finding All Maximal Perfect Haplotype Blocks in Linear Time

Author: Jarno Alanko and Hideo Bannai and Bastien Cazaux and Pierre Peterlongo and Jens Stoye, Alanko, Jarno, Bannai, Hideo, Cazaux, Bastien, Peterlongo, Pierre, Stoye, Jens, Jarno Alanko and Hideo Bannai and Bastien Cazaux and Pierre Peterlongo and Jens Stoye, Alanko, Jarno, Bannai, Hideo, Cazaux, Bastien, Peterlongo, Pierre, and Stoye, Jens
Abstract: Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Proceedings of BSB 2019) suggested the maximal perfect haplotype block as a very simple combinatorial pattern, forming the basis of a new method to perform rapid genome-wide selection scans. The algorithm they presented for identifying these blocks, however, had a worst-case running time quadratic in the genome length. It was posed as an open problem whether an optimal, linear-time algorithm exists. In this paper we give two algorithms that achieve this time bound, one conceptually very simple one using suffix trees and a second one using the positional Burrows-Wheeler Transform, that is very efficient also in practice.
Published: 2019
Full Text: View/download PDF

49. Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding

Author: Bastien Cazaux and Eric Rivals, Cazaux, Bastien, Rivals, Eric, Bastien Cazaux and Eric Rivals, Cazaux, Bastien, and Rivals, Eric
Abstract: The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
Published: 2019
Full Text: View/download PDF

50. Superstrings with multiplicities

Author: Rivals, Eric, Cazaux, Bastien, Institut de Biologie Computationnelle (IBC), Université de Montpellier (UM)-Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Défi MASTODONS C3G http://www.lirmm.fr/~rivals/C3G/ from CNRS, Qingdao University, Gonzalo Navarro, David Sankoff, Binhai Zhu, ANR-11-BINF-0002,IBC,Institut de biologie Computationnelle(2011), Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), ANR-11-BINF-0002,IBC,Institut de Biologie Computationnelle de Montpellier(2011), Navarro, Gonzalo, Sankoff, David, Zhu, Binhai, and Department of Computer Science
Subjects: [INFO.INFO-CC]Computer Science [cs]/Computational Complexity [cs.CC], APX, subset system, ACM: F.: Theory of Computation/F.2: ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], Cyclic covers, 02 engineering and technology, ACM: G.: Mathematics of Computing/G.2: DISCRETE MATHEMATICS, 0202 electrical engineering, electronic engineering, information engineering, overlap, algorithm analysis, Approximation, 060201 languages & linguistics, 000 Computer science, knowledge, general works, Pattern matching, Approximation, 06 humanities and the arts, Cyclic cover, Subset system, String theory, 113 Computer and information sciences, ACM: F.: Theory of Computation, Approximation algorithms, Greedy algorithms, greedy algorithm, 0602 languages and literature, Computer Science, 020201 artificial intelligence & image processing, ACM: G.: Mathematics of Computing/G.1: NUMERICAL ANALYSIS/G.1.2: Approximation, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Abstract: A superstring of a set of words P = s1, · · · , sp is a string that contains each word of P as substring. Given P, the well known Shortest Linear Superstring problem (SLS), asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word si comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of si. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of shortest superstring. First, we investigate the approximation of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear time approximation algorithm, Concat-Greedy, and show it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY.
Published: 2018

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

132 results on '"Cazaux, Bastien"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources