1. Detection of significant patterns by compression algorithms: the case of approximate tandem repeats in DNA sequences
- Author
Marie-Odile Delorme, Alain Hénaut, Olivier Delgrange, Jean-Paul Delahaye, Max Dauchet, Emmanuelle Ollivier, and Eric Rivals
- Subjects
Statistics and Probability, Molecular Sequence Data, Saccharomyces cerevisiae, Type (model theory), Biology, ENCODE, Biochemistry, DNA sequencing, Tandem repeat, DNA, Fungal, Molecular Biology, Structural unit, Repetitive Sequences, Nucleic Acid, Sequence (medicine), Base Sequence, Nucleic acid sequence, DNA, Sequence Analysis, DNA, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Evaluation Studies as Topic, Chromosomes, Fungal, Algorithm, Algorithms, Software, Data compression
- Abstract
Motivation: Compression algorithms can be used to analyse genetic sequences. A compression algorithm tests a given property on the sequence and uses it to encode the sequence: if the property holds, it reveals some structure of the sequence that can be described briefly, which yields a description shorter than the sequence of nucleotides given in extenso. The more the algorithm compresses a sequence, the more significant the property is for that sequence. Results: We present a compression algorithm that tests for the presence of a particular type of dosDNA (defined ordered sequence-DNA): approximate tandem repeats of small motifs (i.e. of lengths
- Published
- 1997