Back to Search
Start Over
Efficient algorithms for Longest Common Subsequence of two bucket orders to speed up pairwise genetic map comparison
- Source :
- PLoS ONE, PLoS ONE, Public Library of Science, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, Vol 13, Iss 12, p e0208838 (2018), Plos One 12 (13), . (2018)
- Publication Year :
- 2018
- Publisher :
- HAL CCSD, 2018.
-
Abstract
- AGAP: équipe GE2pop; International audience; Genetic maps order genetic markers along chromosomes. They are, for instance, extensively used in marker-assisted selection to accelerate breeding programs. Even for the same species, people often have to deal with several alternative maps obtained using different ordering methods or different datasets, e.g. resulting from different segregating populations. Having efficient tools to identify the consistency and discrepancy of alternative maps is thus essential to facilitate genetic map comparisons. We propose to encode genetic maps by bucket order, a kind of order, which takes into account the blurred parts of the marker order while being an efficient data structure to achieve low complexity algorithms. The main result of this paper is an O(n log(n)) procedure to identify the largest agreements between two bucket orders of n elements, their Longest Common Subsequence (LCS), providing an efficient solution to highlight discrepancies between two genetic maps. The LCS of two maps, being the largest set of their collinear markers, is used as a building block to compute pairwise map congruence, to visually emphasize maker collinearity and in some scaffolding methods relying on genetic maps to improve genome assembly. As the LCS computation is a key subroutine of all these genetic map related tools, replacing the current LCS subroutine of those methods by ours - to do the exact same work but faster-could significantly speed up those methods without changing their accuracy. To ease such transition we provide all required algorithmic details in this self contained paper as well as an R package implementing them, named LCSLCIS, which is freely available at: https://github.com/holtzy/LCSLCIS.
- Subjects :
- Heredity
Computer science
Genetic Linkage
Sequence assembly
02 engineering and technology
01 natural sciences
Linkage Disequilibrium
Longest common subsequence problem
ComputingMilieux_MISCELLANEOUS
Multidisciplinary
Applied Mathematics
Simulation and Modeling
Chromosome Mapping
Software Engineering
010201 computation theory & mathematics
Physical Sciences
Comparators
Medicine
Engineering and Technology
Algorithm
Algorithms
Research Article
Genetic Markers
Computer and Information Sciences
Sequence analysis
Science
0206 medical engineering
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
0102 computer and information sciences
Research and Analysis Methods
Set (abstract data type)
Genetics
Molecular Biology Techniques
Linkage Mapping
Molecular Biology
Preprocessing
Selection (genetic algorithm)
Block (data storage)
Models, Genetic
Gene Mapping
Chromosome
Biology and Life Sciences
Sequence Analysis, DNA
Data structure
Genetic marker
Pairwise comparison
Electronics
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
020602 bioinformatics
Mathematics
Subjects
Details
- Language :
- English
- ISSN :
- 19326203
- Database :
- OpenAIRE
- Journal :
- PLoS ONE, PLoS ONE, Public Library of Science, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, Vol 13, Iss 12, p e0208838 (2018), Plos One 12 (13), . (2018)
- Accession number :
- edsair.doi.dedup.....daf1f6784929ff66c9b87ce1e01073f3