Back to Search Start Over

Efficient algorithms for Longest Common Subsequence of two bucket orders to speed up pairwise genetic map comparison

Authors :
Lisa De Mattéo
Sèverine Bérard
Vincent Ranwez
Yan Holtz
Institut des Sciences de l'Evolution de Montpellier (UMR ISEM)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS)
Queensland Brain Institute
University of Queensland [Brisbane]
Amélioration génétique et adaptation des plantes méditerranéennes et tropicales (UMR AGAP)
Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro)-Institut National de la Recherche Agronomique (INRA)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École Pratique des Hautes Études (EPHE)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut National de la Recherche Agronomique (INRA)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro)
Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)
Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Source :
PLoS ONE, PLoS ONE, Public Library of Science, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, Vol 13, Iss 12, p e0208838 (2018), Plos One 12 (13), . (2018)
Publication Year :
2018
Publisher :
HAL CCSD, 2018.

Abstract

AGAP: équipe GE2pop; International audience; Genetic maps order genetic markers along chromosomes. They are, for instance, extensively used in marker-assisted selection to accelerate breeding programs. Even for the same species, people often have to deal with several alternative maps obtained using different ordering methods or different datasets, e.g. resulting from different segregating populations. Having efficient tools to identify the consistency and discrepancy of alternative maps is thus essential to facilitate genetic map comparisons. We propose to encode genetic maps by bucket order, a kind of order, which takes into account the blurred parts of the marker order while being an efficient data structure to achieve low complexity algorithms. The main result of this paper is an O(n log(n)) procedure to identify the largest agreements between two bucket orders of n elements, their Longest Common Subsequence (LCS), providing an efficient solution to highlight discrepancies between two genetic maps. The LCS of two maps, being the largest set of their collinear markers, is used as a building block to compute pairwise map congruence, to visually emphasize maker collinearity and in some scaffolding methods relying on genetic maps to improve genome assembly. As the LCS computation is a key subroutine of all these genetic map related tools, replacing the current LCS subroutine of those methods by ours - to do the exact same work but faster-could significantly speed up those methods without changing their accuracy. To ease such transition we provide all required algorithmic details in this self contained paper as well as an R package implementing them, named LCSLCIS, which is freely available at: https://github.com/holtzy/LCSLCIS.

Details

Language :
English
ISSN :
19326203
Database :
OpenAIRE
Journal :
PLoS ONE, PLoS ONE, Public Library of Science, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, 2018, 13 (12), pp.e0208838. ⟨10.1371/journal.pone.0208838⟩, PLoS ONE, Vol 13, Iss 12, p e0208838 (2018), Plos One 12 (13), . (2018)
Accession number :
edsair.doi.dedup.....daf1f6784929ff66c9b87ce1e01073f3