Finishing bacterial genome assemblies with Mix

Authors :
Virginie Dupuy
Alexis Groppi
Florence Tardy
Christine Citti
Pascal Sirand-Pugnet
Hayssam Soueidan
Florence Maurier
Macha Nikolski
NKI-AVL
The National Cancer Institute
Centre de Bioinformatique de Bordeaux (CBIB)
CGFB
Génomique, développement et pouvoir pathogène (GD2P)
Université Bordeaux Segalen - Bordeaux 2-Institut National de la Recherche Agronomique (INRA)
Laboratoire de Lyon [ANSES]
Agence nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES)
Interactions hôtes-agents pathogènes [Toulouse] (IHAP)
Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT)
Institut National Polytechnique (Toulouse) (Toulouse INP)
Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées
Ecole Nationale Vétérinaire de Toulouse (ENVT)
INRA - Mathématiques et Informatique Appliquées (Unité MIAJ)
Institut National de la Recherche Agronomique (INRA)
Laboratoire Bordelais de Recherche en Informatique (LaBRI)
Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)
Institut National de la Recherche Agronomique (INRA)-Université Bordeaux Segalen - Bordeaux 2
Laboratoire de Lyon
Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Université Sciences et Technologies - Bordeaux 1-Université Bordeaux Segalen - Bordeaux 2
Netherlands Cancer Institute
Centre de Bioinformatique de Bordeaux
Université Sciences et Technologies - Bordeaux 1
Biologie du fruit et pathologie (BFP)
Université Bordeaux Segalen - Bordeaux 2-Institut National de la Recherche Agronomique (INRA)-Université Sciences et Technologies - Bordeaux 1
Université Toulouse III - Paul Sabatier (UT3)
Université Fédérale Toulouse Midi-Pyrénées
Contrôle des maladies animales exotiques et émergentes (UMR CMAEE)
Institut National de la Recherche Agronomique (INRA)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)
Institut National de Recherche en Informatique et en Automatique (INRIA). FRA.
Source :
BMC Bioinformatics, BioMed Central, 2013, 14 (suppl. 15), pp. 1-11, doi:10.1186/1471-2105-14-S15-S16. Eleventh Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics, Institut National de Recherche en Informatique et en Automatique (INRIA), Lyon, France, 17-19 October 2013. Record sources: HAL, Scopus-Elsevier.
Publication Year :
2013
Publisher :
BioMed Central Ltd, 2013.

Abstract

Among the challenges that hamper reaping the benefits of genome assembly are unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for de novo genome assembly are available, each with its advantages and drawbacks, and there are no clear guidelines for choosing among them. Second, these solutions produce draft assemblies that often require a resource-intensive finishing phase. In this paper we address both aspects by developing Mix, a tool that mixes two or more draft assemblies without relying on a reference genome, with the goal of reducing contig fragmentation and thus speeding up genome finishing. The proposed algorithm builds an extension graph in which vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length. We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. The resulting final assemblies demonstrate a significant improvement in overall assembly quality. In particular, Mix is consistent: it provides better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects. Mix is implemented in Python and is available at https://github.com/cbib/MIX; novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.
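
The extension-graph idea in the abstract can be illustrated with a small Python sketch. This is not the Mix implementation: it is a simplified toy, assuming that each node is a whole contig rather than a contig extremity, that the graph is acyclic, and that only a single best path (not a set of paths) is wanted. The contig names, lengths, overlaps, and the function names `build_extension_graph` and `best_path` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy input: contig name -> length in bases.
contig_lengths = {"a1": 1000, "b1": 1200, "a2": 800, "b2": 500}

# Hypothetical alignments between contig ends: (left, right, overlap) means
# the end of `left` aligns with the start of `right` over `overlap` bases.
alignments = [
    ("a1", "b1", 200),
    ("b1", "a2", 300),
    ("a1", "b2", 100),
]


def build_extension_graph(alignments):
    """Map each contig to the contigs that can extend it, with overlaps."""
    graph = {}
    for left, right, overlap in alignments:
        graph.setdefault(left, []).append((right, overlap))
    return graph


def best_path(contig_lengths, graph):
    """Return (merged length, contig chain) of the longest extension path.

    Assumes the graph has no cycles; a cycle would cause infinite recursion.
    """

    @lru_cache(maxsize=None)
    def best_from(contig):
        # Default: the contig on its own, with no extension.
        best_len, best_chain = contig_lengths[contig], (contig,)
        for nxt, overlap in graph.get(contig, []):
            tail_len, tail_chain = best_from(nxt)
            # Overlapping bases are counted only once in the merged length.
            candidate = contig_lengths[contig] - overlap + tail_len
            if candidate > best_len:
                best_len, best_chain = candidate, (contig,) + tail_chain
        return best_len, best_chain

    return max((best_from(c) for c in contig_lengths), key=lambda r: r[0])


length, chain = best_path(contig_lengths, build_extension_graph(alignments))
print(length, chain)
```

In this toy, the path a1, b1, a2 is preferred over a1, b2 because it yields the longer merged sequence once overlaps are subtracted. Selecting several non-overlapping paths at once, as the abstract describes, would need an additional step that this sketch omits.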

Details

Language :
English
ISSN :
14712105
Database :
OpenAIRE
Journal :
BMC Bioinformatics
Accession number :
edsair.doi.dedup.....45c236426671aef226eabaea9fdbbb93
Full Text :
https://doi.org/10.1186/1471-2105-14-S15-S16