MeDuSa: a multi-draft based scaffolder

Authors :
Marco Fondi
Marco Galardini
Emanuele Bosi
Pierluigi Crescenzi
Marie-France Sagot
Beatrice Donati
Sara Brunetti
Pietro Liò
Renato Fani
Dipartimento di Biologia Evoluzionistica 'Leo Pardi'
Università degli Studi di Firenze = University of Florence [Firenze] (UNIFI)
Dipartimento di Sistemi e Informatica (DSI)
An algorithmic view on genomes, cells, and environments (BAMBOO)
Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE)
Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL)
Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes
Institut National de Recherche en Informatique et en Automatique (Inria)
Baobab
Département PEGASE [LBBE] (PEGASE)
Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE)
Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)
Dipartimento di Ingegneria dell'informazione e scienze matematiche [Siena] (DIISM)
Università degli Studi di Siena = University of Siena (UNISI)
Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE)
Inria Grenoble - Rhône-Alpes
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
Computer Laboratory [Cambridge]
University of Cambridge [UK] (CAM)
European Project: 247073,EC:FP7:ERC,ERC-2009-AdG,SISYPHE(2010)
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE)
Université de Lyon-Université de Lyon-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)
Source :
Bioinformatics, Oxford University Press (OUP), 2015, pii: btv171 [Epub ahead of print]. ⟨10.1093/bioinformatics/btv171⟩
Publication Year :
2015

Abstract

Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, it remains challenging from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orienting contigs) of de novo assemblies is usually the first step in most genome finishing pipelines.

Results: In this article we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalizes the scaffolding problem as a combinatorial optimization problem on graphs and implements an efficient constant-factor approximation algorithm to solve it. In contrast to currently used scaffolders, it requires neither prior knowledge of the microorganism dataset under analysis (e.g. the phylogenetic relationships among the organisms) nor the availability of paired-end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility of using MeDuSa on eukaryotic datasets has also been evaluated, leading to interesting results.

Availability and implementation: MeDuSa web server: http://combo.dbe.unifi.it/medusa. A stand-alone version of the software can be downloaded from https://github.com/combogenomics/medusa/releases. All results presented in this work have been obtained with MeDuSa v. 1.3.

Contact: marco.fondi@unifi.it

Supplementary information: Supplementary data are available at Bioinformatics online.

Details

Language :
English
ISSN :
1367-4803 and 1367-4811
Database :
OpenAIRE
Journal :
Bioinformatics, Oxford University Press (OUP), 2015, pii: btv171 [Epub ahead of print]. ⟨10.1093/bioinformatics/btv171⟩
Accession number :
edsair.doi.dedup.....0b694938e857c63e4352ae4e21339d8f
Full Text :
https://doi.org/10.1093/bioinformatics/btv171