MeDuSa: a multi-draft based scaffolder
- Source :
- Bioinformatics, Oxford University Press (OUP), 2015, pii: btv171 [Epub ahead of print]. ⟨10.1093/bioinformatics/btv171⟩
- Publication Year :
- 2015
Abstract
- Motivation: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, it remains challenging from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orienting contigs) of de novo assemblies is usually the first step in most genome finishing pipelines.
- Results: In this article we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalizes the scaffolding problem as a combinatorial optimization problem on graphs and implements an efficient constant-factor approximation algorithm to solve it. In contrast to currently used scaffolders, it requires neither prior knowledge of the microorganism dataset under analysis (e.g. the phylogenetic relationships among the organisms) nor the availability of paired-end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility of using MeDuSa on eukaryotic datasets has also been evaluated, leading to interesting results.
- Availability and implementation: MeDuSa web server: http://combo.dbe.unifi.it/medusa. A stand-alone version of the software can be downloaded from https://github.com/combogenomics/medusa/releases. All results presented in this work have been obtained with MeDuSa v. 1.3.
- Contact: marco.fondi@unifi.it
- Supplementary information: Supplementary data are available at Bioinformatics online.
- Subjects :
- Statistics and Probability
Web server
Theoretical computer science
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Genomics
Biology
Biochemistry
Genome
Structural genomics
Set (abstract data type)
03 medical and health sciences
Contig Mapping
Software
Molecular Biology
030304 developmental biology
0303 health sciences
030306 microbiology
Approximation algorithm
Bioinformatics
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Computer Science Applications
Computational Mathematics
Task (computing)
Computational Theory and Mathematics
Data mining
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Algorithms
Details
- Language :
- English
- ISSN :
- 1367-4803 and 1367-4811
- Database :
- OpenAIRE
- Journal :
- Bioinformatics, Oxford University Press (OUP), 2015, pii: btv171 [Epub ahead of print]. ⟨10.1093/bioinformatics/btv171⟩
- Accession number :
- edsair.doi.dedup.....0b694938e857c63e4352ae4e21339d8f
- Full Text :
- https://doi.org/10.1093/bioinformatics/btv171