De novo meta-assembly of ultra-deep sequencing data
- Source :
- Bioinformatics, Bioinformatics (Oxford, England), vol 31, iss 12
- Publication Year :
- 2015
- Publisher :
- Oxford University Press, 2015.
Abstract
- We introduce a new divide-and-conquer approach to deal with the problem of de novo genome assembly in the presence of ultra-deep sequencing data (i.e. coverage of 1000x or higher). Our proposed meta-assembler Slicembler partitions the input data into optimal-sized ‘slices’ and uses a standard assembly tool (e.g. Velvet, SPAdes, IDBA_UD and Ray) to assemble each slice individually. Slicembler uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly. To improve its efficiency, Slicembler uses a generalized suffix tree to identify these frequent contigs (or fractions thereof). Extensive experimental results on real ultra-deep sequencing data (8000x coverage) and simulated data show that Slicembler significantly improves the quality of the assembly compared with the performance of the base assembler. In fact, most of the time, Slicembler generates error-free assemblies. We also show that Slicembler is much more resistant to high sequencing error rates than the base assembler. Availability and implementation: Slicembler can be accessed at http://slicembler.cs.ucr.edu/. Contact: hamid.mirebrahim@email.ucr.edu
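The slice-and-vote idea in the abstract can be illustrated with a minimal Python sketch. This is not Slicembler's implementation. It assumes the per-slice assemblies have already been produced by some base assembler, and it replaces the generalized suffix tree with plain k-mer counting, which is simpler but slower. The function names (`partition_reads`, `frequent_kmers`, `majority_segments`) and the parameters `slice_coverage` and `k` are hypothetical, and reverse complements are ignored.

```python
from collections import Counter


def partition_reads(reads, genome_len, slice_coverage):
    """Split reads into consecutive slices of roughly slice_coverage x depth."""
    slices, current, bases = [], [], 0
    target = genome_len * slice_coverage  # bases needed per slice
    for read in reads:
        current.append(read)
        bases += len(read)
        if bases >= target:
            slices.append(current)
            current, bases = [], 0
    if current:  # leftover reads form a final, shallower slice
        slices.append(current)
    return slices


def frequent_kmers(assemblies, k):
    """Return the k-mers that occur in a strict majority of the assemblies.

    `assemblies` is a list with one entry per slice; each entry is a list
    of contig strings. Each assembly casts at most one vote per k-mer.
    """
    votes = Counter()
    for contigs in assemblies:
        seen = set()
        for contig in contigs:
            for i in range(len(contig) - k + 1):
                seen.add(contig[i:i + k])
        votes.update(seen)
    majority = len(assemblies) // 2 + 1
    return {kmer for kmer, n in votes.items() if n >= majority}


def majority_segments(contig, frequent, k):
    """Return maximal substrings of contig whose every k-mer is frequent."""
    segments, start = [], None
    for i in range(len(contig) - k + 1):
        if contig[i:i + k] in frequent:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at k-mer i - 1, which covers up to index i - 1 + k.
            segments.append(contig[start:i - 1 + k])
            start = None
    if start is not None:
        segments.append(contig[start:])
    return segments


if __name__ == "__main__":
    # Toy per-slice assemblies: two slices agree, the third diverges.
    assemblies = [["ACGTACGG"], ["ACGTACGG"], ["ACGTTTTT"]]
    frequent = frequent_kmers(assemblies, k=4)
    print(majority_segments("ACGTACGG", frequent, k=4))
    print(majority_segments("ACGTTTTT", frequent, k=4))
```

In this toy input, the contig shared by two of the three slices survives the vote in full, while only the agreeing prefix of the divergent contig is kept. A generalized suffix tree, as named in the abstract, would find such shared substrings without fixing `k` in advance.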
- Subjects :
- Statistics and Probability
Bioinformatics
Computer science
Generalized suffix tree
Sequence assembly
ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
computer.software_genre
Genes, Plant
Biochemistry
Mathematical Sciences
03 medical and health sciences
0302 clinical medicine
Information and Computing Sciences
Molecular Biology
030304 developmental biology
0303 health sciences
Contig
High-Throughput Nucleotide Sequencing
Ultra deep sequencing
Hordeum
DNA
Plant
Sequence Analysis, DNA
Biological Sciences
Base (topology)
Computer Science Applications
Computational Mathematics
Computational Theory and Mathematics
Genes
030220 oncology & carcinogenesis
Data mining
Sequence Analysis
computer
Algorithms
Details
- Language :
- English
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 31
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....320fb465254a7d8221c7b61998c9184b