Combining de novo and reference-guided assembly with scaffold_builder
- Source :
- Source Code for Biology and Medicine
- Publisher :
- Springer Nature
Abstract
- Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances of the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue, but longer repeats, such as ribosomal RNA operons, cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds (super contigs of sequences joined by N-bases) based on similarity to a closely related reference sequence. This approach is independent of mate-pair information and can complement other assembly steps, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced the genomes of two strains, Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291, and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is also provided, allowing users to submit a reference genome and a set of contigs to be scaffolded.
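The core scaffolding idea described in the abstract (order contigs by where they align on a closely related reference, then join them with runs of N-bases) can be sketched as below. This is a minimal illustration, not Scaffold_builder's actual implementation: it assumes contig-to-reference coordinates and orientations were already computed by a separate aligner, and the names `Placement`, `build_scaffold`, `summarize` and the `min_spacer` parameter are hypothetical. Overlapping or abutting contigs are not merged; a fixed spacer of Ns is inserted instead. `summarize` reports the two quantities the abstract uses to evaluate the tool: the number of sequences and their average length.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Placement:
    """Hypothetical record: where one contig aligns on the reference."""
    name: str
    seq: str
    ref_start: int         # 0-based start of the alignment on the reference
    ref_end: int           # exclusive end of the alignment on the reference
    reverse: bool = False  # True if the contig aligns to the reverse strand


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(placements: list[Placement], min_spacer: int = 10) -> str:
    """Join reference-placed contigs into one scaffold, filling gaps with N.

    Assumption (simplification): overlapping or abutting contigs are
    separated by `min_spacer` Ns rather than merged.
    """
    ordered = sorted(placements, key=lambda p: p.ref_start)
    parts: list[str] = []
    prev_end = None
    for p in ordered:
        if prev_end is not None:
            gap = p.ref_start - prev_end
            # Positive gap: pad with as many Ns as the reference distance.
            parts.append("N" * (gap if gap > 0 else min_spacer))
        # Orient the contig to match the reference strand.
        parts.append(revcomp(p.seq) if p.reverse else p.seq)
        prev_end = p.ref_end if prev_end is None else max(prev_end, p.ref_end)
    return "".join(parts)


def summarize(seqs: list[str]) -> tuple[int, float]:
    """Number of sequences and their mean length."""
    return len(seqs), mean(len(s) for s in seqs)


# Toy example: three contigs placed on a hypothetical 34-bp reference region.
contigs = [
    Placement("c2", "GGGTTT", ref_start=20, ref_end=26),
    Placement("c1", "ACGTAC", ref_start=0, ref_end=6),
    Placement("c3", "AAAC", ref_start=30, ref_end=34, reverse=True),
]
scaffold = build_scaffold(contigs)
print(scaffold)
print(summarize([c.seq for c in contigs]), summarize([scaffold]))
```

In this toy case three short contigs collapse into a single scaffold whose N-runs preserve the spacing observed on the reference, which is the kind of reduction in sequence count and increase in average length the abstract reports on real data.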
- Subjects :
- Salmonella typhimurium
- Information Systems and Management
- Salmonella enterica serovar typhimurium
- Sequence assembly
- Health Informatics
- Bacterial genome size
- Scaffolding
- Genome sequencing
- Genome
- DNA sequencing
- 03 medical and health sciences
- Complete sequence
- Next generation sequencing
- Software Review
- De novo assembly
- 030304 developmental biology
- Genetics
- 0303 health sciences
- biology
- Contig
- 030306 microbiology
- biology.organism_classification
- Computer Science Applications
- Salmonella enterica
- Reference genome
- Information Systems
- Language :
- English
- ISSN :
- 17510473
- Volume :
- 8
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Source Code for Biology and Medicine
- Accession number :
- edsair.doi.dedup.....4167402d3004769d40db2a2d0435b744
- Full Text :
- https://doi.org/10.1186/1751-0473-8-23