Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly
- Source :
- Journal of computational biology : a journal of computational molecular cell biology. 20(4)
- Publication Year :
- 2012
Abstract
- One of the key advances in genome assembly that has led to a significant improvement in contig lengths has been improved algorithms for utilization of paired reads (mate-pairs). While in most assemblers, mate-pair information is used in a post-processing step, the recently proposed Paired de Bruijn Graph (PDBG) approach incorporates the mate-pair information directly in the assembly graph structure. However, the PDBG approach faces difficulties when the variation in the insert sizes is high. To address this problem, we first transform mate-pairs into edge-pair histograms that allow one to better estimate the distance between edges in the assembly graph that represent regions linked by multiple mate-pairs. Further, we combine the ideas of mate-pair transformation and PDBGs to construct new data structures for genome assembly: pathsets and pathset graphs.
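The mate-pair transformation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input layout (each read already mapped to an assembly-graph edge plus an offset from that edge's start), and the use of a single nominal insert size are all assumptions made for illustration. Under those assumptions, each mate-pair implies a distance between the starts of its two edges, and collecting these implied distances per edge pair gives a histogram whose median is more robust than any single pair.

```python
from collections import Counter, defaultdict
from statistics import median


def edge_pair_histograms(mapped_pairs, nominal_insert):
    """Build one histogram of implied edge-to-edge distances per edge pair.

    mapped_pairs: iterable of ((edge_a, offset_a), (edge_b, offset_b)),
        where each offset is the read's start measured from the start of
        the edge it maps to (hypothetical input layout).
    nominal_insert: assumed distance between the starts of the two reads.
    """
    histograms = defaultdict(Counter)
    for (edge_a, off_a), (edge_b, off_b) in mapped_pairs:
        # If read 2 starts nominal_insert after read 1, then
        # start(edge_b) - start(edge_a) = nominal_insert + off_a - off_b.
        implied = nominal_insert + off_a - off_b
        histograms[(edge_a, edge_b)][implied] += 1
    return histograms


def estimate_distance(histogram):
    """Summarize a histogram by its median implied distance."""
    return median(histogram.elements())


pairs = [
    (("e1", 10), ("e2", 5)),
    (("e1", 20), ("e2", 17)),
    (("e1", 30), ("e2", 25)),
]
hist = edge_pair_histograms(pairs, nominal_insert=100)
print(dict(hist[("e1", "e2")]))
print(estimate_distance(hist[("e1", "e2")]))
```

The median is only one possible summary; with high insert-size variation, the full histogram carries more information than a point estimate, which is the motivation the abstract gives for keeping it.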
- Subjects :
- Theoretical computer science
Sequence assembly
Biology
Machine learning
Genome
Contig Mapping
De Bruijn graph
Histogram
Databases, Genetic
Genetics
Escherichia coli
Molecular Biology
RECOMB 2012: Part 3 of 3; Guest Editor: Benny Chor; Research Articles
Contig
Sequence Analysis, DNA
Data structure
Graph
Computational Mathematics
Computational Theory and Mathematics
Modeling and Simulation
Artificial intelligence
Algorithms
Details
- ISSN :
- 15578666
- Volume :
- 20
- Issue :
- 4
- Database :
- OpenAIRE
- Journal :
- Journal of computational biology : a journal of computational molecular cell biology
- Accession number :
- edsair.doi.dedup.....7fd2aea5ebb0e976b60c0b3d53eab612