Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era
- Source :
- Quantitative Biology. 7:278-292
- Publication Year :
- 2019
- Publisher :
- Engineering Sciences Press, 2019.
Abstract
- Background: De novo genome assembly relies on two kinds of graphs: de Bruijn graphs and overlap graphs. Overlap graphs are the basis for the Celera assembler, while de Bruijn graphs have become the dominant technical device in the last decade. Those two kinds of graphs are collectively called assembly graphs. Results: In this review, we discuss the most recent advances in the problem of constructing, representing and navigating assembly graphs, focusing on very large datasets. We will also explore some computational techniques, such as the Bloom filter, to compactly store graphs while keeping all functionalities intact. Conclusions: We complete our analysis with a discussion on the algorithmic issues of assembling from long reads (e.g., PacBio and Oxford Nanopore). Finally, we present some of the most relevant open problems in this field.
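The abstract mentions storing assembly graphs compactly with a Bloom filter. As a rough illustration only, and not the method of the reviewed paper or of any particular assembler, the Python sketch below stores the k-mers of a read set in a small Bloom filter and navigates the implied de Bruijn graph by querying the four possible successors of a k-mer. The names `BloomFilter`, `build_filter` and `successors`, and the default sizes, are hypothetical choices made for this example.

```python
import hashlib


class BloomFilter:
    """Bit array with several hash functions.

    Membership queries may return false positives but never false negatives.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting BLAKE2b.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k of the read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def build_filter(reads, k, num_bits=1 << 16, num_hashes=3):
    """Insert all k-mers of all reads into a Bloom filter (assumed sizes)."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bf.add(kmer)
    return bf


def successors(bf, kmer):
    """Candidate out-neighbours of a k-mer: shift by one base, keep those in the filter."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]


reads = ["ACGTAC"]          # toy input; k-mers are ACG, CGT, GTA, TAC
bf = build_filter(reads, k=3)
print(successors(bf, "ACG"))
```

Because the filter stores only bits, the graph is implicit: edges are recovered by testing the four one-base extensions of a node. A false positive can add a spurious successor, so the sketch trades exactness for memory; the list returned for "ACG" always contains "CGT" but is not guaranteed to contain only true k-mers.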
- Subjects :
- De Bruijn sequence
Theoretical computer science
Basis (linear algebra)
string graphs
business.industry
Computer science
Applied Mathematics
Big data
Sequence assembly
overlap graph
Bloom filter
long read
Data structure
Biochemistry, Genetics and Molecular Biology (miscellaneous)
Field (computer science)
Computer Science Applications
Modeling and Simulation
genome assembly
Nanopore sequencing
business
de Bruijn graph
MathematicsofComputing_DISCRETEMATHEMATICS
Details
- ISSN :
- 2095-4697 and 2095-4689
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- Quantitative Biology
- Accession number :
- edsair.doi.dedup.....548b232c8690c4019dad42d93dd535c9
- Full Text :
- https://doi.org/10.1007/s40484-019-0181-x