Mash Screen: High-throughput sequence containment estimation for genome discovery
- Source :
- Genome Biology, Vol 20, Iss 1, Pp 1-13 (2019)
- Publication Year :
- 2019
- Publisher :
- Cold Spring Harbor Laboratory, 2019.
Abstract
- The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
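The distinction the abstract draws between resemblance and containment can be illustrated with a minimal sketch. This is not the Mash Screen implementation: the function names (`kmers`, `kmer_hash`, `bottom_sketch`, `jaccard`, `screen`), the BLAKE2b hash, and the parameters `k` and `s` are all assumptions chosen for illustration. The idea shown is that a small genome inside a large metagenome has a low Jaccard resemblance, while the fraction of the genome's sketch hashes seen in a single pass over the reads stays high.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    hashes = sorted({kmer_hash(x) for x in kmers(seq, k)})
    return set(hashes[:s])


def jaccard(a, b):
    """Exact resemblance of two sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


def screen(genome_sketch, reads, k):
    """Stream the reads once and return the fraction of the genome's
    sketch hashes that were observed (a containment estimate)."""
    seen = set()
    for read in reads:
        for x in kmers(read, k):
            hx = kmer_hash(x)
            if hx in genome_sketch:
                seen.add(hx)
    return len(seen) / len(genome_sketch)
```

Calling `screen(bottom_sketch(genome, k, s), reads, k)` processes the reads one at a time, so the read set never needs to be sketched or held in memory as a whole, which is the sense in which such a procedure is online.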
- Subjects :
- lcsh:QH426-470
Proteome
High throughput sequence
Computer science
Method
MinHash
Computational biology
Biology
Genome
03 medical and health sciences
RefSeq
Sequencing
Humans
Viral Discovery
lcsh:QH301-705.5
030304 developmental biology
0303 health sciences
Containment (computer programming)
030306 microbiology
DNA Contamination
Human genetics
High-Throughput Screening Assays
lcsh:Genetics
lcsh:Biology (General)
Metagenomics
Polyomavirus
SRA
Algorithms
Details
- Database :
- OpenAIRE
- Journal :
- Genome Biology, Vol 20, Iss 1, Pp 1-13 (2019)
- Accession number :
- edsair.doi.dedup.....c348e0d85d2924190f218ccb0b73ad13