DeepMAsED: evaluating the quality of metagenomic assemblies.
- Source :
- Bioinformatics (Oxford, England) [Bioinformatics] 2020 May 01; Vol. 36 (10), pp. 3011-3017.
- Publication Year :
- 2020
Abstract
- Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.
- Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state-of-the-art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.
- Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.
- Availability and Implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.
- Supplementary Information: Supplementary data are available at Bioinformatics online.
- (© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Subjects :
- Bacteria
- Computer Simulation
- Metagenomics
- Sequence Analysis, DNA
- Metagenome
- Software
Details
- Language :
- English
- ISSN :
- 1367-4811
- Volume :
- 36
- Issue :
- 10
- Database :
- MEDLINE
- Journal :
- Bioinformatics (Oxford, England)
- Publication Type :
- Academic Journal
- Accession number :
- 32096824
- Full Text :
- https://doi.org/10.1093/bioinformatics/btaa124