
DeepMAsED: evaluating the quality of metagenomic assemblies.

Authors :
Mineeva O
Rojas-Carulla M
Ley RE
Schölkopf B
Youngblut ND
Source :
Bioinformatics (Oxford, England) [Bioinformatics] 2020 May 01; Vol. 36 (10), pp. 3011-3017.
Publication Year :
2020

Abstract

Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies.

Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications.

Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is re-training the model with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects.

Availability and Implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED.

Supplementary Information: Supplementary data are available at Bioinformatics online.

(© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)

Details

Language :
English
ISSN :
1367-4811
Volume :
36
Issue :
10
Database :
MEDLINE
Journal :
Bioinformatics (Oxford, England)
Publication Type :
Academic Journal
Accession number :
32096824
Full Text :
https://doi.org/10.1093/bioinformatics/btaa124