Improvement of eukaryotic proteins prediction from soil metagenomes

Authors :
Georgios Koutsovoulos
Mathilde Clément
Etienne Danchin
Justine Lipuma
Corinne Rancurel
Marc Bailly-Bechet
Carole Belliardo
Publication Year :
2021
Publisher :
Cold Spring Harbor Laboratory, 2021.

Abstract

Background: Over recent decades, shotgun metagenomics and metabarcoding have highlighted the diversity of microorganisms in environmental and host-associated samples. Most public repositories of assembled metagenomes use annotation pipelines tailored for prokaryotes, regardless of the taxonomic origin of contigs and metagenome-assembled genomes (MAGs). Consequently, eukaryotic contigs and MAGs, which have intrinsically different gene features, are not optimally annotated, and the eukaryotic component of biodiversity is misrepresented despite its biological relevance. Results: Using an automated analysis pipeline, we filtered 7.9 billion contigs from 6,873 soil metagenomes in the IMG/M database of the Joint Genome Institute to identify eukaryotic contigs. We re-annotated genes using eukaryote-tailored methods, yielding 8 million eukaryotic proteins. Of these, 5.6 million could be traced back to non-chimeric, higher-confidence eukaryotic contigs. Our pipeline improves the completeness, contiguity and quality of eukaryotic proteins. Moreover, the better quality of these proteins, combined with a more comprehensive assignment method, also improves their taxonomic annotation. Conclusions: Using public soil metagenomic data, we provide a dataset of eukaryotic soil proteins with improved completeness and quality, as well as a more reliable taxonomic annotation. This unique resource is of interest to any scientist aiming to study the composition, biological functions and gene flux of soil communities involving eukaryotes.

Details

Database :
OpenAIRE
Accession number :
edsair.doi...........643f2f8dee845f55e7df5342cea690ba