MetaNovo: a probabilistic approach to peptide and polymorphism discovery in complex metaproteomic datasets

Authors :
Potgieter, Matthys G
Nel, Andrew JM
Fortuin, Suereta
Garnett, Shaun
Wendoh, Jerome M.
Tabb, David L.
Mulder, Nicola J
Blackburn, Jonathan M
Publication Year :
2019
Publisher :
Cold Spring Harbor Laboratory, 2019.

Abstract

Metagenome-driven microbiome research is providing important new insights in fields as diverse as the pathogenesis of human disease, the metabolic interactions of complex microbial ecosystems involved in agriculture, and climate change. However, poor correlations typically observed between RNA and protein expression datasets even for single organisms make it hard to infer microbial protein expression with any accuracy from metagenomic data, thus restricting movement beyond microbial catalogues and into functional analysis of microbial effector molecules. By contrast, mass spectrometry analysis of microbiome data at the protein level in theory allows direct measurement of dynamic changes in microbial protein composition, localisation and modification that may mediate host/pathogen interactions in complex microbial ecosystems, but analysis of such metaproteomic datasets remains challenging. Here we describe a novel data analysis approach, MetaNovo, that searches complex datasets against the entire known protein universe, whilst still controlling false discovery rates, thus enabling metaproteomic data analyses without requiring prior expectation of likely sample composition or metagenomic data generation that classically inform construction of focussed, relatively small search libraries. MetaNovo directly identifies and quantifies the expressed metaproteomes, and estimates the microbial composition present in complex microbiome samples, using scalable de novo sequence tag matching and probabilistic optimization of very large, unbiased sequence databases prior to target-decoy search. We validated MetaNovo against the results obtained from the recently published MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications being found when searching relatively small databases. We then showed that using an unbiased search of the entire release of UniProt (ca. 90 million protein sequences) MetaNovo was able to identify a similar bacterial taxonomic distribution compared to that found using a small, focused matched metagenome database, but now also simultaneously identified proteins present in the samples that are derived from other organisms missed by 16S or shotgun sequencing and by previous metaproteomic methods. Using MetaNovo to analyze a set of single-organism human neuroblastoma cell-line samples (SH-SY5Y) against UniProt we achieved a comparable MS/MS identification rate during target-decoy search to using the UniProt human Reference proteome, with 22583 (85.99%) of the total set of identified peptides shared in common. Taxonomic analysis of 612 peptides not found in the canonical set of human proteins yielded 158 peptides unique to the Chordata phylum as potential human variant identifications. Of these, 40 had previously been predicted and 9 identified using whole genome sequencing in a proteogenomic study of the same cell-line. By estimating taxonomic and peptide level information on microbiome samples directly from tandem mass spectrometry data, MetaNovo enables simultaneous identification of human, bacterial, helminth, fungal, viral and other eukaryotic proteins in a sample, thus allowing correlations between changes in microbial protein abundance and change in the host proteome to be drawn based on a single analysis. Data are available via ProteomeXchange with identifier PXD014214. The MetaNovo software is available from GitHub and can be run as a standalone Singularity or Docker container available from the Docker Hub.
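
For readers unfamiliar with the target-decoy search mentioned in the abstract, the following is a minimal, generic sketch of how a false discovery rate is commonly estimated from target and decoy matches. It is not taken from the MetaNovo code base: the function name `score_threshold_at_fdr`, the `(score, is_decoy)` input layout, and the simple decoys/targets estimator are assumptions made for illustration only.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: iterable of (score, is_decoy) pairs, one per peptide-spectrum match;
          a higher score is assumed to mean a better match, and scores are
          assumed to be distinct (ties are not handled in this sketch).
    The FDR at a cutoff is estimated as decoys / targets among all matches
    scoring at or above that cutoff. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted set while the estimate stays in bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical toy input: five matches, one of them against a decoy sequence.
example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
cutoff = score_threshold_at_fdr(example, max_fdr=0.25)
```

In this toy input, accepting all five matches gives one decoy against four targets, so the whole list is admitted at a 25% threshold, whereas a stricter threshold would stop before the decoy. Real pipelines typically apply much stricter thresholds to far larger match lists.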

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.sharebioRxiv..c37eb7f02fbfaf3b6a8ec0eec3b28946
Full Text :
https://doi.org/10.1101/605550