Alignment behaviors of short peptides provide a roadmap for functional profiling of metagenomic data
- Source :
- BMC Genomics
- Publisher :
- Springer Nature
Abstract
- Background: Functional assignments for short-read metagenomic data pose a significant computational challenge due to the perceived unpredictability of alignment behavior and the inability to infer useful functional information from translated protein fragments (peptides). To address this problem, we examined the predictability of short peptide alignments by systematically studying the alignment behavior of large sets of short peptides generated from well-characterized proteins as well as hypothetical proteins in the KEGG database.
- Results: Using test sets of peptides modeling the length and phylogenetic distributions of short-read metagenomic data, we observed that peptides from well-characterized proteins had indistinguishable alignments to proteins from the same orthologous family and proteins from different families. Nonetheless, the patterns contained remarkable phylogenetic and structural signals, with alignments of even very short peptides naturally restricted to their orthologous family and/or proteins having similar structural folds. In stark contrast, peptides from “hypothetical proteins” had only sparse hit patterns with low frequencies and much lower identities. By weighting the structure-driven alignments and filtering peptides with behaviors similar to those derived from “hypothetical proteins”, we demonstrate that the accuracy of abundance predictions of protein families is dramatically improved.
- Conclusions: Evolutionary processes have dispersed protein folds across multiple protein families, precluding accurate functional assignment to short peptides, whose alignment behavior is non-random and driven by structure. Algorithms that filter sparse peptides and weight hit patterns of peptides from “known space” dramatically improve quantification of functions from diverse mixtures of peptides and should substantially improve applications of metagenomic analyses requiring accurate quantitative measures of functional families.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2272-z) contains supplementary material, which is available to authorized users.
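The filter-and-weight idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `estimate_family_abundance`, the thresholds `min_hits` and `min_identity`, the identity-proportional weighting, and the example data are all hypothetical choices made only for illustration. The sketch assumes each peptide comes with a list of (family, identity) alignment hits, discards peptides whose hit pattern is sparse or low in identity, and spreads one unit of abundance per remaining peptide across the families it hits.

```python
from collections import defaultdict


def estimate_family_abundance(peptide_hits, min_hits=2, min_identity=0.5):
    """Illustrative sketch only; thresholds and weighting are assumptions.

    peptide_hits maps a peptide id to a list of (family, identity) tuples,
    where identity is a fraction between 0 and 1.
    """
    abundance = defaultdict(float)
    for hits in peptide_hits.values():
        # Keep only hits at or above the (assumed) identity threshold.
        strong = [(family, identity) for family, identity in hits
                  if identity >= min_identity]
        # Filter peptides with sparse hit patterns, which the abstract
        # associates with peptides from "hypothetical proteins".
        if len(strong) < min_hits:
            continue
        total = sum(identity for _, identity in strong)
        # Each retained peptide contributes one unit of abundance, split
        # across its families in proportion to alignment identity.
        for family, identity in strong:
            abundance[family] += identity / total
    return dict(abundance)


# Hypothetical example data: family labels are placeholders.
example_hits = {
    "pep1": [("famA", 0.9), ("famA", 0.8), ("famB", 0.3)],
    "pep2": [("famC", 0.4)],                  # sparse and low identity
    "pep3": [("famA", 0.6), ("famB", 0.6)],   # ambiguous between families
}
result = estimate_family_abundance(example_hits)
```

In this toy input, `pep2` is dropped as sparse, while `pep3` is split evenly between its two families rather than being assigned to one of them, which mirrors the abstract's point that short peptides often cannot be assigned to a single family with confidence.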
- Subjects :
- chemistry.chemical_classification
Genetics
Phylogenetic tree
Protein family
Gene Expression Profiling
Peptide
Sequence alignment
Computational biology
Biology
Proteomics
Evolution, Molecular
chemistry
Metagenomics
Phylogenetics
DNA microarray
Peptides
Sequence Alignment
Phylogeny
Research Article
Biotechnology
Details
- Language :
- English
- ISSN :
- 1471-2164
- Volume :
- 16
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....0d1a4a1a5796fa71e9e4881dd9fc7fbb
- Full Text :
- https://doi.org/10.1186/s12864-015-2272-z