Remote homology and the functions of metagenomic dark matter

Authors :
Briallen Lobb
Daniel Aaron Kurtz
Gabriel Moreno-Hagelsieb
Andrew Charles Doxey
Source :
Frontiers in Genetics, Vol 6 (2015)
Publication Year :
2015
Publisher :
Frontiers Media S.A., 2015.

Abstract

Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology significantly exceeds that obtained for artificial protein families (1.4%). In addition, predicted ORFan functions show significant functional consistency with their gene neighbors (p < 0.001), as expected for real genes. Compared to genes annotated through standard homology searches, ORFans show intriguing functional differences, such as an enrichment of virus-related functions and of biological processes associated with extreme sequence diversity. Each environment also possesses many unique ORFan families that likely play important community roles, such as the ORFan polysaccharide degradation genes we identified that are unique to the human gut metagenome. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate by identifying hundreds of ORFan metalloproteases that conserve a catalytic site despite a lack of overall sequence similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. Our resource of annotated metagenomic ORFans is available at http://doxey.uwaterloo.ca.

Details

Language :
English
ISSN :
1664-8021
Volume :
6
Database :
Directory of Open Access Journals
Journal :
Frontiers in Genetics
Publication Type :
Academic Journal
Accession number :
edsdoj.6e901de622654540ab242218f8920899
Document Type :
article
Full Text :
https://doi.org/10.3389/fgene.2015.00234