Remote homology and the functions of metagenomic dark matter
- Source :
- Frontiers in Genetics, Vol 6 (2015)
- Publication Year :
- 2015
- Publisher :
- Frontiers Media S.A., 2015.
Abstract
- Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology significantly exceeds that obtained for artificial protein families (1.4%). In addition, predicted ORFan functions show significant functional consistency with their gene neighbors (p < 0.001), as expected for real genes. Compared to genes annotated through standard homology searches, ORFans have intriguing functional differences, such as an enrichment of virus-related functions and of biological processes associated with extreme sequence diversity. Each environment also possesses many unique ORFan families that likely play important community roles, such as the ORFan polysaccharide degradation genes identified as unique to the human gut metagenome. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate by identifying hundreds of ORFan metalloproteases that conserve a catalytic site despite a lack of overall sequence similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. Our resource of annotated metagenomic ORFans is available at http://doxey.uwaterloo.ca.
Details
- Language :
- English
- ISSN :
- 1664-8021
- Volume :
- 6
- Database :
- Directory of Open Access Journals
- Journal :
- Frontiers in Genetics
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.6e901de622654540ab242218f8920899
- Document Type :
- article
- Full Text :
- https://doi.org/10.3389/fgene.2015.00234