Large-scale protein function prediction using heterogeneous ensembles

Authors :
T. M. Murali
Gaurav Pandey
Jeffrey N. Law
Shiv D. Kale
Linhua Wang
Source :
F1000Research
Publication Year :
2018

Abstract

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor is unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability for a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).

Funding: This work was supported in part by the National Institutes of Health [R01GM114434] and by an IBM faculty award to GP. It was also partially supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Army Research Office (ARO) under Cooperative Agreement Number [W911NF-17-2-0105].
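
To make the stacking idea named in the abstract concrete, the following is a minimal, standard-library Python sketch of one common way to stack heterogeneous base predictors with a logistic-regression meta-learner. It is not taken from LargeGOPred and does not reproduce the study's pipeline: the base predictors (`fit_centroid`, `fit_knn`), the out-of-fold scheme, the helper names and the synthetic two-class data are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_logistic(w, x):
    # w holds one weight per feature, with the bias stored last.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, lr=0.5, epochs=500):
    # Logistic regression trained by plain batch gradient descent.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_logistic(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def fit_centroid(X, y):
    # Hypothetical base predictor 1: relative distance to the class centroids.
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1 + 1e-12)  # close to 1 when x is nearer class 1

    return score


def fit_knn(X, y, k=5):
    # Hypothetical base predictor 2: fraction of positives among k neighbours.
    def score(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return score


def stack(X, y, base_fitters, n_folds=5):
    # Build meta-features from out-of-fold base predictions, so the
    # meta-learner never sees a score produced on a base model's training data.
    n = len(X)
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for b, fitter in enumerate(base_fitters):
            model = fitter(X_tr, y_tr)
            for i in held_out:
                meta_X[i][b] = model(X[i])
    meta_w = fit_logistic(meta_X, y)               # the logistic-regression stacker
    base_models = [f(X, y) for f in base_fitters]  # refit base models on all data

    def predict(x):
        return predict_logistic(meta_w, [m(x) for m in base_models])

    return predict


def make_data(n, seed):
    # Synthetic stand-in data: two Gaussian blobs with alternating labels.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_data(200, seed=0)
    X_test, y_test = make_data(100, seed=1)
    predict = stack(X_train, y_train, [fit_centroid, fit_knn])
    hits = sum((predict(x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test))
    print("held-out accuracy:", hits / len(y_test))
```

In a real PFP setting each Gene Ontology term would typically be treated as its own binary problem, with base predictors trained on the available protein data; those details are defined by the paper and the LargeGOPred repository, not by this sketch.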

Details

ISSN :
2046-1402
Volume :
7
Database :
OpenAIRE
Journal :
F1000Research
Accession number :
edsair.doi.dedup.....dfac271424bbb5479b7f926a198127fb