Large-scale protein function prediction using heterogeneous ensembles
- Source :
- F1000Research
- Publication Year :
- 2018
Abstract
- Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability for a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins, and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred). This work was supported in part by the National Institutes of Health [R01GM114434] and by an IBM faculty award to GP. It was also partially supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Army Research Office (ARO) under Cooperative Agreement Number [W911NF-17-2-0105].
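The abstract singles out Stacking using Logistic Regression. As a rough illustration of that general idea only, the sketch below trains two different base predictors, collects their out-of-fold scores, and fits a logistic regression meta-learner on those scores. It is not taken from LargeGOPred: the base predictors (a nearest-centroid scorer and a k-nearest-neighbour scorer), the synthetic two-class data, and every function name here are assumptions chosen to keep the example self-contained.

```python
import math
import random

def sigmoid(s):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))

def fit_centroid(X, y):
    """Hypothetical base predictor 1: how much closer a point is to the positive centroid."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    cp = [sum(c) / len(pos) for c in zip(*pos)]
    cn = [sum(c) / len(neg) for c in zip(*neg)]
    return lambda x: math.dist(x, cn) - math.dist(x, cp)

def fit_knn(X, y, k=3):
    """Hypothetical base predictor 2: fraction of positives among the k nearest training points."""
    def score(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return sum(t for _, t in nearest) / k
    return score

def fit_logistic(Z, y, lr=0.1, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

BASE_LEARNERS = [fit_centroid, fit_knn]

def fit_stacking(X, y, n_folds=5):
    """Stack the base predictors using out-of-fold scores as meta-features."""
    Z = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held = [i for i in range(len(X)) if i % n_folds == fold]
        scorers = [fit([X[i] for i in train], [y[i] for i in train])
                   for fit in BASE_LEARNERS]
        for i in held:
            # Each held-out point is scored by models that never saw it.
            Z[i] = [s(X[i]) for s in scorers]
    w, b = fit_logistic(Z, y)
    # Refit the base predictors on all data for use at prediction time.
    scorers = [fit(X, y) for fit in BASE_LEARNERS]
    def predict(x):
        z = [s(x) for s in scorers]
        return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return predict

# Synthetic data (an assumption): two Gaussian clusters with alternating labels.
random.seed(0)
X, y = [], []
for i in range(40):
    label = i % 2
    center = 3.0 if label == 1 else 0.0
    X.append([random.gauss(center, 1.0), random.gauss(center, 1.0)])
    y.append(label)

predict = fit_stacking(X, y)
accuracy = sum((predict(x) > 0.5) == (t == 1) for x, t in zip(X, y)) / len(X)
```

Here `predict` returns a probability-like score for the positive class. Using out-of-fold scores, rather than scores on the data the base predictors were trained on, keeps the meta-learner from simply rewarding whichever base predictor overfits most.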
- Subjects :
- 0301 basic medicine
Computer science
Logistic regression
Machine learning
Data type
General Biochemistry, Genetics and Molecular Biology
03 medical and health sciences
Bacterial Proteins
Protein function prediction
General Pharmacology, Toxicology and Pharmaceutics
General Immunology and Microbiology
Scale (chemistry)
high-performance computing
heterogeneous ensembles
Articles
General Medicine
Method Article
Supercomputer
Ensemble learning
performance evaluation
030104 developmental biology
Gene Ontology
Logistic Models
Artificial intelligence
Details
- ISSN :
- 2046-1402
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- F1000Research
- Accession number :
- edsair.doi.dedup.....dfac271424bbb5479b7f926a198127fb