1. Parametric Bayesian priors and better choice of negative examples improve protein function prediction
- Author
- Kevin Drew, Noah Youngs, Richard Bonneau, Dennis Shasha, and Duncan Penfold-Brown
- Subjects
Statistics and Probability, Proteome, Computer science, Gene regulatory network, Machine learning, Biochemistry, Genome, Mice, Artificial Intelligence, Yeasts, Protein Interaction Mapping, Animals, Gene Regulatory Networks, Protein function prediction, Molecular Biology, Parametric statistics, Protein function, Proteins, Bayes Theorem, Molecular Sequence Annotation, Function (mathematics), Original Papers, Yeast, Computer Science Applications, Computational Mathematics, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, Key (cryptography), Data mining, Artificial intelligence, Heuristics, Algorithms
- Abstract
Motivation: Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction.
Results: We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested.
Availability: Code and data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html
Contact: shasha@courant.nyu.edu or bonneau@cs.nyu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2013