Navigating the protein fitness landscape with Gaussian processes
- Source :
- Proceedings of the National Academy of Sciences (PNAS)
- Publication Year :
- 2013
- Publisher :
- National Academy of Sciences, 2013.
Abstract
- Knowing how protein sequence maps to function (the “fitness landscape”) is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.
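The second design algorithm in the abstract (fit a Gaussian process, then iteratively sample where the model predicts high fitness) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The record does not specify the kernel, the selection rule, or the library layout, so everything here is an assumption: a kernel that decays exponentially with Hamming distance, an upper-confidence-bound (UCB) selection rule, a hypothetical toy library of chimeras built from 8 blocks and 3 parents, and a synthetic additive "fitness" standing in for measured thermostability.

```python
import itertools
import numpy as np

def hamming_kernel(A, B, length_scale=3.0):
    # Covariance decays with the number of differing blocks (assumed kernel).
    # exp(-d/l) equals an RBF kernel on one-hot encodings, so it is positive semidefinite.
    d = (A[:, None, :] != B[None, :, :]).sum(axis=2)
    return np.exp(-d / length_scale)

def gp_posterior(X_train, y_train, X_test, noise=0.3, length_scale=3.0):
    # Standard GP regression with a constant mean; returns posterior mean and std.
    mean = y_train.mean()
    K = hamming_kernel(X_train, X_train, length_scale) + noise**2 * np.eye(len(X_train))
    K_star = hamming_kernel(X_test, X_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - mean))
    mu = mean + K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) = 1
    return mu, np.sqrt(np.clip(var, 0.0, None))

def next_sequence_ucb(mu, sd, measured, beta=2.0):
    # Pick the unmeasured sequence with the highest optimistic estimate.
    score = mu + beta * sd
    score[np.asarray(sorted(measured), dtype=int)] = -np.inf
    return int(np.argmax(score))

# Hypothetical toy library: every chimera of 8 blocks drawn from 3 parents.
rng = np.random.default_rng(0)
candidates = np.array(list(itertools.product(range(3), repeat=8)))
block_effects = rng.normal(size=(8, 3))  # synthetic ground truth
true_fitness = block_effects[np.arange(8), candidates].sum(axis=1)

def measure(i):
    # Stand-in for a noisy laboratory measurement of sequence i.
    return true_fitness[i] + rng.normal(scale=0.3)

# Start from 20 random sequences, then run 10 rounds of model-guided selection.
measured = [int(i) for i in rng.choice(len(candidates), size=20, replace=False)]
observed = [measure(i) for i in measured]
initial_best = max(true_fitness[measured])

for _ in range(10):
    mu, sd = gp_posterior(candidates[measured], np.array(observed), candidates)
    i = next_sequence_ucb(mu, sd, measured)
    measured.append(i)
    observed.append(measure(i))

final_best = max(true_fitness[measured])
print(f"best true fitness: initial {initial_best:.2f}, after search {final_best:.2f}")
```

The `beta` parameter trades off exploitation (high predicted mean) against exploration (high predictive uncertainty); the abstract's first algorithm, which selects sequences that are informative about the landscape, would instead score candidates by an information criterion rather than by optimistic fitness.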
- Subjects :
- Models, Molecular
Fitness landscape
Sequence analysis
Recombinant Fusion Proteins
Normal Distribution
Biology
Bayesian inference
Machine learning
Protein Engineering
Evolution, Molecular
Bayes' theorem
Protein sequencing
Cytochrome P-450 Enzyme System
Sequence Analysis, Protein
Databases, Protein
Gaussian process
Multidisciplinary
Protein Stability
Proteins
Directed evolution
PNAS Plus
Artificial intelligence
Algorithms
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Proceedings of the National Academy of Sciences (PNAS)
- Accession number :
- edsair.doi.dedup.....f34165e39800cfbb8c9906215996bdcc