30 results for "Dauparas J"
Search Results
2. Hallucinating symmetric protein assemblies
- Author
Wicky, B. I. M., Milles, L. F., Courbet, A., Ragotte, R. J., Dauparas, J., Kinfu, E., Tipps, S., Kibler, R. D., Baek, M., DiMaio, F., Li, X., Carter, L., Kang, A., Nguyen, H., Bera, A. K., and Baker, D.
- Published
- 2022
3. Hallucinating protein assemblies
- Author
Wicky, B. I. M., Milles, L. F., Courbet, A., Ragotte, R. J., Dauparas, J., Kinfu, E., Tipps, S., Kibler, R. D., Baek, M., DiMaio, F., Li, X., Carter, L., Kang, A., Nguyen, H., Bera, A. K., and Baker, D.
- Published
- 2022
4. Exploiting prior knowledge in machine learning model design
- Author
Wagstaff, E, Posner, I, Fuchs, F, Engelcke, M, Hamid, S, Dauparas, J, Lamb, K, Malhotra, G, Vlontzos, A, Baydin, A, Bhiwandiwalla, A, Gal, Y, Kalaitzis, A, Reina, A, Bhatt, A, and Osborne, M
- Subjects
- Machine learning
- Abstract
The design of models which are appropriate for specific tasks is an important activity in machine learning. This thesis considers multiple ways in which knowledge about the task at hand can be incorporated into the design of a machine learning model: (i) by using Bayesian models, which incorporate prior knowledge in probability distributions; (ii) by designing models to respect the symmetries of the task; (iii) by adapting models in a practical setting. For Bayesian models, we propose a probabilistic numeric method for computing integrals which arise in the context of Bayesian inference. Respecting the symmetries of a task comes under the framework of group equivariance and invariance. We provide a theoretical investigation of Deep Sets, a prominent permutation-invariant model, and propose an extension of the SE(3)-Transformer, an SE(3)-equivariant model. Finally, we present an investigation of the application of machine learning techniques to problems in atmospheric physics, in which the relevant models are adapted to specific known features of the problem.
- Published
- 2022
5. Computational design of bifaceted protein nanomaterials with tailorable properties.
- Author
Rankovic S, Carr KD, Decarreau J, Skotheim R, Kibler RD, Ols S, Lee S, Chun J, Tooley M, Dauparas J, Eisenach HE, Glögl M, Weidle C, Borst AJ, Baker D, and King NP
- Abstract
Recent advances in computational methods have led to considerable progress in the design of self-assembling protein nanoparticles. However, nearly all nanoparticles designed to date exhibit strict point group symmetry, with each subunit occupying an identical, symmetrically related environment. This property limits the structural diversity that can be achieved and precludes anisotropic functionalization. Here, we describe a general computational strategy for designing multi-component bifaceted protein nanomaterials with two distinctly addressable sides. The method centers on docking pseudosymmetric heterooligomeric building blocks in architectures with dihedral symmetry and designing an asymmetric protein-protein interface between them. We used this approach to obtain an initial 30-subunit assembly with pseudo-D5 symmetry, and then generated an additional 15 variants in which we controllably altered the size and morphology of the bifaceted nanoparticles by designing de novo extensions to one of the subunits. Functionalization of the two distinct faces of the nanoparticles with de novo protein minibinders enabled specific colocalization of two populations of polystyrene microparticles coated with target protein receptors. The ability to accurately design anisotropic protein nanomaterials with precisely tunable structures and functions will be broadly useful in applications that require colocalizing two or more distinct target moieties.
Competing Interests: The authors declare no competing interests.
- Published
- 2024
6. Modeling protein-small molecule conformational ensembles with ChemNet.
- Author
Anishchenko I, Kipnis Y, Kalvet I, Zhou G, Krishna R, Pellock SJ, Lauko A, Lee GR, An L, Dauparas J, DiMaio F, and Baker D
- Abstract
Modeling the conformational heterogeneity of protein-small molecule systems is an outstanding challenge. We reasoned that while residue-level descriptions of biomolecules are efficient for de novo structure prediction, for probing heterogeneity of interactions with small molecules in the folded state an entirely atomic-level description could have advantages in speed and generality. We developed a graph neural network called ChemNet trained to recapitulate correct atomic positions from partially corrupted input structures from the Cambridge Structural Database and the Protein Data Bank; the nodes of the graph are the atoms in the system. ChemNet accurately generates structures of diverse organic small molecules given knowledge of their atom composition and bonding, and given a description of the larger protein context, and builds up structures of small molecules and protein side chains for protein-small molecule docking. Because ChemNet is rapid and stochastic, ensembles of predictions can be readily generated to map conformational heterogeneity. In enzyme design efforts described here and elsewhere, we find that using ChemNet to assess the accuracy and pre-organization of the designed active sites results in higher success rates and higher activities; we obtain a preorganized retroaldolase with a kcat/KM of 11,000 M⁻¹ min⁻¹, considerably higher than any pre-deep learning design for this reaction. We anticipate that ChemNet will be widely useful for rapidly generating conformational ensembles of small molecule and small molecule-protein systems, and for designing higher activity preorganized enzymes.
Competing Interests: A provisional patent (application number 63/535,404) covering the ChemNet network presented in this paper has been filed by the University of Washington. D.B., I.A., G.Z., R.K., and F.D. are inventors on this patent. D.B. is a cofounder and shareholder of Vilya, an early-stage biotechnology company that has licensed the provisional patent.
- Published
- 2024
7. Binding and sensing diverse small molecules using shape-complementary pseudocycles.
- Author
An L, Said M, Tran L, Majumder S, Goreshnik I, Lee GR, Juergens D, Dauparas J, Anishchenko I, Coventry B, Bera AK, Kang A, Levine PM, Alvarez V, Pillai A, Norn C, Feldman D, Zorine D, Hicks DR, Li X, Sanchez MG, Vafeados DK, Salveson PJ, Vorobieva AA, and Baker D
- Subjects
- Binding Sites, Ligands, Methotrexate chemistry, Molecular Docking Simulation, Nanopores, Protein Multimerization, Thyroxine chemistry, Deep Learning, Protein Binding, Proteins chemistry, Small Molecule Libraries chemistry
- Abstract
We describe an approach for designing high-affinity small molecule-binding proteins poised for downstream sensing. We use deep learning-generated pseudocycles with repeating structural units surrounding central binding pockets with widely varying shapes that depend on the geometry and number of the repeat units. We dock small molecules of interest into the most shape complementary of these pseudocycles, design the interaction surfaces for high binding affinity, and experimentally screen to identify designs with the highest affinity. We obtain binders to four diverse molecules, including the polar and flexible methotrexate and thyroxine. Taking advantage of the modular repeat structure and central binding pockets, we construct chemically induced dimerization systems and low-noise nanopore sensors by splitting designs into domains that reassemble upon ligand addition.
- Published
- 2024
8. Computational design of soluble and functional membrane protein analogues.
- Author
Goverde CA, Pacesa M, Goldbach N, Dornfeld LJ, Balbi PEM, Georgeon S, Rosset S, Kapoor S, Choudhury J, Dauparas J, Schellhaas C, Kozlov S, Baker D, Ovchinnikov S, Vecchio AJ, and Correia BE
- Subjects
- Humans, Models, Molecular, Protein Stability, Proteome chemistry, Receptors, G-Protein-Coupled chemistry, Receptors, G-Protein-Coupled metabolism, Amino Acid Motifs, Proof of Concept Study, Deep Learning, Membrane Proteins chemistry, Membrane Proteins metabolism, Protein Folding, Solubility, Computer-Aided Design
- Abstract
De novo design of complex protein folds using solely computational means remains a substantial challenge[1]. Here we use a robust deep learning pipeline to design complex folds and soluble analogues of integral membrane proteins. Unique membrane topologies, such as those from G-protein-coupled receptors[2], are not found in the soluble proteome, and we demonstrate that their structural features can be recapitulated in solution. Biophysical analyses demonstrate the high thermal stability of the designs, and experimental structures show remarkable design accuracy. The soluble analogues were functionalized with native structural motifs, as a proof of concept for bringing membrane protein functions to the soluble proteome, potentially enabling new approaches in drug discovery. In summary, we have designed complex protein topologies and enriched them with functionalities from membrane proteins, with high experimental success rates, leading to a de facto expansion of the functional soluble fold space. (© 2024. The Author(s).)
- Published
- 2024
9. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN.
- Author
de Haas RJ, Brunette N, Goodson A, Dauparas J, Yi SY, Yang EC, Dowling Q, Nguyen H, Kang A, Bera AK, Sankaran B, de Vries R, Baker D, and King NP
- Subjects
- Models, Molecular, Amino Acid Sequence, Biotechnology, Protein Conformation, Proteins chemistry, Nanostructures
- Abstract
The design of protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. Deep learning methods promise to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here, we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.
Competing Interests: N.P.K. is a cofounder, shareholder, paid consultant, and chair of the scientific advisory board of Icosavax, Inc. The King lab has received unrelated sponsored research agreements from Pfizer and GlaxoSmithKline.
- Published
- 2024
10. Computational design of soluble functional analogues of integral membrane proteins.
- Author
Goverde CA, Pacesa M, Goldbach N, Dornfeld LJ, Balbi PEM, Georgeon S, Rosset S, Kapoor S, Choudhury J, Dauparas J, Schellhaas C, Kozlov S, Baker D, Ovchinnikov S, Vecchio AJ, and Correia BE
- Abstract
De novo design of complex protein folds using solely computational means remains a significant challenge. Here, we use a robust deep learning pipeline to design complex folds and soluble analogues of integral membrane proteins. Unique membrane topologies, such as those from GPCRs, are not found in the soluble proteome and we demonstrate that their structural features can be recapitulated in solution. Biophysical analyses reveal high thermal stability of the designs and experimental structures show remarkable design accuracy. The soluble analogues were functionalized with native structural motifs, standing as a proof-of-concept for bringing membrane protein functions to the soluble proteome, potentially enabling new approaches in drug discovery. In summary, we designed complex protein topologies and enriched them with functionalities from membrane proteins, with high experimental success rates, leading to a de facto expansion of the functional soluble fold space.
- Published
- 2024
11. Blueprinting extendable nanomaterials with standardized protein blocks.
- Author
Huddy TF, Hsia Y, Kibler RD, Xu J, Bethel N, Nagarajan D, Redler R, Leung PJY, Weidle C, Courbet A, Yang EC, Bera AK, Coudray N, Calise SJ, Davila-Hernandez FA, Han HL, Carr KD, Li Z, McHugh R, Reggiano G, Kang A, Sankaran B, Dickinson MS, Coventry B, Brunette TJ, Liu Y, Dauparas J, Borst AJ, Ekiert D, Kollman JM, Bhabha G, and Baker D
- Subjects
- Crystallography, X-Ray, Microscopy, Electron, Reproducibility of Results, Nanostructures chemistry, Proteins chemistry, Proteins metabolism
- Abstract
A wooden house frame consists of many different lumber pieces, but because of the regularity of these building blocks, the structure can be designed using straightforward geometrical principles. The design of multicomponent protein assemblies, in comparison, has been much more complex, largely owing to the irregular shapes of protein structures[1]. Here we describe extendable linear, curved and angled protein building blocks, as well as inter-block interactions, that conform to specified geometric standards; assemblies designed using these blocks inherit their extendability and regular interaction surfaces, enabling them to be expanded or contracted by varying the number of modules, and reinforced with secondary struts. Using X-ray crystallography and electron microscopy, we validate nanomaterial designs ranging from simple polygonal and circular oligomers that can be concentrically nested, up to large polyhedral nanocages and unbounded straight 'train track' assemblies with reconfigurable sizes and geometries that can be readily blueprinted. Because of the complexity of protein structures and sequence-structure relationships, it has not previously been possible to build up large protein assemblies by deliberate placement of protein backbones onto a blank three-dimensional canvas; the simplicity and geometric regularity of our design platform now enables construction of protein nanomaterials according to 'back of an envelope' architectural blueprints. (© 2024. The Author(s).)
- Published
- 2024
12. Improving Protein Expression, Stability, and Function with ProteinMPNN.
- Author
Sumida KH, Núñez-Franco R, Kalvet I, Pellock SJ, Wicky BIM, Milles LF, Dauparas J, Wang J, Kipnis Y, Jameson N, Kang A, De La Cruz J, Sankaran B, Bera AK, Jiménez-Osés G, and Baker D
- Subjects
- Temperature, Recombinant Fusion Proteins, Endopeptidases metabolism
- Abstract
Natural proteins are highly optimized for function but are often difficult to produce at a scale suitable for biotechnological applications due to poor expression in heterologous systems, limited solubility, and sensitivity to temperature. Thus, a general method that improves the physical properties of native proteins while maintaining function could have wide utility for protein-based technologies. Here, we show that the deep neural network ProteinMPNN, together with evolutionary and structural information, provides a route to increasing protein expression, stability, and function. For both myoglobin and tobacco etch virus (TEV) protease, we generated designs with improved expression, elevated melting temperatures, and improved function. For TEV protease, we identified multiple designs with improved catalytic activity as compared to the parent sequence and previously reported TEV variants. Our approach should be broadly useful for improving the expression, stability, and function of biotechnologically important proteins.
- Published
- 2024
13. De novo design of diverse small molecule binders and sensors using Shape Complementary Pseudocycles.
- Author
An L, Said M, Tran L, Majumder S, Goreshnik I, Lee GR, Juergens D, Dauparas J, Anishchenko I, Coventry B, Bera AK, Kang A, Levine PM, Alvarez V, Pillai A, Norn C, Feldman D, Zorine D, Hicks DR, Li X, Sanchez MG, Vafeados DK, Salveson PJ, Vorobieva AA, and Baker D
- Abstract
A general method for designing proteins to bind and sense any small molecule of interest would be widely useful. Due to the small number of atoms to interact with, binding to small molecules with high affinity requires highly shape-complementary pockets, and transducing binding events into signals is challenging. Here we describe an integrated deep learning and energy-based approach for designing high shape complementarity binders to small molecules that are poised for downstream sensing applications. We employ deep learning-generated pseudocycles with repeating structural units surrounding central pockets; depending on the geometry of the structural unit and repeat number, these pockets span wide ranges of sizes and shapes. For a small molecule target of interest, we extensively sample high shape complementarity pseudocycles to generate large numbers of customized potential binding pockets; the ligand binding poses and the interacting interfaces are then optimized for high affinity binding. We computationally design binders to four diverse molecules, including for the first time polar flexible molecules such as methotrexate and thyroxine, which are expressed at high levels and have nanomolar affinities straight out of the computer. Co-crystal structures are nearly identical to the design models. Taking advantage of the modular repeating structure of pseudocycles and central location of the binding pockets, we constructed low-noise nanopore sensors and chemically induced dimerization systems by splitting the binders into domains which assemble into the original pseudocycle pocket upon target molecule addition.
One Sentence Summary: We use a pseudocycle-based shape complementarity optimizing approach to design nanomolar binders to diverse ligands, including the flexible and polar methotrexate and thyroxine, that can be directly converted into ligand-gated nanopores and chemically induced dimerization systems.
- Published
- 2023
14. Small-molecule binding and sensing with a designed protein family.
- Author
Lee GR, Pellock SJ, Norn C, Tischer D, Dauparas J, Anishchenko I, Mercer JAM, Kang A, Bera A, Nguyen H, Goreshnik I, Vafeados D, Roullier N, Han HL, Coventry B, Haddox HK, Liu DR, Yeh AH, and Baker D
- Abstract
Despite transformative advances in protein design with deep learning, the design of small-molecule-binding proteins and sensors for arbitrary ligands remains a grand challenge. Here we combine deep learning and physics-based methods to generate a family of proteins with diverse and designable pocket geometries, which we employ to computationally design binders for six chemically and structurally distinct small-molecule targets. Biophysical characterization of the designed binders revealed nanomolar to low micromolar binding affinities and atomic-level design accuracy. The bound ligands are exposed at one edge of the binding pocket, enabling the de novo design of chemically induced dimerization (CID) systems; we take advantage of this to create a biosensor with nanomolar sensitivity for cortisol. Our approach provides a general method to design proteins that bind and sense small molecules for a wide range of analytical, environmental, and biomedical applications.
- Published
- 2023
15. Hallucination of closed repeat proteins containing central pockets.
- Author
An L, Hicks DR, Zorine D, Dauparas J, Wicky BIM, Milles LF, Courbet A, Bera AK, Nguyen H, Kang A, Carter L, and Baker D
- Subjects
- Humans, Proteins chemistry, Hallucinations
- Abstract
In pseudocyclic proteins, such as TIM barrels, β barrels, and some helical transmembrane channels, a single subunit is repeated in a cyclic pattern, giving rise to a central cavity that can serve as a pocket for ligand binding or enzymatic activity. Inspired by these proteins, we devised a deep-learning-based approach to broadly exploring the space of closed repeat proteins starting from only a specification of the repeat number and length. Biophysical data for 38 structurally diverse pseudocyclic designs produced in Escherichia coli are consistent with the design models, and the three crystal structures we were able to obtain are very close to the designed structures. Docking studies suggest the diversity of folds and central pockets provide effective starting points for designing small-molecule binders and enzymes. (© 2023. The Author(s).)
- Published
- 2023
16. Design of stimulus-responsive two-state hinge proteins.
- Author
Praetorius F, Leung PJY, Tessmer MH, Broerman A, Demakis C, Dishman AF, Pillai A, Idris A, Juergens D, Dauparas J, Li X, Levine PM, Lamb M, Ballard RK, Gerben SR, Nguyen H, Kang A, Sankaran B, Bera AK, Volkman BF, Nivala J, Stoll S, and Baker D
- Subjects
- Crystallography, X-Ray, Ligands, Protein Conformation, Protein Engineering methods
- Abstract
In nature, proteins that switch between two conformations in response to environmental stimuli structurally transduce biochemical information in a manner analogous to how transistors control information flow in computing devices. Designing proteins with two distinct but fully structured conformations is a challenge for protein design as it requires sculpting an energy landscape with two distinct minima. Here we describe the design of "hinge" proteins that populate one designed state in the absence of ligand and a second designed state in the presence of ligand. X-ray crystallography, electron microscopy, double electron-electron resonance spectroscopy, and binding measurements demonstrate that despite the significant structural differences the two states are designed with atomic level accuracy and that the conformational and binding equilibria are closely coupled.
- Published
- 2023
17. Rapid and automated design of two-component protein nanomaterials using ProteinMPNN.
- Author
de Haas RJ, Brunette N, Goodson A, Dauparas J, Yi SY, Yang EC, Dowling Q, Nguyen H, Kang A, Bera AK, Sankaran B, de Vries R, Baker D, and King NP
- Abstract
The design of novel protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. A new generation of deep learning methods promises to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.
- Published
- 2023
18. Mega-scale experimental analysis of protein folding stability in biology and design.
- Author
Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, Mangan NM, Ovchinnikov S, and Rocklin GJ
- Subjects
- Amino Acids genetics, Amino Acids metabolism, DNA, Complementary genetics, Protein Stability, Thermodynamics, Proteolysis, Protein Domains genetics, Mutation, Biology methods, Protein Folding, Proteins chemistry, Proteins genetics, Proteins metabolism, Protein Engineering methods
- Abstract
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale[1]. However, the energetics driving folding are invisible in these structures and remain largely unknown[2]. The hidden thermodynamics of folding can drive disease[3,4], shape protein evolution[5-7] and guide protein engineering[8-10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40-72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability. (© 2023. The Author(s).)
- Published
- 2023
19. Blueprinting expandable nanomaterials with standardized protein building blocks.
- Author
Huddy TF, Hsia Y, Kibler RD, Xu J, Bethel N, Nagarajan D, Redler R, Leung PJY, Courbet A, Yang EC, Bera AK, Coudray N, Calise SJ, Davila-Hernandez FA, Weidle C, Han HL, Li Z, McHugh R, Reggiano G, Kang A, Sankaran B, Dickinson MS, Coventry B, Brunette TJ, Liu Y, Dauparas J, Borst AJ, Ekiert D, Kollman JM, Bhabha G, and Baker D
- Abstract
A wooden house frame consists of many different lumber pieces, but because of the regularity of these building blocks, the structure can be designed using straightforward geometrical principles. The design of multicomponent protein assemblies in comparison has been much more complex, largely due to the irregular shapes of protein structures[1]. Here we describe extendable linear, curved, and angled protein building blocks, as well as inter-block interactions that conform to specified geometric standards; assemblies designed using these blocks inherit their extendability and regular interaction surfaces, enabling them to be expanded or contracted by varying the number of modules, and reinforced with secondary struts. Using X-ray crystallography and electron microscopy, we validate nanomaterial designs ranging from simple polygonal and circular oligomers that can be concentrically nested, up to large polyhedral nanocages and unbounded straight "train track" assemblies with reconfigurable sizes and geometries that can be readily blueprinted. Because of the complexity of protein structures and sequence-structure relationships, it has not been previously possible to build up large protein assemblies by deliberate placement of protein backbones onto a blank 3D canvas; the simplicity and geometric regularity of our design platform now enables construction of protein nanomaterials according to "back of an envelope" architectural blueprints.
- Published
- 2023
20. Improving de novo protein binder design with deep learning.
- Author
Bennett NR, Coventry B, Goreshnik I, Huang B, Allen A, Vafeados D, Peng YP, Dauparas J, Baek M, Stewart L, DiMaio F, De Munck S, Savvides SN, and Baker D
- Subjects
- Protein Engineering, Proteins metabolism, Protein Binding, Deep Learning
- Abstract
Recently it has become possible to de novo design high affinity protein binding proteins from target structural information alone. There is, however, considerable room for improvement as the overall design success rate is low. Here, we explore the augmentation of energy-based protein binder design using deep learning. We find that using AlphaFold2 or RoseTTAFold to assess the probability that a designed sequence adopts the designed monomer structure, and the probability that this structure binds the target as designed, increases design success rates nearly 10-fold. We find further that sequence design using ProteinMPNN rather than Rosetta considerably increases computational efficiency. (© 2023. The Author(s).)
- Published
- 2023
21. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks.
- Author
Motmaen A, Dauparas J, Baek M, Abedi MH, Baker D, and Bradley P
- Subjects
- Protein Binding, Genes, MHC Class II, PDZ Domains, Peptides chemistry, Histocompatibility Antigens Class II metabolism
- Abstract
Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been a challenge to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing binding and non-binding peptides to SH3 and PDZ domains. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where less experimental data are available.
- Published
- 2023
22. De novo design of luciferases using deep learning.
- Author
Yeh AH, Norn C, Kipnis Y, Tischer D, Pellock SJ, Evans D, Ma P, Lee GR, Zhang JZ, Anishchenko I, Coventry B, Cao L, Dauparas J, Halabiya S, DeWitt M, Carter L, Houk KN, and Baker D
- Subjects
- Biocatalysis, Catalytic Domain, Enzyme Stability, Hot Temperature, Luciferins metabolism, Luminescence, Oxidation-Reduction, Substrate Specificity, Deep Learning, Luciferases chemistry, Luciferases metabolism
- Abstract
De novo enzyme design has sought to introduce active sites and substrate-binding pockets that are predicted to catalyse a reaction of interest into geometrically compatible native scaffolds[1,2], but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning-based 'family-wide hallucination' approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyse the oxidative chemiluminescence of the synthetic luciferin substrates diphenylterazine[3] and 2-deoxycoelenterazine. The designed active sites position an arginine guanidinium group adjacent to an anion that develops during the reaction in a binding pocket with high shape complementarity. For both luciferin substrates, we obtain designed luciferases with high selectivity; the most active of these is a small (13.9 kDa) and thermostable (with a melting temperature higher than 95 °C) enzyme that has a catalytic efficiency on diphenylterazine (kcat/Km = 10⁶ M⁻¹ s⁻¹) comparable to that of native luciferases, but a much higher substrate specificity. The creation of highly active and specific biocatalysts from scratch with broad applications in biomedicine is a key milestone for computational enzyme design, and our approach should enable generation of a wide range of luciferases and other enzymes. (© 2023. The Author(s).)
- Published
- 2023
23. End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman.
- Author
Petti S, Bhattacharya N, Rao R, Dauparas J, Thomas N, Zhou J, Rush AM, Koo P, and Ovchinnikov S
- Subjects
- Humans, Sequence Alignment, Neural Networks, Computer, Amino Acid Sequence, Algorithms, Proteins chemistry
- Abstract
Motivation: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for.
Results: Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood.
Availability and Implementation: Our code and examples are available at: https://github.com/spetti/SMURF.
Supplementary Information: Supplementary data are available at Bioinformatics online.
(© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2023
24. Scaffolding protein functional sites using deep learning.
- Author
Wang J, Lisanza S, Juergens D, Tischer D, Watson JL, Castro KM, Ragotte R, Saragovi A, Milles LF, Baek M, Anishchenko I, Yang W, Hicks DR, Expòsit M, Schlichthaerle T, Chun JH, Dauparas J, Bennett N, Wicky BIM, Muenks A, DiMaio F, Correia B, Ovchinnikov S, and Baker D
- Subjects
- Binding Sites, Catalysis, Protein Binding, Protein Folding, Protein Structure, Secondary, Deep Learning, Protein Engineering methods, Proteins chemistry
- Abstract
The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired functional site. The second approach, "inpainting," starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.
- Published
- 2022
25. Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention.
- Author
Bhattacharya N, Thomas N, Rao R, Dauparas J, Koo PK, Baker D, Song YS, and Ovchinnikov S
- Subjects
- Attention, Humans, Sequence Alignment, Computational Biology, Proteins genetics
- Abstract
The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
- Published
- 2022
26. Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14.
- Author
Anishchenko I, Baek M, Park H, Hiranuma N, Kim DE, Dauparas J, Mansoor S, Humphreys IR, and Baker D
- Subjects
- Humans, Metagenome genetics, Sequence Analysis, Protein, Computational Biology methods, Deep Learning, Protein Structure, Tertiary, Proteins chemistry, Proteins genetics, Proteins metabolism, Software
- Abstract
The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template-utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires less computing resources, completing the entire modeling process in a median < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest in part due to missing inter-domain or inter-chain contacts. (© 2021 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals LLC.)
- Published
- 2021
27. Accurate prediction of protein structures and interactions using a three-track neural network.
- Author
Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Millán C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, and Baker D
- Subjects
- ADAM Proteins chemistry, Amino Acid Sequence, Computer Simulation, Cryoelectron Microscopy, Crystallography, X-Ray, Databases, Protein, Membrane Proteins chemistry, Models, Molecular, Multiprotein Complexes chemistry, Neural Networks, Computer, Protein Subunits chemistry, Proteins physiology, Receptors, G-Protein-Coupled chemistry, Sphingosine N-Acyltransferase chemistry, Deep Learning, Protein Conformation, Protein Folding, Proteins chemistry
- Abstract
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research. (Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.)
- Published
- 2021
28. Improved protein structure refinement guided by deep learning based accuracy estimation.
- Author
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, and Baker D
- Subjects
- Algorithms, Caspases chemistry, Models, Biological, Models, Molecular, Protein Conformation, Software, Computational Biology methods, Deep Learning, Proteins chemistry
- Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
- Published
- 2021
29. Self-organization of swimmers drives long-range fluid transport in bacterial colonies.
- Author
Xu H, Dauparas J, Das D, Lauga E, and Wu Y
- Subjects
- Hydrodynamics, Bacillus subtilis physiology, Escherichia coli physiology, Flagella physiology, Microbiota physiology, Proteus mirabilis physiology
- Abstract
Motile subpopulations in microbial communities are believed to be important for dispersal, quest for food, and material transport. Here, we show that motile cells in sessile colonies of peritrichously flagellated bacteria can self-organize into two adjacent, centimeter-scale motile rings surrounding the entire colony. The motile rings arise from spontaneous segregation of a homogeneous swimmer suspension that mimics a phase separation; the process is mediated by intercellular interactions and shear-induced depletion. As a result of this self-organization, cells drive fluid flows that circulate around the colony at a constant peak speed of ~30 µm s⁻¹, providing a stable and high-speed avenue for directed material transport at the macroscopic scale. Our findings present a unique form of bacterial self-organization that influences population structure and material distribution in colonies.
- Published
- 2019
30. Helical micropumps near surfaces.
- Author
Dauparas J, Das D, and Lauga E
- Abstract
Recent experiments proposed to use confined bacteria in order to generate flows near surfaces. We develop a mathematical and a computational model of this fluid transport using a linear superposition of fundamental flow singularities. The rotation of a helical bacterial flagellum induces both a force and a torque on the surrounding fluid, both of which lead to a net flow along the surface. The combined flow is in general directed at an angle to the axis of the flagellar filament. The optimal pumping is thus achieved when bacteria are tilted with respect to the direction in which one wants to move the fluid, in good agreement with experimental results. We further investigate the optimal helical shapes to be used as micropumps near surfaces and show that bacterial flagella are nearly optimal, a result which could be relevant to the expansion of bacterial swarms.
- Published
- 2018