1. In silico proof of principle of machine learning-based antibody design at unconstrained scale
- Author
Milena Pavlović, Michael Widrich, Günter Klambauer, Fridtjof Lund-Johansen, Greiff, Sepp Hochreiter, Lonneke Scheffer, Maria Chernigovskaya, Cédric R. Weber, Philippe Robert, Geir Kjetil Sandve, Brij Bhushan Mehta, Ingrid Hobæk Haff, Rahmad Akbar, Igor Snapkov, J. T. Andersen, Andrei Slabodkin, Enkelejda Miho, and Frank R
- Subjects
Sequence, Matching (graph theory), business.industry, Computer science, Deep learning, Machine learning, computer.software_genre, Oracle, Range (mathematics), Generative model, Paratope, Artificial intelligence, Transfer of learning, business, computer
- Abstract
Generative machine learning (ML) has been postulated to be a major driver in the computational design of antigen-specific monoclonal antibodies (mAbs). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework that incorporates a wide range of physiological antibody-binding parameters. The simulation framework both computes antibody-antigen 3D structures and functions as an oracle for unrestricted prospective evaluation of the antigen specificity of ML-generated antibody sequences. We found that a deep generative model trained exclusively on antibody sequence (1D) data can be used to design native-like conformational (3D) epitope-specific antibodies, matching or exceeding the training dataset in affinity and developability variety. Furthermore, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Finally, we validated that the antibody design insights gained from simulated antibody-antigen binding data are applicable to experimental real-world data.
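The oracle-based prospective evaluation described above can be illustrated with a minimal Python sketch under heavily simplified assumptions. It does not reproduce the paper's lattice-based simulation framework or its deep generative model: `toy_oracle` is a hypothetical scoring function with arbitrary per-residue weights standing in for the binding simulator, and an independent-site (position-specific frequency) model stands in for the deep generative model. All function names are illustrative. What the sketch shows is the workflow: fit a generator on training sequences, sample new sequences, score both sets with the same oracle, and compare.

```python
import random
from collections import Counter
from statistics import median

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_oracle(seq: str) -> float:
    """Hypothetical stand-in for a binding simulator (NOT the paper's framework).
    Returns a pseudo-energy; by assumption, lower means stronger binding."""
    # Arbitrary per-residue contributions, purely for illustration.
    weights = {aa: (i % 7) - 3 for i, aa in enumerate(AMINO_ACIDS)}
    return float(sum(weights[aa] for aa in seq))


def fit_site_model(seqs):
    """Count residues per position in equal-length sequences; a simple
    stand-in for a deep generative model."""
    return [Counter(s[i] for s in seqs) for i in range(len(seqs[0]))]


def sample(model, n, rng):
    """Draw n sequences, sampling each position independently."""
    out = []
    for _ in range(n):
        residues = []
        for counts in model:
            aas, freqs = zip(*counts.items())
            residues.append(rng.choices(aas, weights=freqs, k=1)[0])
        out.append("".join(residues))
    return out


def prospective_evaluation(train, generated):
    """Score training and generated sequences with the same oracle."""
    train_scores = [toy_oracle(s) for s in train]
    gen_scores = [toy_oracle(s) for s in generated]
    threshold = median(train_scores)
    train_set = set(train)
    return {
        "train_median_energy": threshold,
        "generated_median_energy": median(gen_scores),
        # Fraction of generated sequences scoring at least as well as the training median.
        "frac_at_least_as_good": sum(e <= threshold for e in gen_scores) / len(gen_scores),
        # Fraction of generated sequences not present in the training set.
        "frac_novel": sum(s not in train_set for s in generated) / len(generated),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    train = ["".join(rng.choice(AMINO_ACIDS) for _ in range(11)) for _ in range(500)]
    model = fit_site_model(train)
    generated = sample(model, 1000, rng)
    print(prospective_evaluation(train, generated))
```

Because an oracle can score any sequence, the number of generated candidates that can be evaluated is limited by compute rather than by experimental throughput; in the paper, the simulation framework plays this role.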
Our work establishes the a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.

Highlights
A large-scale dataset of 70M synthetic antibody-antigen complexes (three orders of magnitude larger than the current state of the art) that reflect biological complexity allows the prospective evaluation of antibody generative deep learning.
Combining generative learning, synthetic antibody-antigen binding data, and prospective evaluation shows that deep learning-driven antibody design and discovery at an unconstrained level is feasible.
Transfer learning (low-N learning) coupled to generative learning shows that antibody-binding rules may be transferred across unrelated antibody-antigen complexes.
Experimental validation of antibody-design conclusions drawn from deep learning on synthetic antibody-antigen binding data.

Graphical abstract
We leverage large synthetic ground-truth data to demonstrate (A, B) the unconstrained deep generative learning-based generation of native-like antibody sequences and (C) the prospective evaluation of conformational (3D) affinity, paratope-epitope pairs, and developability. (D) Finally, we show increased generation quality of low-N-based machine learning models via transfer learning.
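The low-N transfer-learning idea can be illustrated with a similarly simplified Python sketch. As an assumption for illustration only, "transfer" here means reusing position-specific residue frequencies learned on a large source dataset as Dirichlet pseudocounts when fitting a small target dataset. This is a swapped-in Bayesian-prior technique, not the deep-network transfer learning used in the paper; the function names and the `strength` parameter are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_model(seqs, prior=None, strength=20.0):
    """Position-specific residue probabilities for equal-length sequences of
    standard amino acids.

    Without `prior`, a uniform pseudocount of 1 per residue is added (training
    from scratch). With `prior` (per-position probabilities from a model fitted
    on a large source dataset), the prior is added as Dirichlet pseudocounts of
    total mass `strength` per position (a toy form of transfer learning).
    """
    if prior is not None and strength <= 0:
        raise ValueError("strength must be positive when a prior is given")
    model = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        if prior is None:
            pseudo = {aa: 1.0 for aa in AMINO_ACIDS}
        else:
            pseudo = {aa: strength * prior[i][aa] for aa in AMINO_ACIDS}
        total = sum(counts[aa] + pseudo[aa] for aa in AMINO_ACIDS)
        model.append({aa: (counts[aa] + pseudo[aa]) / total for aa in AMINO_ACIDS})
    return model


def mean_log_likelihood(model, seqs):
    """Average per-sequence log-likelihood under an independent-site model."""
    return sum(math.log(model[i][aa]) for s in seqs for i, aa in enumerate(s)) / len(seqs)


if __name__ == "__main__":
    # Hypothetical toy data: source and target share most residue preferences.
    source = ["WYGSDW"] * 150 + ["WYGADW"] * 50
    target_train = ["WYGSDF"]  # low-N target set
    target_test = ["WYGSDF", "WYGADF"]
    pretrained = site_model(source)
    scratch = site_model(target_train)
    transfer = site_model(target_train, prior=pretrained, strength=20.0)
    print("scratch :", round(mean_log_likelihood(scratch, target_test), 2))
    print("transfer:", round(mean_log_likelihood(transfer, target_test), 2))
```

The `strength` parameter controls how much the source-derived prior outweighs the few target sequences: where source and target share residue preferences the prior helps, and where they differ (the last position in the toy data) it costs likelihood.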
- Published
2021