Applications of artificial intelligence to alchemical free energy calculations in contemporary drug design
- Publication Year :
- 2023
- Publisher :
- University of Edinburgh, 2023.
Abstract
- The work presented in this thesis resides at the interface of alchemical free energy (AFE) methods and machine learning (ML) in the context of computer-aided drug discovery (CADD). The majority of the work consists of explorations into regions of synergy between the two. The overarching hypothesis is that, although areas of high potential exist for standalone ML and AFE in CADD, an additional source of value can be found where ML and AFE are combined in such a way that the new methodology profits from the key strengths of each. Physics-based AFE calculations have, over several decades, grown into precise and accurate methods of predicting ligand-protein binding affinities, with sub-kcal·mol⁻¹ mean absolute errors versus experimental measures, which is the main driver of their popularity for project support in drug design workflows. Data-driven ML methods have seen similarly rapid development, spurred by the exponential growth in computational hardware capabilities, but generally still lack the accuracy versus experimental measures of binding affinities needed to support drug design work. The two approaches differ in their foundations: the former relies mainly on physical rules in the form of statistical mechanics, whereas the latter profits from interpolating signals within large training domains of data. After a historical and theoretical introduction to drug discovery, AFE calculations and ML methods, the thesis highlights several studies that reflect the above hypothesis at multiple key points in the AFE workflow. Firstly, a methodology that combines AFE with ML has been developed to compute accurate absolute hydration free energies. The hybrid AFE/ML methodology was trained on a subset of the FreeSolv database, and retrospectively shown to outperform most submissions from the SAMPL4 competition.
Compared to pure machine-learning approaches, AFE/ML yields more precise estimates of free energies of hydration, and requires a fraction of the training set size to outperform standalone AFE calculations. The ML-derived correction terms are further shown to be transferable to a range of related AFE simulation protocols. The approach may be used to inexpensively improve the accuracy of AFE calculations, and to flag molecules which will benefit the most from bespoke force field parameterisation efforts. Secondly, early investigations into data-driven AFE network generators have been performed. Because AFE calculations make use of alchemical transformations between ligands in congeneric series, practitioners are required to estimate an optimal combination of transformations for each series. An AFE network is the collection of edges chosen such that all ligands (nodes) are included in the network, where each edge is an AFE calculation. As there is a vast number of possible configurations for such networks, this step in AFE setup suffers from several shortcomings, such as limited scalability and limited transferability between AFE software packages. Although AFE network generation has been automated in the past, the algorithm depends mostly on expert-driven estimation of AFE transformation reliabilities. This work presents a first iteration of a data-driven alternative to the state-of-the-art using a graph siamese neural network architecture. A novel dataset, RBFE-Space, is presented as a representative and transferable training domain for AFE ML research. The workflow presented in this thesis matches state-of-the-art AFE network generation performance with several key benefits. The workflow provides full transferability of the network generator because RBFE-Space is open-sourced and ready to be applied to other AFE software packages. Additionally, the deep learning model represents the first robust ML predictor of transformation reliabilities in AFE calculations.
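The network definition above (ligands as nodes, AFE calculations as edges, every ligand included) can be illustrated with a minimal sketch. This is not the thesis's network generator: it simply builds a spanning tree with Kruskal's algorithm, preferring edges with higher reliability. The ligand names, the reliability scores and the function name `spanning_network` are all hypothetical, and practical AFE networks typically add further edges to form cycles for redundancy.

```python
from itertools import combinations


def spanning_network(ligands, reliability):
    """Pick a minimal connected set of edges, preferring reliable ones.

    ligands: list of ligand names (nodes).
    reliability: dict mapping frozenset({a, b}) -> score in [0, 1]
                 (hypothetical values; higher means more reliable).
    Returns a list of (a, b) edges forming a spanning tree (Kruskal).
    """
    parent = {lig: lig for lig in ligands}

    def find(x):
        # Follow parent pointers to the set representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider the most reliable candidate transformations first.
    candidates = sorted(
        combinations(ligands, 2),
        key=lambda pair: reliability[frozenset(pair)],
        reverse=True,
    )

    edges = []
    for a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # this edge joins two separate components
            parent[root_a] = root_b
            edges.append((a, b))
    return edges


# Hypothetical congeneric series with made-up reliability scores.
ligands = ["lig_A", "lig_B", "lig_C", "lig_D"]
reliability = {
    frozenset(("lig_A", "lig_B")): 0.9,
    frozenset(("lig_A", "lig_C")): 0.4,
    frozenset(("lig_A", "lig_D")): 0.2,
    frozenset(("lig_B", "lig_C")): 0.8,
    frozenset(("lig_B", "lig_D")): 0.3,
    frozenset(("lig_C", "lig_D")): 0.7,
}

network = spanning_network(ligands, reliability)
print(network)
```

With four ligands the sketch returns three edges, the fewest that keep every ligand connected. In a data-driven setting the scores would come from a trained predictor rather than being set by hand.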
Finally, one major shortcoming of AFE calculations is their decreased reliability for transformations that are larger than ∼5 heavy atoms. The work reported in this thesis describes investigations into whether running charge, van der Waals and bond parameter transformations individually (with variable λ allocation per step) offers an advantage over transforming all parameters in a single step, as is the current standard in most AFE workflows. Initial results in this work qualitatively suggest that the bound leg benefits from a MultiStep protocol over a one-step ("SoftCore") protocol, whereas the free leg does not show a benefit. Further work performed by Cresset showed no observable benefit of the MultiStep approach over the SoftCore approach. Several key findings are reported in this work that illustrate the benefits of dissecting an FEP approach and comparing the two approaches side-by-side.
- Subjects :
- computational chemistry
computer-aided drug design
CADD
AFE science
Details
- Language :
- English
- Database :
- British Library EThOS
- Publication Type :
- Dissertation/Thesis
- Accession number :
- edsble.884197
- Document Type :
- Electronic Thesis or Dissertation
- Full Text :
- https://doi.org/10.7488/era/3311