The SAMPL5 challenge for embedded-cluster integral equation theory: solvation free energies, aqueous pKa, and cyclohexane-water log D.
- Source :
- Journal of computer-aided molecular design [J Comput Aided Mol Des] 2016 Nov; Vol. 30 (11), pp. 1035-1044. Date of Electronic Publication: 2016 Aug 23.
- Publication Year :
- 2016
Abstract
- We predict cyclohexane-water distribution coefficients (log D7.4) for drug-like molecules taken from the SAMPL5 blind prediction challenge by the "embedded cluster reference interaction site model" (EC-RISM) integral equation theory. This task involves the coupled problem of predicting both partition coefficients (log P) of neutral species between the solvents and aqueous acidity constants (pKa) in order to account for a change of protonation states. The first issue is addressed by calibrating an EC-RISM-based model for solvation free energies derived from the "Minnesota Solvation Database" (MNSOL) for both water and cyclohexane utilizing a correction based on the partial molar volume, yielding a root mean square error (RMSE) of 2.4 kcal mol⁻¹ for water and 0.8-0.9 kcal mol⁻¹ for cyclohexane depending on the parametrization. The second one is treated by employing on one hand an empirical pKa model (MoKa) and, on the other hand, an EC-RISM-derived regression of published acidity constants (RMSE of 1.5 for a single model covering acids and bases). In total, at most 8 adjustable parameters are necessary (2-3 for each solvent and two for the pKa) for training solvation and acidity models. Applying the final models to the log D7.4 dataset corresponds to evaluating an independent test set comprising other, composite observables, yielding, for different cyclohexane parametrizations, 2.0-2.1 for the RMSE with the first and 2.2-2.8 with the combined first and second SAMPL5 data set batches. Notably, a pure log P model (assuming neutral species only) performs statistically similarly for these particular compounds. The nature of the approximations and possible perspectives for future developments are discussed.
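- The coupling between log P, pKa, and log D7.4 described in the abstract can be illustrated with standard textbook relations. The sketch below is not the authors' EC-RISM workflow: it assumes a single monoprotic site, assumes that only the neutral species partitions into cyclohexane, and takes solvation free energies as given inputs. The function names (`log_p`, `log_d`, `rmse`) and the default temperature of 298.15 K are hypothetical choices made for illustration.

```python
import math

# Gas constant in kcal mol^-1 K^-1, matching the units used in the abstract.
R_KCAL = 1.98720425864083e-3


def log_p(dg_water, dg_cyclohexane, temperature=298.15):
    """Cyclohexane/water log P of the neutral species.

    Inputs are solvation free energies in kcal/mol. The transfer free
    energy water -> cyclohexane equals -RT ln P, so a more negative
    cyclohexane value gives a larger log P.
    """
    return (dg_water - dg_cyclohexane) / (R_KCAL * temperature * math.log(10))


def log_d(logp, pka, ph=7.4, acid=True):
    """log D at a given pH for a monoprotic acid or base.

    Assumption: the ionized form stays entirely in the aqueous phase.
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


def rmse(predicted, reference):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))
```

Under these assumptions, a compound whose pKa lies far from 7.4 on the neutral side has log D close to log P, which is consistent with the abstract's remark that a pure log P model can perform similarly for some compound sets.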
Details
- Language :
- English
- ISSN :
- 1573-4951
- Volume :
- 30
- Issue :
- 11
- Database :
- MEDLINE
- Journal :
- Journal of computer-aided molecular design
- Publication Type :
- Academic Journal
- Accession number :
- 27554666
- Full Text :
- https://doi.org/10.1007/s10822-016-9939-7