
The SAMPL5 challenge for embedded-cluster integral equation theory: solvation free energies, aqueous pK<subscript>a</subscript>, and cyclohexane-water log D.

Authors :
Tielker N
Tomazic D
Heil J
Kloss T
Ehrhart S
Güssregen S
Schmidt KF
Kast SM
Source :
Journal of computer-aided molecular design [J Comput Aided Mol Des] 2016 Nov; Vol. 30 (11), pp. 1035-1044. Date of Electronic Publication: 2016 Aug 23.
Publication Year :
2016

Abstract

We predict cyclohexane-water distribution coefficients (log D<subscript>7.4</subscript>) for drug-like molecules taken from the SAMPL5 blind prediction challenge by the "embedded cluster reference interaction site model" (EC-RISM) integral equation theory. This task involves the coupled problem of predicting both partition coefficients (log P) of neutral species between the solvents and aqueous acidity constants (pK<subscript>a</subscript>) in order to account for a change of protonation states. The first issue is addressed by calibrating an EC-RISM-based model for solvation free energies derived from the "Minnesota Solvation Database" (MNSOL) for both water and cyclohexane utilizing a correction based on the partial molar volume, yielding a root mean square error (RMSE) of 2.4 kcal mol<superscript>-1</superscript> for water and 0.8-0.9 kcal mol<superscript>-1</superscript> for cyclohexane depending on the parametrization. The second one is treated by employing on one hand an empirical pK<subscript>a</subscript> model (MoKa) and, on the other hand, an EC-RISM-derived regression of published acidity constants (RMSE of 1.5 for a single model covering acids and bases). In total, at most 8 adjustable parameters are necessary (2-3 for each solvent and two for the pK<subscript>a</subscript>) for training solvation and acidity models. Applying the final models to the log D<subscript>7.4</subscript> dataset corresponds to evaluating an independent test set comprising other, composite observables, yielding, for different cyclohexane parametrizations, 2.0-2.1 for the RMSE with the first and 2.2-2.8 with the combined first and second SAMPL5 data set batches. Notably, a pure log P model (assuming neutral species only) performs statistically similarly for these particular compounds. The nature of the approximations and possible perspectives for future developments are discussed.
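
As an illustration of how log P and pKa combine into log D at pH 7.4, the following minimal Python sketch uses the standard textbook relations for a compound with a single ionizable group. It is not the EC-RISM workflow of the paper. It assumes that only the neutral species partitions into cyclohexane, that log P follows from the difference of the solvation free energies in the two solvents, and that a base is described by the pKa of its conjugate acid. The function names, parameters, and the numbers in the usage lines are hypothetical.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1
R_KCAL = 1.98720425864083e-3


def log_p_from_solvation(dg_water, dg_cyclohexane, temperature=298.15):
    """Cyclohexane/water log P of the neutral species.

    Inputs are solvation free energies in kcal/mol (hypothetical values).
    A more negative free energy in cyclohexane than in water gives log P > 0.
    """
    return (dg_water - dg_cyclohexane) / (R_KCAL * temperature * math.log(10))


def log_d(log_p, pka, ph=7.4, kind="acid"):
    """log D at a given pH, assuming only the neutral form enters cyclohexane.

    kind="acid": pka belongs to the neutral acid HA.
    kind="base": pka belongs to the protonated conjugate acid BH+.
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # Subtract the log of (1 + ionized/neutral ratio) in the aqueous phase
    return log_p - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Made-up solvation free energies, for illustration only
    lp = log_p_from_solvation(dg_water=-2.0, dg_cyclohexane=-4.0)
    print(round(lp, 2), round(log_d(lp, pka=4.5, kind="acid"), 2))
```

When the pKa lies far on the neutral side of pH 7.4 the correction term vanishes and log D reduces to log P, which is the limit behind the abstract's "pure log P model".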

Details

Language :
English
ISSN :
1573-4951
Volume :
30
Issue :
11
Database :
MEDLINE
Journal :
Journal of computer-aided molecular design
Publication Type :
Academic Journal
Accession number :
27554666
Full Text :
https://doi.org/10.1007/s10822-016-9939-7