Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis

Authors :
Aiping Lu
Tingjun Hou
Lu Liu
Li Fu
Pan Li
Junjie Ding
Yong-Huan Yun
Dong-Sheng Cao
Zhi-Jiang Yang
Source :
Journal of Chemical Information and Modeling. 60:63-76
Publication Year :
2019
Publisher :
American Chemical Society (ACS), 2019.

Abstract

Lipophilicity, as evaluated by the n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicology (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure–property relationship (QSPR) models to predict log D7.4 based on a large and structurally diverse data set. Eight popular machine learning algorithms were employed to build the prediction models with 43 molecular descriptors selected by a wrapper feature selection method. The results demonstrated that XGBoost yielded better prediction performance than any other single model (R_T² = 0.906 and RMSE_T = 0.395). Moreover, the consensus model from the top three models could continue to improve the prediction performance (R_T² = 0.922 and RMSE_T = 0.359). The robustness, reliability, and generalization ability of the models were strictly evaluated by the Y-randomization test and applicability domain analysis. Mor...
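
The two statistics quoted in the abstract, and the idea of a consensus over several models, can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not say how the consensus prediction is formed, so an unweighted arithmetic mean is assumed here, and the function names and toy values are hypothetical.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def consensus(predictions):
    """Average the per-molecule predictions of several models.

    ASSUMPTION: a plain arithmetic mean; the abstract does not state
    the combination rule used in the paper.
    """
    return [sum(values) / len(values) for values in zip(*predictions)]


if __name__ == "__main__":
    # Invented toy values, not data from the paper.
    observed = [1.2, 2.5, 0.4, 3.1]
    model_a = [1.0, 2.9, 0.6, 2.8]
    model_b = [1.5, 2.2, 0.1, 3.3]
    model_c = [1.1, 2.6, 0.7, 3.0]

    combined = consensus([model_a, model_b, model_c])
    print("consensus R^2 :", r_squared(observed, combined))
    print("consensus RMSE:", rmse(observed, combined))
```

Averaging tends to help when the individual models make partly uncorrelated errors, which is one plausible reading of why a consensus could outperform the best single model.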

Details

ISSN :
1549-960X and 1549-9596
Volume :
60
Database :
OpenAIRE
Journal :
Journal of Chemical Information and Modeling
Accession number :
edsair.doi...........8450493ed77a37fd73def6b00bac7e0d