Systematic Modeling of log D7.4 Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis
- Source :
- Journal of Chemical Information and Modeling. 60:63-76
- Publication Year :
- 2019
- Publisher :
- American Chemical Society (ACS), 2019.
Abstract
- Lipophilicity, as evaluated by the n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4), is a major determinant of various absorption, distribution, metabolism, elimination, and toxicology (ADMET) parameters of drug candidates. In this study, we developed several quantitative structure–property relationship (QSPR) models to predict log D7.4 based on a large and structurally diverse data set. Eight popular machine learning algorithms were employed to build the prediction models with 43 molecular descriptors selected by a wrapper feature selection method. The results demonstrated that XGBoost yielded better prediction performance than any other single model (R_T^2 = 0.906 and RMSE_T = 0.395). Moreover, the consensus model built from the top three models further improved the prediction performance (R_T^2 = 0.922 and RMSE_T = 0.359). The robustness, reliability, and generalization ability of the models were rigorously evaluated by the Y-randomization test and applicability domain analysis. Mor...
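As a rough illustration of the two statistics quoted in the abstract and of what a consensus prediction can look like, here is a minimal sketch in plain Python. It assumes the consensus is an unweighted mean of the individual model predictions, which the abstract does not state. The function names (`rmse`, `r_squared`, `consensus`) and all numbers are hypothetical toy values, not data or code from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def consensus(*model_predictions):
    """Assumed consensus rule: unweighted mean across models, per compound."""
    return [sum(vals) / len(vals) for vals in zip(*model_predictions)]

# Hypothetical toy values standing in for log D7.4 measurements and
# for the predictions of three separately trained models.
y_test = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.3, 2.1, 3.2, 3.9]
pred_b = [0.8, 2.2, 2.7, 4.2]
pred_c = [1.1, 1.8, 3.1, 4.2]

y_cons = consensus(pred_a, pred_b, pred_c)

for name, pred in [("A", pred_a), ("B", pred_b), ("C", pred_c), ("consensus", y_cons)]:
    print(name, round(r_squared(y_test, pred), 3), round(rmse(y_test, pred), 3))
```

In this toy case the averaged prediction has a lower RMSE than any single model because the individual errors partly cancel; that is not guaranteed in general.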
- Subjects :
- Quantitative structure–activity relationship
010304 chemical physics
Computer science
Generalization
business.industry
General Chemical Engineering
Pattern recognition
Feature selection
General Chemistry
Library and Information Sciences
01 natural sciences
Ensemble learning
0104 chemical sciences
Computer Science Applications
010404 medicinal & biomolecular chemistry
Robustness (computer science)
Molecular descriptor
0103 physical sciences
Artificial intelligence
Matched molecular pair analysis
business
Applicability domain
Details
- ISSN :
- 1549-960X and 1549-9596
- Volume :
- 60
- Database :
- OpenAIRE
- Journal :
- Journal of Chemical Information and Modeling
- Accession number :
- edsair.doi...........8450493ed77a37fd73def6b00bac7e0d