1. PVLOO-Based Training Set Selection Improves the External Predictability of QSAR/QSPR Models.
- Author
- Dong Y, Xiang B, and Du D
- Subjects
- Algorithms, Reproducibility of Results, Chemistry methods, Drug Design, Models, Chemical, Quantitative Structure-Activity Relationship
- Abstract
In QSAR/QSPR modeling, the indispensable way to validate the predictability of a model is to perform its statistical external validation. It is common that a division algorithm should be used to select training sets from chemical compound libraries or collections prior to external validations. In this study, a division method based on the posterior variance of leave-one-out cross-validation (PVLOO) of the Gaussian process (GP) has been developed with the goal of producing more predictive models. Four structurally diverse data sets of good quality are collected from the literature and then redeveloped and validated on the basis of training set selection methods, namely, four kinds of PVLOO-based training set selection methods with three types of covariance functions (squared exponential, rational quadratic, and neural network covariance functions), the Kennard-Stone algorithm, and random division. The root mean squared error (RMSE) of external validation reported for each model serves as a basis for the final comparison. The results of this study indicate that the training sets with higher values of PVLOO have statistically better external predictability than the training sets generated from other division methods discussed here. These findings could be explained by proposing that the PVLOO value of GP could indicate the mechanism diversity of a specific compound in QSAR/QSPR data sets.
- Published
- 2017
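The quantity at the center of the abstract, the leave-one-out posterior variance of a Gaussian process, has a standard closed form in GP regression: if K is the covariance matrix of the inputs with the noise variance added to its diagonal, the LOO predictive variance of compound i is 1 / [K^-1]_ii. The following is a minimal Python sketch of that formula, not the authors' implementation. It assumes a squared exponential covariance with fixed hyperparameters and a simple "keep the k compounds with the highest PVLOO" rule; the abstract mentions four PVLOO-based selection variants without describing them, so the ranking rule, the hyperparameter values, and all function names here are illustrative assumptions.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def loo_posterior_variance(X, noise_var=1e-2, **kernel_args):
    """LOO predictive variance per compound: 1 / [K^-1]_ii, K including noise."""
    n = len(X)
    K = [[sq_exp_kernel(X[i], X[j], **kernel_args)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    K_inv = invert(K)
    return [1.0 / K_inv[i][i] for i in range(n)]


def select_training_set(X, k, **kwargs):
    """Assumed rule: return indices of the k compounds with the highest PVLOO."""
    pv = loo_posterior_variance(X, **kwargs)
    ranked = sorted(range(len(X)), key=lambda i: pv[i], reverse=True)
    return sorted(ranked[:k])
```

Under these assumptions, a compound that lies far from all others in descriptor space receives the largest PVLOO, since the remaining compounds say little about it. For example, with `X = [[0.0], [0.1], [0.2], [5.0]]`, `select_training_set(X, 1)` picks the isolated fourth compound. In practice the hyperparameters would be fitted to the data rather than fixed, and a linear-algebra library would replace the hand-written inverse.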