1. Cluster-based design in environmental QSAR
- Author
- Eriksson, L., Johansson, E., Wold, S., and Müller, M.
- Subjects
multivariate design, multivariate QSAR, PLS, soil sorption
- Abstract
In QSAR analysis in the environmental sciences, adverse effects of chemicals released to the environment are modelled and predicted as a function of the chemical properties of the pollutants. Usually, the set of compounds under study contains several classes of substances, i.e., it is a more or less strongly clustered set. It is then necessary to ensure that the selected training set comprises compounds representing all those chemical classes. Multivariate design in the principal properties of the compound classes is usually appropriate for selecting a meaningful training set. However, with clustered data, often seen in environmental chemistry and toxicology, a single multivariate design may be suboptimal. This is because of the risk of ignoring small classes with few members and only selecting training set compounds from the largest classes. In this paper, a procedure for training set selection that recognizes clustering is proposed. Here, when non-selective biological or environmental responses are modelled, local multivariate designs are constructed within each cluster (class). The compounds chosen by the local designs are finally united in the overall training set, which will thus contain members from all clusters. Our illustration deals with a set of 66 compounds, categorized into five classes, for which the soil sorption coefficient is available. The training set selection is discussed, followed by multivariate QSAR modelling, model validation and interpretation, and predictions for the test set.
- Published
- 1997
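
The cluster-wise selection outlined in the abstract (a local design within each class, then the union of the local selections) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `select_spread` and `cluster_based_training_set` are hypothetical, the descriptor data are synthetic, the class sizes are invented (only the totals of 66 compounds and five classes come from the abstract), and a greedy max-min distance selection in local principal component scores stands in for the statistical designs the authors actually used.

```python
import numpy as np


def select_spread(scores, n_select):
    """Greedy max-min distance selection (a stand-in for a formal design).

    Starts at the point farthest from the centroid, then repeatedly adds
    the point whose distance to the already chosen points is largest.
    """
    n_select = min(n_select, scores.shape[0])
    centroid = scores.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(scores - centroid, axis=1)))
    chosen = [first]
    min_dist = np.linalg.norm(scores - scores[first], axis=1)
    min_dist[first] = -np.inf  # never pick the same point twice
    while len(chosen) < n_select:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist = np.linalg.norm(scores - scores[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
        min_dist[nxt] = -np.inf
    return chosen


def cluster_based_training_set(X, labels, n_per_cluster=3, n_components=2):
    """Select training compounds class by class and unite the selections."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    selected = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        # Local PCA: centre the class and take its leading score vectors.
        Xc = X[idx] - X[idx].mean(axis=0)
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        k = min(n_components, len(s))
        scores = U[:, :k] * s[:k]
        local = select_spread(scores, n_per_cluster)
        selected.extend(idx[local].tolist())
    return sorted(selected)


# Synthetic example: 66 compounds in five classes of unequal (invented) size.
rng = np.random.default_rng(0)
sizes = [30, 15, 10, 7, 4]
labels = np.repeat(np.arange(len(sizes)), sizes)
X = rng.normal(size=(labels.size, 4)) + 5.0 * labels[:, None]

train = cluster_based_training_set(X, labels, n_per_cluster=3)
test = sorted(set(range(labels.size)) - set(train))
print("training set size:", len(train), "test set size:", len(test))
```

Because the selection is done per class, even the smallest class contributes compounds to the training set, which is the point the abstract makes against a single global design.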