Development and Validation of a Novel Variable Selection Technique with Application to QSAR Studies
- Source :
- Molecular Modeling and Prediction of Bioactivity ISBN: 9781461368571
- Publication Year :
- 2000
- Publisher :
- Springer US, 2000.
Abstract
- Variable selection is typically a time-consuming and ambiguous step in quantitative structure-activity relationship (QSAR) studies on over-determined (regressor-heavy) data sets. A variety of techniques, including stepwise regression and partial least squares/principal components analysis (PLS/PCA) regression, have been applied to this common problem. Other strategies, such as neural networks, cluster significance analysis, nearest-neighbor methods, and genetic (function) or evolutionary algorithms, have also been evaluated. A simple random selection strategy that generates models iteratively, but explicitly avoids crossover and mutation, has been developed and is implemented herein to rapidly identify, from a pool of allowable variables, those most closely associated with a given response variable. The FRED (fast random elucidation of determinants) algorithm begins with a population of offspring (models), each composed of a fixed or variable number of randomly selected variables. Iterative elimination of descriptors leads naturally to subsequent generations of fitter offspring (models). In contrast to common genetic and evolutionary algorithms, only those descriptors determined to contribute to the genetic make-up of less fit offspring (models) are eliminated from the descriptor pool. After every generation, a new random increment line search of the remaining descriptors initiates the development of the next generation of randomly constructed models. An optional algorithm that eliminates highly correlated descriptors in a stepwise manner before the first generation of offspring is developed greatly enhances the efficiency of the FRED algorithm. A FRED analysis of a set of antifilarials published by Selwood (n = 31 compounds, k = 53 descriptors) demonstrates the ability of the algorithm to rapidly identify determinants of biological outcome from a large collection of highly intercorrelated variables (see Figure 1).
Comparing the results of a FRED analysis of the Selwood data set with those obtained using alternative algorithms shows that the technique identifies the same “optimal” solutions efficiently.
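The abstract describes the FRED procedure only in outline. The Python sketch below illustrates the two ideas it names, a stepwise correlation pre-filter and generation-by-generation elimination of descriptors that appear only in less fit random models, under assumptions the abstract does not specify: fitness is taken to be the leave-one-out q² of an ordinary least-squares model, the "less fit" offspring are assumed to be the lower-ranked half of each population, every model has a fixed number of descriptors, and the random increment line search is replaced by plain random resampling from the surviving pool. The function names (`correlation_prefilter`, `q2_loo`, `fred_like_selection`), the 0.9 correlation threshold, and the population settings are hypothetical and are not taken from the chapter.

```python
import numpy as np

# Hypothetical sketch only: the fitness measure, the "less fit" cut-off and
# the resampling scheme are assumptions, not the published FRED algorithm.


def correlation_prefilter(X, threshold=0.9):
    """Stepwise removal of highly intercorrelated descriptors.

    Repeatedly finds the most correlated remaining pair of columns; if its
    |r| exceeds `threshold`, drops the member of the pair that is, on average,
    more correlated with the remaining columns. Assumes no constant columns
    (those make np.corrcoef return NaN). Returns the kept column indices.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        r = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(r, 0.0)
        i, j = np.unravel_index(np.argmax(r), r.shape)
        if r[i, j] <= threshold:
            break
        drop = i if r[i].mean() >= r[j].mean() else j
        keep.pop(int(drop))
    return keep


def q2_loo(X, y):
    """Leave-one-out q^2 of an OLS fit with intercept, computed in closed
    form from the hat-matrix diagonal (PRESS statistic)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)  # hat (projection) matrix
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def fred_like_selection(X, y, model_size=3, pop_size=20, generations=30, seed=0):
    """FRED-style loop: random models, no crossover or mutation.

    Each generation draws `pop_size` random models of `model_size` descriptors
    from the surviving pool, ranks them by q2_loo, and eliminates descriptors
    that appear only in the less fit half. Returns the surviving pool.
    """
    rng = np.random.default_rng(seed)
    pool = list(range(X.shape[1]))
    for _ in range(generations):
        if len(pool) <= model_size:
            break
        models = [rng.choice(pool, size=model_size, replace=False)
                  for _ in range(pop_size)]
        scores = np.array([q2_loo(X[:, m], y) for m in models])
        order = np.argsort(scores)[::-1]  # fittest first
        half = pop_size // 2
        fit = {int(d) for k in order[:half] for d in models[k]}
        unfit = {int(d) for k in order[half:] for d in models[k]}
        doomed = unfit - fit  # descriptors seen only in less fit offspring
        survivors = [d for d in pool if d not in doomed]
        if len(survivors) < model_size:
            break
        pool = survivors
    return pool
```

Because a descriptor is removed only when it never appears in the fitter half, a strongly predictive descriptor is never lost in this sketch as long as every model containing it outscores every model without it, while noise descriptors tend to drop out over successive generations.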
- Subjects :
- Quantitative structure–activity relationship
Computer science
Population
Evolutionary algorithm
Contrast (statistics)
Feature selection
Pattern recognition
Partial least squares regression
Mutation (genetic algorithm)
Principal component analysis
Artificial intelligence
Details
- ISBN :
- 978-1-4613-6857-1
- ISBNs :
- 9781461368571
- Database :
- OpenAIRE
- Journal :
- Molecular Modeling and Prediction of Bioactivity ISBN: 9781461368571
- Accession number :
- edsair.doi...........f0f6577d110247c61f1b6c83b121fc78
- Full Text :
- https://doi.org/10.1007/978-1-4615-4141-7_41