1. Using Ensembles to Classify Compounds for Drug Discovery
- Author
- J. Kevin Lanctot, Jonathan W. Greene, Santosh Putta, and Christian Lemmen
- Subjects
- Models, Molecular, Property (programming), Quantitative Structure-Activity Relationship, Hemostatics, Ranking (information retrieval), Set (abstract data type), Artificial Intelligence, Terminology as Topic, Ensemble forecasting, Chemistry, Fingerprint (computing), Thrombin, Pattern recognition, Set cover problem, General Medicine, General Chemistry, Mutual information, Combinatorial chemistry, Computer Science Applications, Data set, Databases as Topic, Pharmaceutical Preparations, Computational Theory and Mathematics, Drug Design, Artificial intelligence, Algorithms, Software, Information Systems
- Abstract
- This paper introduces Signal, a novel method for classifying the activity of compounds against a small-molecule drug target. Signal creates an ensemble, or collection, of meaningful descriptors chosen from a much larger property space. The method works with a variety of descriptor types, including fingerprints that represent four-point pharmacophores or shape descriptors. It also exploits information from both active and inactive compounds and generates predictive models suitable for analyzing high-throughput screening data. Given the fingerprints and activity data for a set of compounds, Signal proceeds in two steps. The first step is to Evaluate the Descriptors: for each descriptor in the fingerprint, quantify the correlation between the activity of the compounds and the presence of that descriptor, then rank the descriptors by that correlation. The second step is to Create an Ensemble Model: use the high-ranking descriptors to create a model of activity against the biological target. For the first step, two ranking strategies were investigated: mutual information and chi-square. For the second step, two types of ensemble models were investigated: high ranking and a novel method called high ranking set cover. Of the four possible pairings, the combination of chi-square and high ranking set cover performed best on a thrombin data set.
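The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the scores are computed or how the set cover is built, so everything here is an assumption made for illustration. That includes the function names, the 2x2 contingency-table form of chi-square and mutual information, the filter that keeps only descriptors enriched among actives, the greedy reading of "high ranking set cover" (walk the ranked list and keep a descriptor only if it covers an active compound not yet covered), and the toy data.

```python
import math

def contingency(column, active):
    """2x2 counts: a=present&active, b=present&inactive, c=absent&active, d=absent&inactive."""
    a = b = c = d = 0
    for bit, act in zip(column, active):
        if bit and act:
            a += 1
        elif bit:
            b += 1
        elif act:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def mutual_information(a, b, c, d):
    """Mutual information (in bits) between descriptor presence and activity."""
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    return sum((j / n) * math.log2(j * n / (r * k)) for j, r, k in cells if j)

def rank_descriptors(fps, active, score=chi_square):
    """Step 1: score every descriptor and return indices, best first."""
    scored = []
    for j in range(len(fps[0])):
        a, b, c, d = contingency([fp[j] for fp in fps], active)
        if a * d > b * c:  # assumption: keep only descriptors enriched in actives
            scored.append((score(a, b, c, d), j))
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by descriptor index
    return [j for _, j in scored]

def high_ranking(ranked, k):
    """Step 2, variant A: simply take the k best descriptors."""
    return ranked[:k]

def high_ranking_set_cover(ranked, fps, active):
    """Step 2, variant B (assumed greedy form): keep a ranked descriptor
    only if it covers at least one active compound not yet covered."""
    uncovered = {i for i, act in enumerate(active) if act}
    chosen = []
    for j in ranked:
        hits = {i for i in uncovered if fps[i][j]}
        if hits:
            chosen.append(j)
            uncovered -= hits
        if not uncovered:
            break
    return chosen

def predict(fp, ensemble):
    """Score a compound by how many ensemble descriptors it contains."""
    return sum(fp[j] for j in ensemble)

# Toy data (hypothetical): 6 compounds x 5 binary descriptors.
FPS = [
    [1, 1, 0, 1, 0],  # active
    [1, 0, 0, 0, 0],  # active
    [0, 0, 1, 0, 0],  # active
    [0, 0, 0, 1, 1],  # inactive
    [0, 0, 0, 0, 1],  # inactive
    [0, 0, 0, 1, 1],  # inactive
]
ACTIVE = [1, 1, 1, 0, 0, 0]

ranked = rank_descriptors(FPS, ACTIVE)                   # [0, 1, 2]
top2 = high_ranking(ranked, 2)                           # [0, 1]
cover = high_ranking_set_cover(ranked, FPS, ACTIVE)      # [0, 2]
print(ranked, top2, cover)
```

In this toy case descriptor 1 is redundant with descriptor 0, so the plain top-2 model never fires on the third active compound, while the greedy cover skips the redundant descriptor and picks descriptor 2 instead. Passing `score=mutual_information` to `rank_descriptors` swaps the ranking strategy without changing the second step.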
- Published
- 2003