1. Identification of Diverse Database Subsets using Property-Based and Fragment-Based Molecular Descriptions
- Authors
Michael H. Charlton, Geoffrey M. Downs, Peter Willett, Roger Lahana, John M. Barnard, John D. Holliday, Florence Casset, Mark Ashton, and Dominique Gorse
- Subjects
Pharmacology, Identification (information), Fragment (logic), Property (programming), Chemistry, Bioactive molecules, Structural diversity, Computational biology, Data mining, Selection (genetic algorithm)
- Abstract
This paper reports a comparison of calculated molecular properties and of 2D fragment bit-strings when used for the selection of structurally diverse subsets of a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected subsets suggests that the MaxMin subsets are noticeably superior to the k-means subsets; that the property-based descriptors are marginally superior to the fragment-based descriptors; and that both approaches are noticeably superior to random selection.
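For readers unfamiliar with MaxMin dissimilarity-based selection, the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes fingerprints are represented as sets of on-bit indices, uses 1 minus the Tanimoto coefficient as the dissimilarity, and starts from a randomly chosen compound; the abstract does not specify any of these choices, and the names `tanimoto_dissimilarity` and `maxmin_select` are hypothetical.

```python
import random


def tanimoto_dissimilarity(a, b):
    """Return 1 - Tanimoto coefficient for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, k, seed=0):
    """Pick k indices; each new pick maximises its minimum
    dissimilarity to the compounds already selected."""
    n = len(fingerprints)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")
    rng = random.Random(seed)
    selected = [rng.randrange(n)]  # assumption: random starting compound
    # min_dist[i]: dissimilarity from compound i to its nearest selected compound
    min_dist = [
        tanimoto_dissimilarity(fp, fingerprints[selected[0]])
        for fp in fingerprints
    ]
    while len(selected) < k:
        chosen = set(selected)
        # Candidate whose nearest selected neighbour is farthest away
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        selected.append(best)
        # Update nearest-selected distances with the new pick
        for i in range(n):
            d = tanimoto_dissimilarity(fingerprints[i], fingerprints[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy data: two near-duplicate pairs and one outlier
toy_fingerprints = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {20, 21}]
subset = maxmin_select(toy_fingerprints, k=3)
```

On the toy data, a three-compound selection takes one compound from each near-duplicate pair plus the outlier, which is the behaviour that makes MaxMin attractive for diverse-subset selection. This sketch costs O(nk) dissimilarity evaluations; a file of 44295 compounds would likely call for a packed bit-string representation rather than Python sets.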
- Published
- 2002