Author: "Valerie J. Gillet" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

1. An Analysis of Classification Approaches for Hit Song Prediction Using Engineered Metadata Features with Lyrics and Audio Features

Author: Mengyisong Zhao, Morgan Harvey, David Cameron, Frank Hopfgartner, and Valerie J. Gillet
Published: 2023
Full Text: View/download PDF

2. Amyloid binding and beyond: a new approach for Alzheimer's disease drug discovery targeting Aβo–PrPC binding and downstream pathways

Author: Jennifer C. Louth, Imane Ghafir El Idrissi, James D. Grayson, Valerie J. Gillet, Sasha Stimpson, Emma Mead, Antonio de la Vega de León, Margarita Segovia Roldan, Gary Sharman, Claire J. Garwood, Jim A. Thomas, Matthew P. Baumgartner, Amélia P. Rauter, Beining Chen, Samuel J. Dawes, Joanna Wolak, Charlotte Dunbar, Ke Ning, Anastasia Zhuravleva, Cleide dos Santos Souza, Benjamin M. Partridge, Nicola Antonio Colabufo, and David A. Evans
Subjects: 0303 health sciences, Amyloid, Amyloid β, Drug discovery, Chemistry, HEK 293 cells, General Chemistry, Cortical neurons, Disease, Inhibitory postsynaptic potential, Cell biology, 03 medical and health sciences, 0302 clinical medicine, Receptor, 030217 neurology & neurosurgery, 030304 developmental biology
Abstract: Amyloid β oligomers (Aβo) are the main toxic species in Alzheimer's disease, which have been targeted for single drug treatment with very little success. In this work we report a new approach for identifying functional Aβo binding compounds. A tailored library of 971 fluorine containing compounds was selected by a computational method, developed to generate molecular diversity. These compounds were screened for Aβo binding by a combined 19F and STD NMR technique. Six hits were evaluated in three parallel biochemical and functional assays. Two compounds disrupted Aβo binding to its receptor PrPC in HEK293 cells. They reduced the pFyn levels triggered by Aβo treatment in neuroprogenitor cells derived from human induced pluripotent stem cells (hiPSC). Inhibitory effects on pTau production in cortical neurons derived from hiPSC were also observed. These drug-like compounds connect three of the pillars in Alzheimer's disease pathology, i.e. prion, Aβ and Tau, affecting three different pathways through specific binding to Aβo and are, indeed, promising candidates for further development.
Published: 2021
Full Text: View/download PDF

3. Enhancing reaction-based de novo design using a multi-label reaction class recommender

Author: Valerie J. Gillet, Gian Marco Ghiandoni, James Webster, Beining Chen, James E. A. Wallace, Dimitar Hristozov, and Michael J. Bodkin
Subjects: Computer science, Databases, Pharmaceutical, Chemistry, Pharmaceutical, Multi-label classification, Chemistry Techniques, Synthetic, computer.software_genre, 01 natural sciences, Article, Set (abstract data type), Machine Learning, Small Molecule Libraries, 03 medical and health sciences, External data, Drug Discovery, Reaction class recommender, Humans, Computer Simulation, Physical and Theoretical Chemistry, 030304 developmental biology, 0303 health sciences, Filter (signal processing), Limiting, Class (biology), 0104 chemical sciences, Computer Science Applications, 010404 medicinal & biomolecular chemistry, Reaction vector, Transformation (function), De novo design, Drug Design, Computer-Aided Design, Data mining, computer, Combinatorial explosion, Algorithms, Databases, Chemical
Abstract: Reaction-based de novo design refers to the in-silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood of the designed molecules being synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account for structural changes that occur at the core of a reaction only, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design to reduce the combinatorial explosion of in-silico molecules produced while limiting the generated structures to those which are most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.
Published: 2020

4. Virtual Screening Based on Electrostatic Similarity and Flexible Ligands

Author: Savíns Puertas-Martín, Juana L. Redondo, Antonio J. Banegas-Luna, Ester M. Garzón, Horacio Pérez-Sánchez, Valerie J. Gillet, and Pilar M. Ortigosa
Published: 2022
Full Text: View/download PDF

5. RENATE: A Pseudo-retrosynthetic Tool for Synthetically Accessible de novo Design

Author: Dimitar Hristozov, Valerie J. Gillet, James Webster, Beining Chen, Michael J. Bodkin, James E. A. Wallace, and Gian Marco Ghiandoni
Subjects: Theoretical computer science, Computer science, Organic Chemistry, Design tool, Context (language use), Ligands, Computer Science Applications, Market fragmentation, Set (abstract data type), Fragment (logic), Structural Biology, Product (mathematics), Drug Discovery, Molecular Medicine, Heuristics, Combinatorial explosion, Algorithms, Retrospective Studies
Abstract: Reaction-based de novo design refers to the generation of synthetically accessible molecules using transformation rules extracted from known reactions in the literature. In this context, we have previously described the extraction of reaction vectors from a reactions database and their coupling with a structure generation algorithm for the generation of novel molecules from a starting material. An issue when designing molecules from a starting material is the combinatorial explosion of possible product molecules that can be generated, especially for multistep syntheses. Here, we present the development of RENATE, a reaction-based de novo design tool, which is based on a pseudo-retrosynthetic fragmentation of a reference ligand and an inside-out approach to de novo design. The reference ligand is fragmented; each fragment is used to search for similar fragments as building blocks; the building blocks are combined into products using reaction vectors; and a synthetic route is suggested for each product molecule. The RENATE methodology is presented followed by a retrospective validation to recreate a set of approved drugs. Results show that RENATE can generate very similar or even identical structures to the corresponding input drugs, hence validating the fragmentation, search, and design heuristics implemented in the tool.
Published: 2021

6. Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature

Author: Beining Chen, Valerie J. Gillet, Dimitar Hristozov, Michael J. Bodkin, James E. A. Wallace, Gian Marco Ghiandoni, and James Webster
Subjects: Molecular Structure, 010304 chemical physics, Computer science, business.industry, Chemistry, Pharmaceutical, General Chemical Engineering, General Chemistry, Electronic lab notebook, Library and Information Sciences, 01 natural sciences, 0104 chemical sciences, Computer Science Applications, Data-driven, Task (project management), Machine Learning, 010404 medicinal & biomolecular chemistry, Models, Chemical, 0103 physical sciences, Software engineering, business, Databases, Chemical
Abstract: Reaction classification has often been considered an important task for many different applications, and has traditionally been accomplished using hand-coded rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for "dynamic" reaction fingerprinting to maximize the effectiveness of reaction encoding, as well as developing a novel reaction classification system that organizes labels into four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, the external validation of the model reports 95% of predictions as correct by filtering out less than 15% of the uncertain classifications. The application of the model is demonstrated by classifying two reaction data sets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights on the nature of reaction collections and hidden relationships between reaction classes.
Published: 2019
Full Text: View/download PDF

7. Author response: Identification of compounds that rescue otic and myelination defects in the zebrafish adgrg6 (gpr126) mutant

Author: Antonio de la Vega de León, Anzar Asad, Sarah Baxendale, Tanya T. Whitfield, Leila Abbas, Valerie J. Gillet, Celia J. Holdsworth, Giselle R. Wiggin, and Elvira Diamantopoulou
Subjects: biology, Mutant, Myelination defects, Identification (biology), biology.organism_classification, Zebrafish, Cell biology
Published: 2019
Full Text: View/download PDF

8. Applications of Chemoinformatics in Drug Discovery

Author: Valerie J. Gillet
Subjects: Virtual screening, Drug discovery, Cheminformatics, Computer science, Computational biology, Pharmacophore
Published: 2019
Full Text: View/download PDF

9. Glossary of terms used in computational drug design, part II (IUPAC Recommendations 2015)

Author: Tudor I. Oprea, David A. Winkler, Valerie J. Gillet, György G. Ferenczy, Ruben Abagyan, Johan Ulander, N. S. Zefirov, and Yvonne C. Martin
Subjects: 0301 basic medicine, Quantitative structure–activity relationship, Glossary, Chemistry, Drug discovery, General Chemical Engineering, 0206 medical engineering, Chemical nomenclature, 02 engineering and technology, General Chemistry, Medicinal chemistry, 03 medical and health sciences, 030104 developmental biology, Cheminformatics, Organic chemistry, 020602 bioinformatics
Abstract: Computational drug design is a rapidly changing field that plays an increasingly important role in medicinal chemistry. Since the publication of the first glossary in 1997, substantial changes have occurred in both medicinal chemistry and computational drug design. This has resulted in the use of many new terms and the consequent necessity to update the previous glossary. For this purpose a Working Party of eight experts was assembled. They produced explanatory definitions of more than 150 new and revised terms.
Published: 2016
Full Text: View/download PDF

10. Identification of compounds that rescue otic and myelination defects in the zebrafish adgrg6 (gpr126) mutant

Author: Celia J. Holdsworth, Antonio de la Vega de León, Anzar Asad, Elvira Diamantopoulou, Sarah Baxendale, Tanya T. Whitfield, Leila Abbas, Valerie J. Gillet, and Giselle R. Wiggin
Subjects: inner ear, 0301 basic medicine, QH301-705.5, Science, Mutant, Morphogenesis, medicine.disease_cause, chemoinformatics, General Biochemistry, Genetics and Molecular Biology, 03 medical and health sciences, 0302 clinical medicine, Gene expression, medicine, Biology (General), Allele, Receptor, Zebrafish, Adgrg6 (Gpr126), 030304 developmental biology, 0303 health sciences, Mutation, General Immunology and Microbiology, biology, General Neuroscience, phenotypic screening, myelination, General Medicine, adhesion GPCR, biology.organism_classification, 3. Good health, Cell biology, 030104 developmental biology, biology.protein, Medicine, Versican, Developmental biology, 030217 neurology & neurosurgery
Abstract: Adgrg6 (Gpr126) is an adhesion class G protein-coupled receptor with a conserved role in myelination of the peripheral nervous system. In the zebrafish, mutation of adgrg6 also results in defects in the inner ear: otic tissue fails to down-regulate versican gene expression and morphogenesis is disrupted. We have designed a whole-animal screen that tests for rescue of both up- and down-regulated gene expression in mutant embryos, together with analysis of weak and strong alleles. From a screen of 3120 structurally diverse compounds, we have identified 68 that reduce versican b expression in the adgrg6 mutant ear, 41 of which also restore myelin basic protein gene expression in Schwann cells of mutant embryos. Nineteen compounds unable to rescue a strong adgrg6 allele provide candidates for molecules that interact directly with the Adgrg6 receptor. Our pipeline provides a powerful approach for identifying compounds that modulate GPCR activity, with potential impact for future drug design.
Published: 2019
Full Text: View/download PDF

11. Identification of compounds that rescue otic and myelination defects in the zebrafish

Author: Elvira Diamantopoulou, Sarah Baxendale, Antonio de la Vega de León, Anzar Asad, Celia J. Holdsworth, Leila Abbas, Valerie J. Gillet, Giselle R. Wiggin, and Tanya T. Whitfield
Subjects: inner ear, Embryo, Nonmammalian, Molecular Structure, phenotypic screening, Gene Expression Regulation, Developmental, myelination, Zebrafish Proteins, adhesion GPCR, chemoinformatics, Receptors, G-Protein-Coupled, Small Molecule Libraries, Ear, Inner, Mutation, Peripheral Nervous System, Animals, Proteoglycans, Schwann Cells, Myelin Sheath, Zebrafish, Adgrg6 (Gpr126), Signal Transduction, Research Article, Developmental Biology, Neuroscience
Abstract: Adgrg6 (Gpr126) is an adhesion class G protein-coupled receptor with a conserved role in myelination of the peripheral nervous system. In the zebrafish, mutation of adgrg6 also results in defects in the inner ear: otic tissue fails to down-regulate versican gene expression and morphogenesis is disrupted. We have designed a whole-animal screen that tests for rescue of both up- and down-regulated gene expression in mutant embryos, together with analysis of weak and strong alleles. From a screen of 3120 structurally diverse compounds, we have identified 68 that reduce versican b expression in the adgrg6 mutant ear, 41 of which also restore myelin basic protein gene expression in Schwann cells of mutant embryos. Nineteen compounds unable to rescue a strong adgrg6 allele provide candidates for molecules that may interact directly with the Adgrg6 receptor. Our pipeline provides a powerful approach for identifying compounds that modulate GPCR activity, with potential impact for future drug design.
Published: 2019

12. Effect of missing data on multitask prediction methods

Author: Valerie J. Gillet, Antonio de la Vega de León, and Beining Chen
Subjects: 0301 basic medicine, Computer science, Multitask prediction, Missing data, Bayesian probability, Library and Information Sciences, Machine learning, computer.software_genre, Field (computer science), lcsh:Chemistry, 03 medical and health sciences, Prediction methods, Deep neural networks, Physical and Theoretical Chemistry, Training set, Macau, lcsh:T58.5-58.64, business.industry, lcsh:Information technology, Sparse data sets, Computer Graphics and Computer-Aided Design, Computer Science Applications, 030104 developmental biology, Probabilistic matrix factorization, lcsh:QD1-999, Cheminformatics, Artificial intelligence, business, computer, Research Article
Abstract: There has been a growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models to predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We have used two complete data sets to simulate sparseness by removing data from the training set. Different models to remove the data were compared. These sparse sets were used to train two different multitask methods, deep neural networks and Macau, which is a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease because of missing data is at first small before accelerating after large amounts of data are removed. This work provides a first approximation to assess how much data is required to produce good performance in multitask prediction exercises. Electronic supplementary material: The online version of this article (DOI: 10.1186/s13321-018-0281-z) contains supplementary material, which is available to authorized users.
Published: 2018

13. Chemoinformatics at the University of Sheffield 2002-2014

Author: Peter Willett, Valerie J. Gillet, and John D. Holliday
Subjects: Engineering, Information retrieval, Universities, business.industry, Organic Chemistry, Fingerprint (computing), Computational Biology, Data science, Computer Science Applications, Structural Biology, Cheminformatics, Drug Discovery, Research studies, Molecular Medicine, Computer Simulation, business, Databases, Chemical
Abstract: This paper summarises work in chemoinformatics carried out in the Information School of the University of Sheffield during the period 2002-2014. Research studies are described on fingerprint-based similarity searching, data fusion, applications of reduced graphs and pharmacophore mapping, and on the School's teaching in chemoinformatics.
Published: 2015
Full Text: View/download PDF

14. New structural alerts for Ames mutagenicity discovered using emerging pattern mining techniques

Author: Lilia Fisk, Valerie J. Gillet, William C. Drewe, Mukesh Patel, Jonathan D. Vessey, Steven J. Canipa, Laurence Coquin, Richard Sherhod, and Jeffrey Plante
Subjects: ComputingMethodologies_PATTERNRECOGNITION, Computer science, Health, Toxicology and Mutagenesis, Toxicology, computer.software_genre, Nexus (standard), computer, Data science, Expert system
Abstract: Emerging pattern mining techniques have been applied to datasets of Ames mutagens. The discovered patterns give rise to clusters of compounds from large and biased datasets which are used to develop new structural alerts for mutagenicity in the Derek Nexus expert system.
Published: 2015
Full Text: View/download PDF

15. Bioisosteric Replacements Extracted from High-Quality Structures in the Protein Databank

Author: Matthew P. Seddon, Valerie J. Gillet, and David A. Cosgrove
Subjects: 0301 basic medicine, Pharmacology, Chemistry, Drug discovery, Protein Conformation, Organic Chemistry, Protein Data Bank (RCSB PDB), Computational Biology, Proteins, Computational biology, Ligands, 01 natural sciences, Biochemistry, 0104 chemical sciences, 010404 medicinal & biomolecular chemistry, 03 medical and health sciences, 030104 developmental biology, Drug Discovery, Molecular Medicine, General Pharmacology, Toxicology and Pharmaceutics, Pharmacophore, Databases, Protein, Algorithms
Abstract: Bioisosterism is an important concept in the lead optimisation phase of drug discovery where the aim is to make modifications to parts of a molecule in order to improve some properties while maintaining others. We present an analysis of bioisosteric fragments extracted from the ligands in an established data set consisting of 121 protein targets. A pairwise analysis is carried out of all ligands for a given target. The ligands are fragmented using the BRICS fragmentation scheme and a pair of fragments is deemed to be bioisosteric if they occupy a similar volume of the protein binding site. We consider two levels of generality, one which does not consider the number of attachment points in the fragments and a more restricted case in which both fragments are required to have the same number of attachments. We investigate the extent to which the bioisosteric pairs that are found are common across different targets.
Published: 2017

16. Investigation of the Use of Spectral Clustering for the Analysis of Molecular Data

Author: Sonny Gan, Valerie J. Gillet, Eleanor J. Gardiner, and David A. Cosgrove
Subjects: Clustering high-dimensional data, Fuzzy clustering, Cyclooxygenase 2 Inhibitors, business.industry, General Chemical Engineering, Statistics as Topic, Correlation clustering, Single-linkage clustering, Pattern recognition, General Chemistry, Library and Information Sciences, computer.software_genre, Spectral clustering, Computer Science Applications, CURE data clustering algorithm, Drug Discovery, Cluster Analysis, Artificial intelligence, Data mining, business, Cluster analysis, computer, Algorithms, k-medians clustering, Mathematics
Abstract: Spectral clustering involves placing objects into clusters based on the eigenvectors and eigenvalues of an associated matrix. The technique was first applied to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727-1733] who demonstrated its use on a very small dataset of 125 COX-2 inhibitors. We have determined suitable parameters for spectral clustering using a wide variety of molecular descriptors and several datasets of a few thousand compounds and compared the results of clustering using a nonoverlapping version of Brewer's use of Sarker and Boyer's algorithm with that of Ward's and k-means clustering. We then replaced the exact eigendecomposition method with two different approximate methods and concluded that Singular Value Decomposition is the most appropriate method for clustering larger compound collections of up to 100,000 compounds. We have also used spectral clustering with the Tversky coefficient to generate two sets of clusters linked by a common set of eigenvalues and have used this novel approach to cluster sets of fragments such as those used in fragment-based drug design.
Published: 2014
Full Text: View/download PDF

17. Automating Knowledge Discovery for Toxicity Prediction Using Jumping Emerging Pattern Mining

Author: Valerie J. Gillet, Jonathan D. Vessey, Richard Sherhod, and Philip N. Judson
Subjects: Process (engineering), General Chemical Engineering, Library and Information Sciences, Machine learning, computer.software_genre, medicine.disease_cause, Pattern Recognition, Automated, Set (abstract data type), Jumping, Knowledge extraction, medicine, Cluster Analysis, Data Mining, Humans, Cluster analysis, Toxicity data, business.industry, Estrogens, General Chemistry, Ether-A-Go-Go Potassium Channels, Computer Science Applications, Identification (information), Artificial intelligence, Data mining, business, computer, Mutagens
Abstract: The design of new alerts, that is, collections of structural features observed to result in toxicological activity, can be a slow process and may require significant input from toxicology and chemistry experts. A method has therefore been developed to help automate alert identification by mining descriptions of activating structural features directly from toxicity data sets. The method is based on jumping emerging pattern mining which is applied to a set of toxic and nontoxic compounds that are represented using atom pair descriptors. Using the resulting jumping emerging patterns, it is possible to cluster toxic compounds into groups defined by the presence of shared structural features and to arrange the clusters into hierarchies. The methodology has been tested on a number of data sets for Ames mutagenicity, oestrogenicity, and hERG channel inhibition end points. These tests have shown the method to be effective at clustering the data sets around minimal jumping-emerging structural patterns and finding descriptions of potentially activating structural features. Furthermore, the mined structural features have been shown to be related to some of the known alerts for all three tested end points.
Published: 2012
Full Text: View/download PDF

18. Compression of Molecular Interaction Fields Using Wavelet Thumbnails: Application to Molecular Alignment

Author: Stefan Senger, Eleanor J. Gardiner, Valerie J. Gillet, and Richard L. Martin
Subjects: Models, Molecular, Property (programming), Computer science, General Chemical Engineering, Drug Evaluation, Preclinical, Molecular Conformation, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Data_CODINGANDINFORMATIONTHEORY, Library and Information Sciences, Ligands, computer.software_genre, User-Computer Interface, Wavelet, Signal processing, business.industry, Proteins, Wavelet transform, Pattern recognition, General Chemistry, Grid, Computer Science Applications, Data point, Area Under Curve, Data mining, Artificial intelligence, business, computer, Image compression, Data compression
Abstract: Molecular interaction fields provide a useful description of ligand binding propensity and have found widespread use in computer-aided drug design, for example, to characterize protein binding sites and in small molecular applications, such as three-dimensional quantitative structure-activity relationships, physicochemical property prediction, and virtual screening. However, the grids on which the field data are stored are typically very large, consisting of thousands of data points, which make them cumbersome to store and manipulate. The wavelet transform is a commonly used data compression technique, for example, in signal processing and image compression. Here we use the wavelet transform to encode molecular interaction fields as wavelet thumbnails, which represent the original grid data in significantly reduced volumes. We describe a method for aligning wavelet thumbnails based on extracting extrema from the thumbnails and subsequently use them for virtual screening. We demonstrate that wavelet thumbnails provide an effective method of capturing the three-dimensional information encoded in a molecular interaction field.
Published: 2012
Full Text: View/download PDF

19. Diversity selection algorithms

Author: Valerie J. Gillet
Subjects: Combinatorial Chemistry Techniques, Drug discovery, Late stage, Biology, Biochemistry, Computer Science Applications, Computational Mathematics, Drug development, Cheminformatics, Materials Chemistry, Physical and Theoretical Chemistry, Algorithm, Selection (genetic algorithm), Diversity (business)
Abstract: Molecular diversity has been an important topic in chemoinformatics for the last two decades following the introduction of high-throughput screening and combinatorial chemistry techniques. This article reviews the main algorithms that have been developed for assessing the diversity of a set of compounds and for selecting a diverse subset of compounds from a larger library. Particular focus is given to recent trends including the use of scaffolds as a way of assessing molecular diversity and the importance now given to optimizing multiple properties simultaneously in an attempt to reduce late stage attrition during the drug development stage of drug discovery. © 2011 John Wiley & Sons, Ltd. WIREs Comput Mol Sci 2011, 1, 580-589. DOI: 10.1002/wcms.33
Published: 2011
Full Text: View/download PDF

20. Lead Optimization Using Matched Molecular Pairs: Inclusion of Contextual Information for Enhanced Prediction of hERG Inhibition, Solubility, and Lipophilicity

Author: Peter Willett, Simon J. F. Macdonald, George Papadatos, Muhammad Alkarouri, Nicola J. Richmond, Gianpaolo Bravi, Anthony William James Cooper, Stephen D. Pickett, Visakan Kadirkamanathan, Christopher N. Luscombe, Jameed Hussain, Valerie J. Gillet, and John Martin Pritchard
Subjects: Databases, Factual, Stereochemistry, General Chemical Engineering, hERG, Context (language use), Computational biology, Library and Information Sciences, Ligands, Molecular property, Humans, Solubility, Lead (electronics), Virtual screening, Molecular Structure, biology, Chemistry, General Chemistry, Lipids, Ether-A-Go-Go Potassium Channels, Computer Science Applications, Drug Design, Lipophilicity, biology.protein, Matched molecular pair analysis, Algorithms
Abstract: Previous studies of the analysis of molecular matched pairs (MMPs) have often assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs). Experiments with large sets of hERG, solubility, and lipophilicity data demonstrate that the inclusion of contextual information can enhance the predictive power of MMP analyses, with significant trends (both positive and negative) being identified that are not apparent when using conventional, context-independent approaches.
Published: 2010
Full Text: View/download PDF

21. Multiobjective Optimization of Pharmacophore Hypotheses: Bias Toward Low-Energy Conformations

Author: Eleanor J. Gardiner, David A. Cosgrove, Valerie J. Gillet, and Robin Taylor
Subjects: Models, Molecular, Mathematical optimization, Databases, Factual, Computer science, General Chemical Engineering, Population, Molecular Conformation, Library and Information Sciences, Ligands, Multi-objective optimization, Set (abstract data type), Low energy, Drug Discovery, Genetic algorithm, Humans, Enzyme Inhibitors, education, Clique, education.field_of_study, Cyclin-Dependent Kinase 2, Thrombin, General Chemistry, Computer Science Applications, Tetrahydrofolate Dehydrogenase, Orders of magnitude (time), Mutation, Thermodynamics, Pharmacophore, Algorithm, Algorithms
Abstract: Two methods are described for biasing conformational search during pharmacophore elucidation using a multiobjective genetic algorithm (MOGA). The MOGA explores conformation on-the-fly while simultaneously aligning a set of molecules such that their pharmacophoric features are maximally overlaid. By using a clique detection method to generate overlays of precomputed conformations to initialize the population (rather than starting from random), the speed of the algorithm has been increased by 2 orders of magnitude. This increase in speed has enabled the program to be applied to greater numbers of molecules than was previously possible. Furthermore, it was found that biasing the conformations explored during search time to those found in the Cambridge Structural Database could also improve the quality of the results.
Published: 2009
Full Text: View/download PDF

22. Three-Dimensional Pharmacophore Methods in Drug Discovery

Author: Valerie J. Gillet, Andrew R. Leach, Richard A. Lewis, and Robin Taylor
Subjects: Models, Molecular, Molecular model, Hydrogen bond, Stereochemistry, Drug discovery, Chemistry, Combinatorial chemistry, Thiophene derivatives, Structure-Activity Relationship, Drug Delivery Systems, Imaging, Three-Dimensional, Drug Discovery, Lipophilicity, Molecular Medicine, Pharmacophore, Algorithms, Three dimensional model
Published: 2009
Full Text: View/download PDF

23. Turbo similarity searching: Effect of fingerprint and dataset on virtual-screening performance

Author: Yogendra Patel, Eleanor J. Gardiner, Valerie J. Gillet, John D. Holliday, Maciej Haranczyk, Jérôme Hert, Peter Willett, and Nurul Hashimah Ahamed Hassain Malim
Subjects: Virtual screening, biology, Computer science, business.industry, Turbo, Fingerprint (computing), Pattern recognition, biology.organism_classification, Computer Science Applications, Similarity (network science), Artificial intelligence, business, Analysis, Information Systems
Published: 2009
Full Text: View/download PDF

24. Use of Reduced Graphs To Encode Bioisosterism for Similarity-Based Virtual Screening

Author: Claude Luttmann, Peter Willett, Valerie J. Gillet, Pierre Ducrot, and Kristian Birchall
Subjects: Virtual screening, Theoretical computer science, Databases, Factual, General Chemical Engineering, Drug Evaluation, Preclinical, General Chemistry, Reference Standards, Library and Information Sciences, ENCODE, Computer Science Applications, Structure-Activity Relationship, User-Computer Interface, Similarity (network science), Fragment (logic), Simple (abstract algebra), Encoding (memory), Computer Graphics, Mathematics
Abstract: This paper describes a project to include explicit information about bioisosteric equivalences between pairs of fragment substructures in a system for similarity-based virtual screening. Data from the BIOSTER database show that reduced graphs provide a simple way of encoding known bioisosteric equivalences in a manner that can be used during similarity searching. Scaffold-hopping experiments with the WOMBAT database show that including such information enables similarities to be identified between the reference structures and active structures from the database that contain different, but equivalent, fragment substructures. However, such equivalences also contribute to the similarities between the reference structures and inactives, and the latter equivalences can swamp those involving the actives. This presents serious problems for the routine use of information about bioisosteric fragments in similarity-based virtual screening.
Published: 2009
Full Text: View/download PDF

25. Representing Clusters Using a Maximum Common Edge Substructure Algorithm Applied to Reduced Graphs and Molecular Graphs

Author: Eleanor J. Gardiner, Peter Willett, Valerie J. Gillet, and David A. Cosgrove
Subjects: Virtual screening, Molecular Structure, Computer science, General Chemical Engineering, Computational Biology, General Medicine, General Chemistry, Edge (geometry), Library and Information Sciences, Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Iterated function, Molecular descriptor, Cluster (physics), Cluster Analysis, DECIPHER, Substructure, Cluster analysis, Algorithm, Algorithms, Chemical database
Abstract: Chemical databases are routinely clustered, with the aim of grouping molecules which share similar structural features. Ideally, medicinal chemists are then able to browse a few representatives of the cluster in order to interpret the shared activity of the cluster members. However, when molecules are clustered using fingerprints, it may be difficult to decipher the structural commonalities which are present. Here, we seek to represent a cluster by means of a maximum common substructure based on the shared functionality of the cluster members. Previously, we have used reduced graphs, where each node corresponds to a generalized functional group, as topological molecular descriptors for virtual screening. In this work, we precluster a database using any clustering method. We then represent the molecules in a cluster as reduced graphs. By repeated application of a maximum common edge substructure (MCES) algorithm, we obtain one or more reduced graph cluster representatives. The sparsity of the reduced graphs means that the MCES calculations can be performed in real time. The reduced graph cluster representatives are readily interpretable in terms of functional activity and can be mapped directly back to the molecules to which they correspond, giving the chemist a rapid means of assessing potential activities contained within the cluster. Clusters of interest are then subject to a detailed R-group analysis using the same iterated MCES algorithm applied to the molecular graphs.
Published: 2007
Full Text: View/download PDF

26. Incorporating partial matches within multiobjective pharmacophore identification

Author: Robin Taylor, Valerie J. Gillet, and Simon J. Cottrell
Subjects: Models, Molecular, Computer science, In Vitro Techniques, Machine learning, computer.software_genre, Carbonic Anhydrase II, Set (abstract data type), Drug Discovery, Genetic algorithm, Feature (machine learning), Humans, Point (geometry), Physical and Theoretical Chemistry, Carbonic Anhydrase Inhibitors, Databases, Protein, Binding Sites, business.industry, Cyclin-Dependent Kinase 2, computer.file_format, Protein Data Bank, Computer Science Applications, Identification (information), ComputingMethodologies_PATTERNRECOGNITION, Test case, Drug Design, Computer-Aided Design, Artificial intelligence, Pharmacophore, business, Algorithm, computer, Algorithms
Abstract: This paper describes the extension of our earlier multi-objective method for generating plausible pharmacophore hypotheses to incorporate partial matches. Diverse sets of molecules rarely adopt exactly the same binding mode, and so allowing the identification of partial matches allows our program to be applied to larger and more diverse datasets. The method explores the conformational space of a series of ligands simultaneously with their alignment using a multi-objective genetic algorithm (MOGA). The principles of Pareto ranking are used to evolve a diverse set of pharmacophore hypotheses that are optimised on conformational energy of the ligands, the goodness of the overlay and the volume of the overlay. A partial match is defined as a pharmacophoric feature that is present in at least two, but not all, of the ligands in the set. The number of ligands that map to a given pharmacophore point is taken into account when evaluating an overlay. The method is applied to a number of test cases extracted from the Protein Data Bank (PDB) where the true overlay is known.
Published: 2007
Full Text: View/download PDF
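Note on entry 26: the Pareto ranking mentioned in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis is scored by a tuple of objectives that are all minimised by convention, and it uses the common "one plus the number of dominating solutions" ranking. The function names and the example scores are hypothetical and are not taken from the authors' program.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(scores: List[Sequence[float]]) -> List[int]:
    """Rank of each solution = 1 + number of solutions that dominate it.
    Rank 1 therefore marks the non-dominated (Pareto-optimal) solutions."""
    return [
        1 + sum(dominates(other, s) for other in scores if other is not s)
        for s in scores
    ]


if __name__ == "__main__":
    # Hypothetical two-objective scores for four candidate overlays.
    example = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
    print(pareto_ranks(example))
```

Keeping every rank-1 solution, rather than collapsing the objectives into one weighted sum, is what allows a diverse set of hypotheses to survive to the final population.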

27. Data mining of search engine logs

Author: Martin Whittle, Barry Eaglestone, Nigel Ford, Valerie J. Gillet, and Andrew Madden
Subjects: Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Software, Information Systems
Published: 2007
Full Text: View/download PDF

28. Perspectives on Knowledge Discovery Algorithms Recently Introduced in Chemoinformatics: Rough Set Theory, Association Rule Mining, Emerging Patterns, and Formal Concept Analysis

Author: Valerie J. Gillet and Eleanor J. Gardiner
Subjects: Informatics, Association rule learning, Computer science, General Chemical Engineering, Library and Information Sciences, computer.software_genre, Machine learning, Structure-Activity Relationship, Knowledge extraction, Feature (machine learning), Formal concept analysis, Pharmacological Phenomena, Structure (mathematical logic), Molecular Structure, business.industry, General Chemistry, Computer Science Applications, Cheminformatics, Pattern recognition (psychology), Data mining, Artificial intelligence, Rough set, business, computer, Algorithm, Algorithms
Abstract: Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one or other method. For example, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have both been used to mine both structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow. Advances, both in computer power and in algorithm development, mean that there is the potential to apply these techniques to larger data sets and thus to different problems in the future.
Published: 2015

29. Training Similarity Measures for Specific Activities: Application to Reduced Graphs

Author: Gavin Harper, Kristian Birchall, Valerie J. Gillet, and Stephen D. Pickett
Subjects: Discrete mathematics, Theoretical computer science, General Chemical Engineering, General Chemistry, Models, Theoretical, Library and Information Sciences, Graph, Computer Science Applications, Structure-Activity Relationship, Pharmaceutical Preparations, Artificial Intelligence, Computer Graphics, Edit distance, Degree of similarity, Enzyme Inhibitors, Algorithms, Mathematics
Abstract: Reduced graph representations of chemical structures have been shown to be effective in similarity searching applications where they offer comparable performance to other 2D descriptors in terms of recall experiments. They have also been shown to complement existing descriptors and to offer potential to scaffold hop from one chemical series to another. Various methods have been developed for quantifying the similarity between reduced graphs including fingerprint approaches, graph matching, and an edit distance method. The edit distance approach quantifies the degree of similarity of two reduced graphs based on the number and type of operations required to convert one graph to the other. An attractive feature of the edit distance method is the ability to assign different weights to different operations. For example, the mutation of an aromatic ring node to an acyclic node may be assigned a higher weight than the mutation of an aromatic ring to an aliphatic ring node. In this paper, we describe a genetic algorithm (GA) for training the weights of the different edit distance operations. The method is applied to specific activity classes extracted from the MDDR database to derive activity-class specific weights. The GA-derived weights give substantially improved results in recall experiments as compared to using weights assigned on intuition. Furthermore, such activity specific weights may provide useful structure-activity information for subsequent design efforts. In a virtual screening setting when few active compounds are known, it may be more useful to have weights that perform well across a variety of different activity classes. Thus, the GA is also trained on multiple activity classes simultaneously to derive a generalized set of weights. These more generally applicable weights also represent a substantial improvement on previous work.
Published: 2006
Full Text: View/download PDF
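Note on entry 29: the idea of assigning different weights to different edit operations can be sketched as follows. This is a deliberate simplification: it treats a reduced graph as a linear sequence of node-type labels and applies a weighted Levenshtein distance, whereas the paper works on graphs. The node labels ("Ar", "Al", "Ac"), the weights, and the function name are all hypothetical.

```python
from typing import Dict, Sequence, Tuple


def weighted_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Dict[Tuple[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Weighted Levenshtein distance between two label sequences.
    Substitution costs are looked up symmetrically; unlisted pairs cost 1.0."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if x == y:
                sub = 0.0
            else:
                sub = sub_cost.get((x, y), sub_cost.get((y, x), 1.0))
            d[i][j] = min(
                d[i - 1][j] + indel_cost,   # delete a node
                d[i][j - 1] + indel_cost,   # insert a node
                d[i - 1][j - 1] + sub,      # mutate (or keep) a node
            )
    return d[n][m]


if __name__ == "__main__":
    # Hypothetical weights: aromatic -> aliphatic ring is cheaper than
    # aromatic -> acyclic, mirroring the example in the abstract.
    weights = {("Ar", "Al"): 0.3, ("Ar", "Ac"): 0.9}
    query = ["Ar", "Ac", "Ar"]
    print(weighted_edit_distance(query, ["Al", "Ac", "Ar"], weights))
    print(weighted_edit_distance(query, ["Ac", "Ac", "Ar"], weights))
```

In a training setting such as the one the abstract describes, the entries of the weight table would be the parameters tuned by the optimiser.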

30. Generation of multiple pharmacophore hypotheses using multiobjective optimisation techniques

Author: Robin Taylor, Valerie J. Gillet, David J. Wilton, and Simon J. Cottrell
Subjects: Models, Molecular, Dopamine D2 Receptor Antagonists, Computer science, Molecular Conformation, Evolutionary algorithm, Serotonin 5-HT1 Receptor Antagonists, Machine learning, computer.software_genre, Molecular conformation, Set (abstract data type), Structure-Activity Relationship, Drug Discovery, Humans, Enzyme Inhibitors, Physical and Theoretical Chemistry, Hydro-Lyases, Flexibility (engineering), Binding Sites, Molecular Structure, business.industry, Ensemble learning, Computer Science Applications, Models, Chemical, Drug Design, Computer-Aided Design, Dopamine Antagonists, Thermodynamics, Serotonin Antagonists, Artificial intelligence, Pharmacophore, business, computer, Algorithms
Abstract: Pharmacophore methods provide a way of establishing a structure activity relationship for a series of known active ligands. Often, there are several plausible hypotheses that could explain the same set of ligands and, in such cases, it is important that the chemist is presented with alternatives that can be tested with different synthetic compounds. Existing pharmacophore methods involve either generating an ensemble of conformers and considering each conformer of each ligand in turn or exploring conformational space on-the-fly. The ensemble methods tend to produce a large number of hypotheses and require considerable effort to analyse the results, whereas methods that vary conformation on-the-fly typically generate a single solution that represents one possible hypothesis, even though several might exist. We describe a new method for generating multiple pharmacophore hypotheses with full conformational flexibility being explored on-the-fly. The method is based on multiobjective evolutionary algorithm techniques and is designed to search for an ensemble of diverse yet plausible overlays which can then be presented to the chemist for further investigation.
Published: 2004
Full Text: View/download PDF

31. Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

Author: Neal Bishop, Valerie J. Gillet, John D. Holliday, and Peter Willett
Subjects: United Kingdom, Bibliometric analysis, Research areas, 05 social sciences, Library science, 02 engineering and technology, Library and Information Sciences, Citation analysis, Cheminformatics, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0509 other social sciences, 050904 information & library sciences, Information Systems
Abstract: This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.
Published: 2003
Full Text: View/download PDF

32. [Untitled]

Author: Valerie J. Gillet
Subjects: Combinatorial Chemistry Techniques, Computer science, Product (mathematics), Drug Discovery, Biochemical engineering, Physical and Theoretical Chemistry, Combinatorial chemistry, Computer Science Applications
Published: 2002
Full Text: View/download PDF

33. [Untitled]

Author: Gianpaolo Bravi, Yogendra Patel, Valerie J. Gillet, and Andrew R. Leach
Subjects: Stereochemistry, Structural alignment, computer.file_format, Biology, Protein Data Bank, LigandScout, Computer Science Applications, Thermolysin, Proteins metabolism, Drug Discovery, Physical and Theoretical Chemistry, Pharmacophore, computer, CDC2-CDC28 Kinases
Abstract: Three commercially available pharmacophore generation programs, Catalyst/HipHop, DISCO and GASP, were compared on their ability to generate known pharmacophores deduced from protein-ligand complexes extracted from the Protein Data Bank. Five different protein families were included Thrombin, Cyclin Dependent Kinase 2, Dihydrofolate Reductase, HIV Reverse Transcriptase and Thermolysin. Target pharmacophores were defined through visual analysis of the data sets. The pharmacophore models produced were evaluated qualitatively through visual inspection and according to their ability to generate the target pharmacophores. Our results show that GASP and Catalyst outperformed DISCO at reproducing the five target pharmacophores.
Published: 2002
Full Text: View/download PDF

34. Emerging pattern mining to aid toxicological knowledge discovery

Author: Richard Sherhod, Philip N. Judson, Thierry Hanser, Jonathan D. Vessey, Samuel J. Webb, and Valerie J. Gillet
Subjects: Endpoint Determination, Mutagenicity Tests, Potassium Channel Blockers, Data Mining, Toxicology, Algorithms, Ether-A-Go-Go Potassium Channels
Abstract: Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. The identification of structural features associated with toxicological activity can be a time-consuming process and often requires significant input from domain experts. Here, we describe an emerging pattern mining method for the automated identification of activating structural features in toxicity data sets that is designed to help expedite the process of alert development. We apply the contrast pattern tree mining algorithm to generate a set of emerging patterns of structural fragment descriptors. Using the emerging patterns it is possible to form hierarchical clusters of compounds that are defined by the presence of common structural features and represent distinct chemical classes. The method has been tested on a large public in vitro mutagenicity data set and a public hERG channel inhibition data set and is shown to be effective at identifying common toxic features and recognizable classes of toxicants. We also describe how knowledge developers can use emerging patterns to improve the specificity and sensitivity of an existing expert system.
Published: 2014

35. SuperStar: improved knowledge-based interaction fields for protein binding sites (Edited by R. Huber)

Author: Paul Watson, Valerie J. Gillet, Peter Willett, Marcel L. Verdonk, and Jason C. Cole
Subjects: Ligand, Chemistry, Atom probe, Crystal structure, law.invention, Ion, Crystallography, Protein structure, Molecular recognition, Structural Biology, law, Computational chemistry, Atom, Molecular Biology, Coordination geometry
Abstract: SuperStar is an empirical method for identifying interaction sites in proteins, based entirely on experimental information about non-bonded interactions occurring in small-molecule crystal structures, taken from the IsoStar database. We describe recent modifications and additions to SuperStar, validating the results on a test set of 122 X-ray structures of protein-ligand complexes. In this validation, propensity maps are generated for all the binding sites of these proteins, using four different probes: a charged NH3+ nitrogen atom, a carbonyl oxygen atom, a hydroxyl oxygen atom and a methyl carbon atom. Next, the maps are compared with the experimentally observed positions of ligand atoms of these types. A peak-searching algorithm is introduced that highlights potential interaction hot spots. For the three hydrogen-bonding probes (NH3+ nitrogen atom, carbonyl oxygen atom and hydroxyl oxygen atom) the average distance from the ligand atom to the nearest SuperStar peak is 1.0-1.2 Å (0.8-1.0 Å for solvent-inaccessible ligand atoms). For the methyl carbon atom probe, this distance is about 1.5 Å, probably because interactions to methyl groups are much less directional. The most important addition to SuperStar is the enabling of propensity maps around metal centres (Ca2+, Mg2+ and Zn2+) in protein binding sites. The results are validated on a test set of 24 protein-ligand complexes that have a metal ion in their binding site. Coordination geometries are derived automatically, using only the protein atoms that coordinate to the metal ion. The correct coordination geometry is derived in approximately 75% of the cases. If the derived geometry is assumed during the SuperStar calculation, the average distance from a ligand atom coordinating to the metal ion to the nearest peak in the propensity map for an oxygen probe is 0.87(7) Å. If the correct coordination geometry is imposed, this distance reduces to 0.59(7) Å. This indicates that the SuperStar predictions around metal-binding sites are at least as good as those around other protein groups. Using clustering techniques, a non-redundant set of probes is selected from the set of probes available in the IsoStar database. The performance in SuperStar of all these probes is tested on the test set of protein-ligand complexes. With the exception of the "ether oxygen" probe and the "any NH+" probe, all new probes perform as well as the four probes introduced first.
Published: 2001
Full Text: View/download PDF

36. [Untitled]

Author: Valerie J. Gillet, Marcel L. Verdonk, Peter Willett, and Paul Watson
Subjects: business.industry, Group (mathematics), Crystallographic data, Pattern recognition, computer.software_genre, Computer Science Applications, chemistry.chemical_compound, Similarity (network science), chemistry, Drug Discovery, Functional group, Data mining, Artificial intelligence, Physical and Theoretical Chemistry, business, computer
Abstract: A knowledge-based method for calculating the similarity of functional groups is described and validated. The method is based on experimental information derived from small molecule crystal structures. These data are used in the form of scatterplots that show the likelihood of a non-bonded interaction being formed between functional group A (the 'central group') and functional group B (the 'contact group' or 'probe'). The scatterplots are converted into three-dimensional maps that show the propensity of the probe at different positions around the central group. Here we describe how to calculate the similarity of a pair of central groups based on these maps. The similarity method is validated using bioisosteric functional group pairs identified in the Bioster database and Relibase. The Bioster database is a critical compilation of thousands of bioisosteric molecule pairs, including drugs, enzyme inhibitors and agrochemicals. Relibase is an object-oriented database containing structural data about protein-ligand interactions. The distributions of the similarities of the bioisosteric functional group pairs are compared with similarities for all the possible pairs in IsoStar, and are found to be significantly different. Enrichment factors are also calculated showing the similarity method is statistically significantly better than random in predicting bioisosteric functional group pairs.
Published: 2001
Full Text: View/download PDF

37. De Novo Molecular Design

Author: Valerie J. Gillet
Subjects: Root mean square, Applied mathematics, Evolution strategy, Mathematics
Published: 2000
Full Text: View/download PDF

38. [Untitled]

Author: Valerie J. Gillet
Subjects: Inorganic Chemistry, Computer science, Product (mathematics), Organic Chemistry, Drug Discovery, General Medicine, Biochemical engineering, Physical and Theoretical Chemistry, Molecular Biology, Combinatorial chemistry, Catalysis, Information Systems
Published: 2000
Full Text: View/download PDF

39. Similarity Searching in Files of Three-Dimensional Chemical Structures: Analysis of the BIOSTER Database Using Two-Dimensional Fingerprints and Molecular Field Descriptors

Author: Ansgar Schuffenhauer, Valerie J. Gillet, and Peter Willett
Subjects: Computational Theory and Mathematics, Similarity (network science), Database, General Chemistry, Data mining, Similarity measure, computer.software_genre, computer, Field (computer science), Computer Science Applications, Information Systems, Mathematics
Abstract: This paper compares the effectiveness of similarity measures based on two-dimensional fingerprints and on molecular fields for identifying pairs of bioisosteric molecules in the BIOSTER database. The results suggest that the two types of descriptor are complementary in nature, each finding some bioisosteric pairs that are not found by the other. This conclusion is confirmed by studies of groups of BIOSTER molecules that share the same activity characteristics, and by experiments that involve combining the two types of similarity measure.
Published: 1999
Full Text: View/download PDF

40. Selecting Combinatorial Libraries to Optimize Diversity and Physical Properties

Author: Valerie J. Gillet, Peter Willett, John Bradshaw, and Darren V. S. Green
Subjects: Mathematical optimization, Fitness function, General Chemistry, Space (commercial competition), computer.software_genre, Measure (mathematics), Computer Science Applications, k-nearest neighbors algorithm, Computational Theory and Mathematics, Genetic algorithm, Trigonometric functions, Product topology, Pairwise comparison, Data mining, computer, Information Systems, Mathematics
Abstract: The program SELECT is presented for the design of combinatorial libraries. SELECT is based on a genetic algorithm with a multi-objective fitness function. Any number of objectives can be included, provided that they can be readily calculated. Typically, the objectives would be to maximize structural diversity while ensuring that the compounds in the library have “drug-like” properties. In the examples given, structural diversity is measured using Daylight fingerprints as descriptors and either the normalized sum of pairwise dissimilarities, calculated with the cosine coefficient, or the average nearest neighbor distance, calculated with the Tanimoto coefficient, as the measure of diversity. The objectives are specified at run time. Combinatorial libraries are selected by analyzing product space, which gives significant advantages over methods that are based on analyzing reactant space. SELECT can also be used to choose an optimal configuration for a multicomponent library. The performance of SELECT is dem...
Published: 1998
Full Text: View/download PDF
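Note on entry 40: the two diversity measures named in the abstract can be sketched for binary fingerprints as follows. The sketch assumes each fingerprint is represented as a set of "on" bit positions and normalises the pairwise sum by the number of pairs; the normalisation used in SELECT itself is not stated in the abstract, so this choice, the function names, and the toy fingerprints are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient for binary fingerprints: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine coefficient for binary fingerprints: |A & B| / sqrt(|A| * |B|)."""
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def mean_pairwise_dissimilarity(fps: List[Set[int]]) -> float:
    """Sum of pairwise (1 - cosine) values, divided by the number of pairs."""
    pairs = list(combinations(fps, 2))
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


def mean_nearest_neighbour_distance(fps: List[Set[int]]) -> float:
    """Average, over all molecules, of the Tanimoto distance to the closest other molecule."""
    total = 0.0
    for i, a in enumerate(fps):
        total += min(1.0 - tanimoto(a, b) for j, b in enumerate(fps) if j != i)
    return total / len(fps)


if __name__ == "__main__":
    # Three toy fingerprints (hypothetical bit positions); needs at least two.
    library = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
    print(mean_pairwise_dissimilarity(library))
    print(mean_nearest_neighbour_distance(library))
```

Either function could serve as one objective in a multi-objective fitness function, alongside property-based objectives computed on the same candidate library.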

41. Similarity and Dissimilarity Methods for Processing Chemical Structure Databases

Author: Peter Willett, John Bradshaw, Valerie J. Gillet, and David J. Wild
Subjects: General Computer Science, Fragment (logic), Similarity (network science), Diversity analysis, Computer science, Search algorithm, Genetic algorithm, Data mining, computer.software_genre, computer, Similitude, Selection (genetic algorithm), Chemical database
Abstract: This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases. The applications discussed include similarity searching, database clustering and diversity analysis, focusing upon measures that are based on fragment bit-string occurrence data. The paper then discusses recent work on the calculation of similarity by aligning molecular fields and on the selection of structurally diverse subsets of chemical databases.
Published: 1998
Full Text: View/download PDF

42. Identification of Biological Activity Profiles Using Substructural Analysis and Genetic Algorithms

Author: Valerie J. Gillet, John Bradshaw, and Peter Willett
Subjects: Pharmacology, Databases, Factual, Molecular Structure, Anti-HIV Agents, Chemistry, Stereochemistry, Hydrogen Bonding, Biological activity, General Chemistry, Computer Science Applications, Structure-Activity Relationship, Identification (information), Pharmaceutical Preparations, Computational Theory and Mathematics, Drug Design, Genetic algorithm, Biological system, Algorithms, Information Systems
Abstract: A substructural analysis approach is used to calculate biological activity profiles, which contain weights that describe the differential occurrences of generic features (specifically, the numbers of hydrogen-bond donors and acceptors, the numbers of rotatable bonds and aromatic rings, the molecular weights, and the 2 kappa alpha descriptors) in active molecules taken from the World Drug Index and in (presumed) inactive molecules taken from the SPRESI database. Even with such simple structural descriptors, the profiles discriminate effectively between active and inactive compounds. The effectiveness of the approach is further increased by using a genetic algorithm for the calculation of the weights comprising a profile. The methods have been successfully applied to a number of different data sets.
Published: 1998
Full Text: View/download PDF

43. USING CHEMOINFORMATICS TOOLS TO ANALYZE CHEMICAL ARRAYS IN LEAD OPTIMIZATION

Author: Iain Mcfarlane Mclay, Stephen D. Pickett, Peter Willett, George Papadatos, Christopher N. Luscombe, and Valerie J. Gillet
Subjects: Engineering, Lead (geology), business.industry, Cheminformatics, Biochemical engineering, business, Multi-objective optimization, Combinatorial chemistry
Published: 2013
Full Text: View/download PDF

44. Multiobjective De Novo Design of Synthetically Accessible Compounds

Author: Michael J. Bodkin, Valerie J. Gillet, and Dimitar Hristozov
Subjects: Engineering, business.industry, Evolutionary algorithm, Structure generation, Artificial intelligence, business, Multi-objective optimization
Published: 2013
Full Text: View/download PDF

45. Toxicological knowledge discovery by mining emerging patterns from toxicity data

Author: Thierry Hanser, Jonathan D. Vessey, Valerie J. Gillet, Richard Sherhod, and Philip N. Judson
Subjects: Toxicity data, Association rule learning, Process (engineering), Computer science, business.industry, Library and Information Sciences, computer.software_genre, Computer Graphics and Computer-Aided Design, Expert system, Computer Science Applications, Domain (software engineering), Knowledge extraction, Knowledge base, Oral Presentation, Binary descriptor, Data mining, Physical and Theoretical Chemistry, business, computer
Abstract: Predicting the risk of toxic and environmental effects of chemical compounds is of great importance to all chemical industries [1]. Expert systems have shown success in predicting toxic risk by applying established knowledge of toxicology encoded as a knowledge base of structural alerts and a reasoning model. A disadvantage of expert systems is that developing new structural alerts requires considerable time and effort from domain experts. In order to expedite this process a software tool has been developed that can automatically mine representations of activating features directly from toxicity datasets and present them in an interpretable form. Our knowledge discovery tool applies emerging pattern (EP) mining [2]: a form of association rule mining [3] that is well known to computer science, but is relatively new to chemistry [4]. The EP mining algorithm accepts any data expressed as a series of binary properties, which is divided into two classes, and extracts patterns of those properties that are frequent within the data and are more frequent in one data class compared to the other. By mining emerging patterns from toxicity datasets, encoded as fingerprints of binary descriptors, the tool generates patterns of features that distinguish toxicants from innocuous compounds. These patterns represent potentially activating features of the toxic compounds that may then be used to define new alerts. The knowledge discovery tool has been tested using a public dataset of 3489 mutagens and 2981 non-mutagens, encoded as fingerprints of approximately 2000 functional groups and ring descriptors. EPs were produced and grouped into a number of hierarchical families. Six of the EPs that represented distinct chemical classes were selected for manual inspection by a toxicology expert. Relevant literature was analysed to find a mechanistic rationale for the mined features, which resulted in four new structural alerts for in vitro mutagenicity.
Published: 2013
Full Text: View/download PDF
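
Illustrative note on the abstract above: the emerging pattern idea it describes (patterns of binary properties that are frequent in one class and markedly more frequent there than in the other) can be sketched with a brute-force search over small feature sets. This is only a toy illustration, not the tool described in the record. The function names, the thresholds (min_support, min_growth, max_size), and the feature names and data are all hypothetical, and a practical miner would use a far more efficient search than enumerating every combination.

```python
from itertools import combinations

def support(pattern, rows):
    """Fraction of rows (sets of 'on' features) that contain every feature in pattern."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if pattern <= row) / len(rows)

def mine_emerging_patterns(positives, negatives,
                           min_support=0.5, min_growth=2.0, max_size=2):
    """Brute-force toy miner (hypothetical helper, not the published tool).

    Keeps patterns whose support in `positives` is at least `min_support`
    and whose growth rate (support in positives / support in negatives)
    is at least `min_growth`.
    """
    features = sorted(set().union(*positives))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            pattern = frozenset(combo)
            sup_pos = support(pattern, positives)
            if sup_pos < min_support:
                continue  # not frequent enough in the target class
            sup_neg = support(pattern, negatives)
            growth = float("inf") if sup_neg == 0 else sup_pos / sup_neg
            if growth >= min_growth:
                found.append((combo, sup_pos, growth))
    return found

# Made-up fingerprints: each compound is the set of binary descriptors that are "on".
toxic = [
    {"nitro", "aromatic_ring"},
    {"nitro", "aromatic_ring", "hydroxyl"},
    {"epoxide"},
    {"nitro", "aromatic_ring"},
]
non_toxic = [
    {"aromatic_ring", "hydroxyl"},
    {"hydroxyl"},
    {"aromatic_ring"},
    {"nitro"},
]

patterns = mine_emerging_patterns(toxic, non_toxic)
for combo, sup, growth in patterns:
    print(combo, sup, growth)
```

On this made-up data, the single feature "nitro" and the pair "aromatic_ring" plus "nitro" both pass the thresholds, while "aromatic_ring" alone does not because it is nearly as common among the non-toxic examples.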

46. SPROUT, HIPPO and CAESA: Tools for de novo structure generation and estimation of synthetic accessibility

Author: Glenn J. Myatt, Valerie J. Gillet, Zsolt Zsoldos, and A. Peter Johnson
Subjects: Pharmacology, Target site, Chemistry, Covalent bond, Hydrogen bond, Organic Chemistry, Drug Discovery, Molecule, Structure generation, Form solution, Receptor binding site, Combinatorial chemistry
Abstract: Several components of a system for structure generation are now well developed. HIPPO is a program that characterises a receptor binding site, identifying potential target sites within the cavity that can be used in de novo design. The target sites include simple and complex hydrogen bonds, covalent bonds and bonds to metal ions. The SPROUT program for structure generation consists of two main stages: skeleton generation, followed by atom substitution to convert the solution skeletons to molecules. A new method of skeleton generation is presented here, in which part skeletons are grown outwards from each target site and then connected to form solution skeletons. Finally, the CAESA program is described, which ranks the output from SPROUT according to ease of synthesis.
Published: 1995
Full Text: View/download PDF

47. SPROUT: 3D Structure Generation Using Templates

Author: A. Peter Johnson, Valerie J. Gillet, S. Sike, Paulina Mata, Anna L. Stebbings, Glenn J. Myatt, and Jorge Lampreia
Subjects: Template, Computational Theory and Mathematics, Computer science, Structure generation, Nanotechnology, General Chemistry, Computer Science Applications, Information Systems
Published: 1995
Full Text: View/download PDF

48. ChemInform Abstract: Validation of Reaction Vectors for de novo Design

Author: Valerie J. Gillet, Dimitar Hristozov, Michael J. Bodkin, Hina Patel, and Beining Chen
Subjects: Chemistry, Stereochemistry, General Medicine
Published: 2012
Full Text: View/download PDF

49. Mining for Context-Sensitive Bioisosteric Replacements in Large Chemical Databases

Author: Valerie J. Gillet, Michael J. Bodkin, George Papadatos, and Peter Willett
Subjects: Engineering, Context (language use), Computational biology, Metabolic stability, Combinatorial chemistry, Chemical database
Published: 2012
Full Text: View/download PDF

50. Pharmacophore Models in Drug Design

Author: Valerie J. Gillet
Subjects: Steric effects, Chemistry, Chemical nomenclature, Pharmacophore, Combinatorial chemistry, Abstract concept, Small molecule, LigandScout
Abstract: A pharmacophore is an abstract concept introduced by Kier in the late 1960s [1] that is used to describe the steric arrangement of functional groups that enable a small molecule to bind to a receptor. The IUPAC definition is as follows: “A pharmacophore is the ensemble of steric and electronic features...
Published: 2012
Full Text: View/download PDF

129 results on '"Valerie J. Gillet"'