125 results for "John B. O. Mitchell"
2. Robust identification of interactions between heat-stress responsive genes in the chicken brain using Bayesian networks and augmented expression data
- Author
E. A. Videla Rodriguez, John B. O. Mitchell, and V. Anne Smith
- Subjects
Bayesian network; Stress; Gene; Chicken; Medicine; Science
- Abstract
Bayesian networks represent a useful tool to explore interactions within biological systems. The aims of this study were to identify a reduced number of genes associated with a stress condition in chickens (Gallus gallus) and to unravel their interactions by implementing a Bayesian network approach. Initially, one publicly available dataset (3 control vs. 3 heat-stressed chickens) was used to identify the stress signal, represented by 25 differentially expressed genes (DEGs). The dataset was augmented by looking for the 25 DEGs in four other publicly available databases. Bayesian network algorithms were used to discover the informative relationships between the DEGs. Only ten of the 25 DEGs displayed interactions. Four of them were heat shock protein genes, which could play a key role, especially under stress conditions, where maintaining the correct functioning of the cell machinery may be crucial. One of the DEGs is an open reading frame whose function is still unknown, highlighting the power of Bayesian networks in knowledge discovery. Identifying an initial stress signal, augmenting it by combining other databases, and finally learning the structure of Bayesian networks allowed us to find genes closely related to stress, with the possibility of further exploring the system in future studies.
- Published
- 2024
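Bayesian network structure learning of this kind works by scoring candidate parent sets for each node against discretised expression data. The abstract does not name the scoring function, so the sketch below assumes a BIC score over discrete states; the function name and toy data format (a list of dicts) are hypothetical.

```python
import math
from collections import Counter


def bic_family_score(data, child, parents):
    """BIC score of one node (child) given a candidate parent set.

    data: list of dicts mapping variable name -> discrete state.
    Higher is better: log-likelihood minus 0.5 * log(N) * free parameters.
    """
    n = len(data)
    child_states = sorted({row[child] for row in data})
    parent_states = [sorted({row[p] for row in data}) for p in parents]

    # Counts of (parent configuration, child state) and of parent configurations.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)

    # Maximum-likelihood log-likelihood of the child's conditional distribution.
    log_likelihood = 0.0
    for (config, _state), count in joint.items():
        log_likelihood += count * math.log(count / parent_counts[config])

    n_configs = math.prod(len(s) for s in parent_states)  # 1 when there are no parents
    free_params = n_configs * (len(child_states) - 1)
    return log_likelihood - 0.5 * math.log(n) * free_params
```

A structure search then compares scores: if a gene's expression tracks the stress variable, adding the stress node as its parent should raise the family score despite the larger complexity penalty.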
3. Allosteric activation unveils protein-mass modulation of ATP phosphoribosyltransferase product release
- Author
Benjamin J. Read, John B. O. Mitchell, and Rafael G. da Silva
- Subjects
Chemistry; QD1-999
- Abstract
Heavy-isotope substitution into enzymes slows down bond vibrations and may alter the probability of transition-state barrier crossing if this is coupled to fast protein motions. ATP phosphoribosyltransferase from Acinetobacter baumannii is a multi-protein complex in which the regulatory protein HisZ allosterically enhances catalysis by the catalytic protein HisGS. This is accompanied by a shift in rate-limiting step from chemistry to product release. Here we report that isotope-labelling of HisGS has no effect on the nonactivated reaction, which involves a negative activation heat capacity, whereas the catalytic rate of HisZ-activated HisGS decreases in a strictly mass-dependent fashion across five different HisGS masses at low temperatures. Surprisingly, the effect is not linked to the chemical step but to fast motions governing product release in the activated enzyme. Disruption of a specific enzyme-product interaction abolishes the isotope effects. The results highlight how altered protein mass perturbs allosterically modulated thermal motions relevant to the catalytic cycle beyond the chemical step.
- Published
- 2024
4. Practical application of a Bayesian network approach to poultry epigenetics and stress
- Author
Emiliano A. Videla Rodriguez, Fábio Pértille, Carlos Guerrero-Bosagna, John B. O. Mitchell, Per Jensen, and V. Anne Smith
- Subjects
Bayesian networks; Differential methylation; Epigenetics; Poultry; Stress; Computer applications to medicine. Medical informatics; R858-859.7; Biology (General); QH301-705.5
- Abstract
Background: Relationships among genetic or epigenetic features can be explored by learning probabilistic networks and unravelling the dependencies among a set of given genetic/epigenetic features. Bayesian networks (BNs) consist of nodes that represent the variables and arcs that represent the probabilistic relationships between the variables. However, practical guidance on how to make choices among the wide array of possibilities in Bayesian network analysis is limited. Our study aimed to apply a BN approach, while clearly laying out our analysis choices as an example for future researchers, in order to provide further insights into the relationships among epigenetic features and a stressful condition in chickens (Gallus gallus).
Results: Chickens raised under control conditions (n = 22) and chickens exposed to a social isolation protocol (n = 24) were used to identify differentially methylated regions (DMRs). A total of 60 DMRs were selected by a threshold, after bioinformatic pre-processing and analysis. The treatment was included as a binary variable (control = 0; stress = 1). Thereafter, a BN approach was applied: initially, a pre-filtering test was used to identify pairs of features that must not be included in the process of learning the structure of the network; then, the average probability of each arc being part of the network was calculated; and finally, the arcs that were part of the consensus network were selected. The structure of the BN consisted of 47 out of 61 features (60 DMRs and the stressful condition), displaying 43 functional relationships. The stress condition was connected to two DMRs, one of them playing a role in tight and adhesive intracellular junctions in organs such as ovary, intestine, and brain.
Conclusions: We clearly explain our steps in making each analysis choice, from discrete BN models to final generation of a consensus network from multiple model averaging searches. The epigenetic BN unravelled functional relationships among the DMRs, as well as epigenetic features in close association with the stressful condition the chickens were exposed to. The DMRs interacting with the stress condition could be further explored in future studies as possible biomarkers of stress in poultry species.
- Published
- 2022
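The last two BN steps described above (averaging each arc's probability over several searches, then keeping the strong arcs as a consensus) can be sketched as follows. The input format, the 0.5 threshold, and the `banned` set standing in for the pre-filtering step are illustrative assumptions, not values or structures taken from the study.

```python
from collections import defaultdict


def consensus_arcs(search_results, threshold=0.5, banned=frozenset()):
    """Average per-arc probabilities over several searches and keep strong arcs.

    search_results: list of dicts mapping (parent, child) -> probability that
        the arc appears, one dict per model-averaging search (hypothetical format).
    threshold: minimum averaged probability for an arc to enter the consensus
        (0.5 is an illustrative choice, not a value from the study).
    banned: set of frozenset({a, b}) pairs excluded by a pre-filtering step.
    """
    totals = defaultdict(float)
    for result in search_results:
        for arc, prob in result.items():
            totals[arc] += prob

    n_searches = len(search_results)
    consensus = {}
    for (a, b), total in totals.items():
        if frozenset((a, b)) in banned:
            continue  # pair ruled out before structure learning
        mean_prob = total / n_searches  # an arc missing from a search counts as 0
        if mean_prob >= threshold:
            consensus[(a, b)] = mean_prob
    return consensus
```

Treating an arc that a search never reports as probability 0 keeps arcs found by only one search from dominating the consensus.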
5. A Bayesian network structure learning approach to identify genes associated with stress in spleens of chickens
- Author
E. A. Videla Rodriguez, John B. O. Mitchell, and V. Anne Smith
- Subjects
Medicine; Science
- Abstract
Differences in the expression patterns of genes have been used to measure the effects of non-stress or stress conditions in poultry species. However, the resulting gene lists can be extensive and may relate to several biological systems. Therefore, the aim of this study was to identify a small set of genes closely associated with stress in a poultry animal model, the chicken (Gallus gallus), by reusing and combining previously published data together with bioinformatic analysis and Bayesian networks in a multi-step approach. Two datasets were collected from publicly available repositories and pre-processed. Bioinformatic analyses were performed to identify genes common to both datasets that showed differential expression patterns between non-stress and stress conditions. Bayesian networks were learnt using a Simulated Annealing algorithm implemented in the software Banjo. The structure of the Bayesian network consisted of 16 out of 19 genes together with the stress condition. The network showed CARD19 directly connected to the stress condition and highlighted CYGB, BRAT1, and EPN3 as relevant, suggesting these genes could play a role in stress. The biological functionality of these genes is related to damage, apoptosis, and oxygen provision, and they could potentially be further explored as biomarkers of stress.
- Published
- 2022
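Simulated annealing over network structures, as used via Banjo above, can be sketched generically: propose single-arc changes, reject any that create a cycle, and accept worse structures with a probability that shrinks as the temperature falls. This is not Banjo's implementation; the cooling schedule, step count, and the pluggable `score` function are assumptions for illustration.

```python
import math
import random


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for a, b in edges:
        indegree[b] += 1
        children[a].append(b)
    queue = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == len(nodes)


def anneal_structure(nodes, score, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Simulated annealing over DAGs using single-arc toggles.

    nodes: list of variable names; score(edges) -> float, higher is better
    (for example a summed BIC or BDe score). Geometric cooling is an assumption.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_score = score(current)
    best, best_score = current, current_score
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        a, b = rng.sample(nodes, 2)
        proposal = current ^ {(a, b)}  # add the arc if absent, remove it if present
        if not is_acyclic(nodes, proposal):
            continue  # Bayesian networks must stay acyclic
        new_score = score(proposal)
        delta = new_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, new_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score
```

In practice `score` would sum a per-family score over all nodes; a toy score that rewards a few target arcs is enough to check that the search converges.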
6. Can human experts predict solubility better than computers?
- Author
Samuel Boobier, Anne Osbourn, and John B. O. Mitchell
- Subjects
Information technology; T58.5-58.64; Chemistry; QD1-999
- Abstract
In this study, we design and carry out a survey asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a variety of neural network known as a multi-layer perceptron, gave an RMSE of 0.985 log S units and an R² of 0.706. We would not have predicted the relative success of this particular algorithm in advance. We found that the best individual human predictor generated an almost identical prediction quality, with an RMSE of 0.942 log S units and an R² of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors: nine out of ten, compared with around half of the humans. We found that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and not statistically significant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially as well as machine learning algorithms. We find that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median is a powerful way of benefitting from the wisdom of crowds.
- Published
- 2017
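The median consensus predictor and the two error measures quoted above are easy to compute. The sketch below assumes R² is the coefficient of determination (1 - SS_res/SS_tot); some studies instead report the squared Pearson correlation, so check which definition a given comparison uses. The predictions and observed values in any test data are made up.

```python
import math
from statistics import median


def consensus_median(predictions):
    """Per-compound median across predictors.

    predictions: list of lists, one inner list of log S predictions per
    predictor, with compounds in the same order in every list.
    """
    return [median(values) for values in zip(*predictions)]


def rmse(observed, predicted):
    """Root-mean-square error in log S units."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Taking the median rather than the mean makes the consensus robust to a minority of wildly wrong individual predictions.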
7. Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry
- Author
Kirsten E. Beattie, Luna De Ferrari, and John B. O. Mitchell
- Subjects
Evolution; QH359-425
- Published
- 2015
8. Classifying the World Anti-Doping Agency's 2005 Prohibited List Using the Chemistry Development Kit Fingerprint
- Author
Edward O. Cannon and John B. O. Mitchell
- Published
- 2006
9. N-strain epidemic model using bond percolation
- Author
Peter Mann, V. Anne Smith, John B. O. Mitchell, and Simon Dobson
- Subjects
QA75 Electronic computers. Computer science; RA0421 Public health. Hygiene. Preventive Medicine; SDG 3 - Good Health and Well-being; Epidemic spreading; Complex networks; Percolation; Co-infection
- Abstract
Funding: This work was partially supported by the UK Engineering and Physical Sciences Research Council under grant number EP/N007565/1 (Science of Sensor Systems Software). In this paper we examine the structure of random networks that have undergone bond percolation an arbitrary, but finite, number of times. We define two types of sequential branching processes: a competitive branching process, in which each iteration performs bond percolation on the residual graph (RG) resulting from previous generations, and a collaborative branching process, in which percolation is performed on the giant connected component (GCC) instead. We investigate the behaviour of these models, including the expected size of the GCC for a given generation, the critical percolation probability, and other topological properties of the resulting graph structures, using the analytically exact method of generating functions. We explore this model for Erdős-Rényi and scale-free random graphs. This model can be interpreted as a seasonal n-strain model of disease spreading.
- Published
- 2022
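The competitive branching process can be checked numerically with a small Monte Carlo sketch: percolate the graph, remove the largest component, then percolate what remains. This is a finite-size simulation, not the paper's generating-function calculation; the graph size, mean degree, and transmissibilities are illustrative, and below threshold the "largest component" returned is not giant.

```python
import random


def erdos_renyi(n, mean_degree, rng):
    """G(n, p) random graph as an adjacency dict, with p = mean_degree / (n - 1)."""
    p = mean_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def percolate_gcc(adj, nodes, t, rng):
    """Bond-percolate the subgraph induced by `nodes` with occupation
    probability t and return the node set of its largest component."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u in nodes:
        for v in adj[u]:
            # Test each residual edge once (u < v) and occupy it with probability t.
            if u < v and v in parent and rng.random() < t:
                parent[find(u)] = find(v)
    components = {}
    for v in nodes:
        components.setdefault(find(v), set()).add(v)
    return max(components.values(), key=len) if components else set()


def competitive_process(adj, transmissibilities, seed=0):
    """Each generation percolates the residual graph left by earlier GCCs;
    returns the fraction of all nodes captured by each generation."""
    rng = random.Random(seed)
    residual = set(adj)
    sizes = []
    for t in transmissibilities:
        gcc = percolate_gcc(adj, residual, t, rng)
        sizes.append(len(gcc) / len(adj))
        residual -= gcc  # nodes captured by this generation are removed
    return sizes
```

Swapping `residual -= gcc` for `residual = gcc` would give a simulation of the collaborative variant, where each generation percolates the previous GCC instead.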
10. Computational Insights into the Catalytic Mechanism of Is-PETase: An Enzyme Capable of Degrading Poly(ethylene) Terephthalate
- Author
Eugene Shrimpton‐Phoenix, John B. O. Mitchell, and Michael Bühl
- Subjects
Green chemistry; Enzyme; Organic Chemistry; General Chemistry; QD Chemistry; QM/MM; Plastics; Catalysis
- Abstract
Funding: This work was supported through a studentship from BBSRC in the EastBio doctoral training programme for E. S.-P. Is-PETase has become an enzyme of significant interest due to its ability to catalyse the degradation of polyethylene terephthalate (PET) at mesophilic temperatures. We performed hybrid quantum mechanics/molecular mechanics (QM/MM) calculations at the DSD-PBEP86-D3/ma-def2-TZVP/CHARMM27//rev-PBE-D3/def2-SVP/CHARMM level to calculate the energy profile for the degradation of a suitable PET model by this enzyme. Very low overall barriers are computed for the serine protease-type hydrolysis steps (as low as 34.1 kJ mol⁻¹). Spontaneous deprotonation of the final product, terephthalic acid, with a high computed driving force, indicates that product release could be rate limiting.
- Published
- 2022
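To get a feel for how low a 34.1 kJ mol⁻¹ barrier is, it can be converted into a first-order rate constant with the Eyring equation. This treats the computed barrier as a free-energy barrier with a transmission coefficient of 1 at 298.15 K, which is an illustrative assumption rather than a conversion made in the paper.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)


def eyring_rate(barrier_kj_per_mol, temperature=298.15):
    """First-order rate constant k = (k_B T / h) exp(-dG / RT), in s^-1.

    Assumes the supplied barrier is a free-energy barrier and that the
    transmission coefficient is 1.
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_per_mol * 1e3 / (R * temperature))


print(f"{eyring_rate(34.1):.2e} s^-1")
```

A barrier of this size corresponds to a rate constant of a few million per second, so a chemical step this fast is unlikely to limit turnover, consistent with the suggestion that product release may be rate limiting.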
11. Toward Physics-Based Solubility Computation for Pharmaceuticals to Rival Informatics
- Author
Rui Guo, John B. O. Mitchell, Sarah L. Price, Daniel J. Fowles, and David S. Palmer
- Subjects
Phonon; Molecular Dynamics Simulation; Machine Learning; Statistical physics; Physical and Theoretical Chemistry; Density Functional Theory; Lattice energy; Water; QD Chemistry; Computer Science Applications; Pharmaceutical Preparations; Solubility; Cheminformatics; Thermodynamics; Sublimation (phase transition)
- Abstract
D.S.P. and D.J.F. thank the EPSRC for funding via Prosperity Partnership EP/S035990/1. D.S.P. and D.J.F. thank the ARCHIE-WeSt High-Performance Computing Centre (www.archie-west.ac.uk) for computational resources. The UCL authors thank Prof. Keith Refson for guidance with the phonon calculations, which used the ARCHER U.K. National Supercomputing Service (http://www.archer.ac.uk) as part of the U.K. HEC Materials Chemistry Consortium, which is funded by the EPSRC (EP/L000202, EP/R029431). R.G. was funded by MagnaPharm, a project funded by the European Union’s Horizon 2020 Research and Innovation programme under grant agreement number 736899. We demonstrate that physics-based calculations of intrinsic aqueous solubility can rival cheminformatics-based machine learning predictions. A proof-of-concept was developed for a physics-based approach via a sublimation thermodynamic cycle, building upon previous work that relied upon several thermodynamic approximations, notably the 2RT approximation, and limited conformational sampling. Here, we apply improvements to our sublimation free-energy model, using crystal phonon mode calculations to capture the contributions of the vibrational modes of the crystal. Including these improvements with lattice energies computed using the model-potential-based Ψmol method leads to accurate estimates of sublimation free energy. Combining these with hydration free energies obtained from either molecular dynamics free-energy perturbation simulations or density functional theory calculations yields solubilities comparable to both experiment and informatics predictions. The application to coronene, succinic acid, and the pharmaceutical desloratadine shows how the methods must be adapted when a molecule adopts different conformations in different phases. The approach has the flexibility to extend to applications that cannot be covered by informatics methods.
- Published
- 2021
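The sublimation thermodynamic cycle referred to above can be summarised as follows. This is a sketch under assumed conventions: intrinsic solubility $S_0$ as a concentration, $V_m$ the molar volume of the crystal, and $E_{\mathrm{latt}}$ the (negative) lattice energy; the exact standard-state conventions vary between studies and may differ from those in the paper. The last line is the 2RT approximation mentioned in the abstract, which the phonon calculations are intended to improve upon.

```latex
\begin{aligned}
\Delta G_{\mathrm{sol}} &= \Delta G_{\mathrm{sub}} + \Delta G_{\mathrm{hyd}} \\
S_0 &= \frac{1}{V_m}\exp\!\left(-\frac{\Delta G_{\mathrm{sol}}}{RT}\right) \\
\Delta H_{\mathrm{sub}} &\approx -E_{\mathrm{latt}} - 2RT
\end{aligned}
```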
12. Allosteric inhibition of Acinetobacter baumannii ATP phosphoribosyltransferase by protein:dipeptide and protein:protein Interactions
- Author
Benjamin J. Read, Gemma Fisher, Oliver L. R. Wissett, Teresa F. G. Machado, John Nicholson, John B. O. Mitchell, and Rafael G. da Silva
- Subjects
Acinetobacter baumannii; Enzyme inhibition; Infectious Diseases; Protein interaction; Kinetic mechanism; QR180 Immunology; QD Chemistry; ATP phosphoribosyltransferase
- Abstract
This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) (Grant BB/M010996/1) via EASTBIO Doctoral Training Partnership studentships to B.J.R. and G.F., and by the Engineering and Physical Sciences Research Council (EPSRC) [grant number EP/L016419/1] via a CRITICAT Centre for Doctoral Training studentship to T.F.G.M. ATP phosphoribosyltransferase (ATPPRT) catalyzes the first step of histidine biosynthesis in bacteria, namely, the condensation of ATP and 5-phospho-α-d-ribosyl-1-pyrophosphate (PRPP) to generate N1-(5-phospho-β-d-ribosyl)-ATP (PRATP) and pyrophosphate. Catalytic (HisGS) and regulatory (HisZ) subunits assemble in a hetero-octamer where HisZ activates HisGS and mediates allosteric inhibition by histidine. In Acinetobacter baumannii, HisGS is necessary for the bacterium to persist in the lung during pneumonia. Inhibition of ATPPRT is thus a promising strategy for specific antibiotic development. Here, A. baumannii ATPPRT is shown to follow a rapid equilibrium random kinetic mechanism, unlike any other ATPPRT. Histidine inhibits ATPPRT noncompetitively. Binding kinetics indicate that histidine binds to free ATPPRT and to the ATPPRT:PRPP and ATPPRT:ATP binary complexes with similar affinity, following a two-step binding mechanism but with distinct kinetic partition of the initial enzyme:inhibitor complex. The dipeptide histidine-proline inhibits ATPPRT competitively against PRPP and likely uncompetitively against ATP. Rapid kinetics analysis shows that His-Pro binds to the ATPPRT:ATP complex via a two-step binding mechanism. A related HisZ that shares 43% sequence identity with A. baumannii HisZ is a tight-binding allosteric inhibitor of A. baumannii HisGS. These findings lay the foundation for inhibitor design against A. baumannii ATPPRT.
- Published
- 2022
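The inhibition patterns described above differ in how they reshape the Michaelis-Menten rate law. The sketch below uses single-substrate forms, which is a simplification for a bisubstrate enzyme such as ATPPRT (it amounts to holding the other substrate fixed); all parameter values are hypothetical. Pure noncompetitive inhibition assumes equal inhibitor affinity for free enzyme and enzyme-substrate complexes, in line with histidine binding its targets "with similar affinity".

```python
def noncompetitive_rate(substrate, inhibitor, vmax, km, ki):
    """Pure noncompetitive inhibition: v = Vmax [S] / ((Km + [S]) (1 + [I]/Ki)).

    Apparent Vmax drops by the factor (1 + [I]/Ki); Km is unchanged.
    """
    return vmax * substrate / ((km + substrate) * (1.0 + inhibitor / ki))


def competitive_rate(substrate, inhibitor, vmax, km, ki):
    """Competitive inhibition: v = Vmax [S] / (Km (1 + [I]/Ki) + [S]).

    Apparent Km rises by the factor (1 + [I]/Ki); Vmax is unchanged, so high
    substrate concentrations overcome the inhibitor.
    """
    return vmax * substrate / (km * (1.0 + inhibitor / ki) + substrate)
```

A quick diagnostic follows from these forms: at [I] = Ki, a noncompetitive inhibitor halves the rate at every substrate concentration, while a competitive inhibitor's effect fades as substrate is increased.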
13. Allosteric Inhibition of
- Author
Benjamin J. Read, Gemma Fisher, Oliver L. R. Wissett, Teresa F. G. Machado, John Nicholson, John B. O. Mitchell, and Rafael G. da Silva
- Subjects
Acinetobacter baumannii; Kinetics; Histidine; Dipeptides; ATP Phosphoribosyltransferase
- Abstract
ATP phosphoribosyltransferase (ATPPRT) catalyzes the first step of histidine biosynthesis in bacteria, namely, the condensation of ATP and 5-phospho-α-d-ribosyl-1-pyrophosphate (PRPP) to generate
- Published
- 2021
14. Exact formula for bond percolation on cliques
- Author
John B. O. Mitchell, V. Anne Smith, Peter Mann, Christopher Jefferson, and Simon Dobson
- Subjects
Discrete mathematics; QA75 Electronic computers. Computer science; QC Physics; RA0421 Public health. Hygiene. Preventive Medicine; SDG 3 - Good Health and Well-being; Complex networks; Clustering; Cluster analysis; Percolation; Bond percolation; Exact formula
- Abstract
The authors would like to thank the School of Computer Science, the School of Chemistry, and the School of Biology of the University of St Andrews for funding this work. We present exact solutions for the size of the giant connected component of complex networks composed of cliques following bond percolation. We use our theoretical result to find the location of the percolation threshold of the model, providing analytical solutions where possible. We expect the results derived here to be useful in a wide variety of applications, including graph theory, epidemiology, percolation, and lattice gas models, as well as fragmentation theory. We also examine the Erdős-Gallai theorem as a necessary condition on the graphicality of configuration model networks comprising clique subgraphs.
- Published
- 2021
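The Erdős-Gallai theorem mentioned above gives a direct test of whether a degree sequence can be realised by a simple graph. The sketch below is the standard simple-graph version of the test; how the paper adapts it to networks built from clique subgraphs is not reproduced here.

```python
def is_graphical(degrees):
    """Erdős-Gallai test: can this degree sequence be realised by a simple graph?

    With d sorted in non-increasing order, the sum of degrees must be even and,
    for every k, sum(d[:k]) <= k (k - 1) + sum(min(d_i, k) for i >= k).
    """
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    for k in range(1, len(d) + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True
```

For example, `[3, 3, 1, 1]` fails at k = 2: two degree-3 nodes in a four-node graph would have to link to every other node, giving the remaining nodes degree 2 rather than 1.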
15. Two-pathogen model with competition on clustered networks
- Author
John B. O. Mitchell, Simon Dobson, Peter Mann, and V. Anne Smith
- Subjects
Physics - Physics and Society; Physics and Society (physics.soc-ph); Quantitative Biology - Populations and Evolution; Populations and Evolution (q-bio.PE); FOS: Physical sciences; FOS: Biological sciences; QA75 Electronic computers. Computer science; RA0421 Public health. Hygiene. Preventive Medicine; SDG 3 - Good Health and Well-being; Complex networks; Clustered networks; Social network; Poisson distribution; Topology; Cluster analysis; Generating function (physics); Percolation; Co-infection; Epidemic spreading
- Abstract
Networks provide a mathematically rich framework to represent social contacts sufficient for the transmission of disease. Social networks are often highly clustered and fail to be locally tree-like. In this paper, we study the effects of clustering on the spread of sequential strains of a pathogen using the generating function formulation under a complete cross-immunity coupling, deriving conditions for the threshold of coexistence of the second strain. We show that clustering reduces the coexistence threshold of the second strain and its outbreak size in Poisson networks, whilst exhibiting the opposite effects on uniform-degree models. We conclude that clustering within a population must increase the ability of the second wave of an epidemic to spread over a network. We apply our model to the study of multilayer clustered networks and observe the fracturing of the residual graph at two distinct transmissibilities.
- Published
- 2021
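How "highly clustered" a contact network is can be quantified with the global clustering coefficient (transitivity): the fraction of connected triples whose two ends are also linked. This is a standard diagnostic, not the subgraph-based clustering model used in the paper; the adjacency-dict input format is an assumption.

```python
from itertools import combinations


def transitivity(adj):
    """Global clustering coefficient: 3 x triangles / connected triples.

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    """
    closed = 0   # connected triples (counted at their centre) whose ends are linked
    triples = 0  # all connected triples, i.e. paths of length 2
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
    # Each triangle is counted once at each of its three centres, so
    # `closed` already equals 3 x the number of triangles.
    return closed / triples if triples else 0.0
```

A locally tree-like network has transitivity close to zero, which is exactly the assumption that breaks down for the clustered networks studied here.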
16. Rational Drug Design of Antineoplastic Agents Using 3D-QSAR, Cheminformatic, and Virtual Screening Approaches
- Author
Jelica Vucicevic, John B. O. Mitchell, and Katarina Nikolic
- Subjects
Models, Molecular; Quantitative Structure-Activity Relationship; In silico; Drug Evaluation, Preclinical; Drug Design; Antineoplastic Agents; Computational biology; Biochemistry; Drug Discovery; Humans; Pharmacology; Virtual screening; Organic Chemistry; Combinatorial chemistry; Cheminformatics; Computer-Aided Design; Molecular Medicine; Pharmacophore
- Abstract
Background: Computer-aided drug design has strongly accelerated the development of novel antineoplastic agents by aiding hit identification, optimization, and evaluation.
Results: Computational approaches such as cheminformatic search, virtual screening, pharmacophore modeling, molecular docking, and molecular dynamics have been developed and applied to explain the activity of bioactive molecules, design novel agents, increase the success rate of drug research, and decrease the total costs of drug discovery. Similarity searches and virtual screening are used to identify molecules with an increased probability of interacting with drug targets of interest, while the other computational approaches are applied to design and evaluate molecules with enhanced activity and improved safety profiles.
Conclusion: This review describes the main in silico techniques used in the rational design of antineoplastic agents and presents optimal combinations of computational methods for designing more efficient antineoplastic drugs.
- Published
- 2019
17. Symbiotic and antagonistic disease dynamics on networks using bond percolation
- Author
Simon Dobson, Peter Mann, John B. O. Mitchell, and V. Anne Smith
- Subjects
Random graph; Physics - Physics and Society; Physics and Society (physics.soc-ph); FOS: Physical sciences; Percolation theory; Percolation; Assortativity; Statistical physics; Complex network; Cluster analysis
- Abstract
In this paper we introduce a novel description of the equilibrium state of a bond percolation process on random graphs using the exact method of generating functions. This allows us to find the expected size of the giant connected component (GCC) of two sequential bond percolation processes in which the bond occupancy probability of the second process is modulated (increased or decreased) by a node being inside or outside of the GCC created by the first process. In the context of epidemic spreading, this corresponds to either an antagonistic partial-immunity interaction or a synergistic partial-coinfection interaction between the two sequential diseases. We examine configuration model networks with tunable clustering. We find that the emergent evolutionary behaviour of the second strain is highly dependent on the details of the coupling between the strains. Contact clustering generally reduces the outbreak size of the second strain relative to unclustered topologies; however, positive assortativity induced by clustered contacts inverts this conclusion for highly transmissible disease dynamics.
- Published
- 2021
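The generating-function description of a single bond percolation process has a closed form for an unclustered Poisson (Erdős-Rényi) graph, which is the natural baseline before adding clustering or a second, modulated process. With $G_0(x) = G_1(x) = e^{c(x-1)}$, the standard equations $u = 1 - T + T\,G_1(u)$ and $S = 1 - G_0(u)$ reduce to $S = 1 - e^{-cTS}$. The sketch below solves this by fixed-point iteration; it does not cover the paper's clustered or two-strain extensions.

```python
import math


def poisson_gcc_fraction(mean_degree, t, tol=1e-12, max_iter=100_000):
    """Expected GCC fraction after bond percolation on a Poisson random graph.

    With G0(x) = G1(x) = exp(c (x - 1)), u = 1 - t + t * G1(u) and
    S = 1 - G0(u); eliminating u gives S = 1 - exp(-c t S).
    """
    s = 1.0  # start from S = 1 so the iteration is not stuck at the trivial root S = 0
    for _ in range(max_iter):
        new_s = 1.0 - math.exp(-mean_degree * t * s)
        if abs(new_s - s) < tol:
            return new_s
        s = new_s
    return s
```

For a Poisson graph the Molloy-Reed threshold reduces to $cT = 1$: below it the iteration decays to zero, and above it a finite giant component appears.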
18. Degree correlations in graphs with clique clustering
- Author
Peter Mann, V. Anne Smith, John B. O. Mitchell, and Simon Dobson
- Subjects
QA75 Electronic computers. Computer science; QA Mathematics; QC Physics; Complex networks; Clustering
- Abstract
Funding: This work was partially supported by the UK Engineering and Physical Sciences Research Council under grant number EP/N007565/1 (Science of Sensor Systems Software). Correlations among the degrees of nodes in random graphs often occur when clustering is present. In this paper we define a joint-degree correlation function for nodes in the giant component of clustered configuration model networks composed of higher-order subgraphs. We use this model to investigate in detail the organisation of nearest-neighbour subgraphs in random graphs as a function of subgraph topology and clustering. We find an expression for the average joint degree of a neighbour in the giant component at the critical point for these networks. Finally, we introduce a novel edge-disjoint clique decomposition algorithm and investigate the correlations between the subgraphs of empirical networks.
- Published
- 2021
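An edge-disjoint clique decomposition splits every edge of a network into exactly one clique, so that an empirical graph can be described in terms of subgraphs. The sketch below is a simple greedy heuristic written for illustration; it is not the algorithm introduced in the paper, and its results depend on node ordering.

```python
def greedy_clique_decomposition(adj):
    """Partition the edges of an undirected graph into edge-disjoint cliques.

    Greedy heuristic: take a remaining edge (u, v), grow a clique by adding
    nodes joined to every current member by remaining edges, then delete the
    clique's edges. adj: dict node -> set of neighbours (sortable node labels).
    Returns a list of node sets, one per clique (single edges are 2-cliques).
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # do not mutate the input
    cliques = []
    for u in sorted(remaining):
        while remaining[u]:
            v = min(remaining[u])
            clique = {u, v}
            candidates = remaining[u] & remaining[v]  # joined to both u and v
            while candidates:
                w = min(candidates)
                clique.add(w)
                candidates &= remaining[w]  # must also be joined to the new member
            for a in clique:
                remaining[a] -= clique  # delete the edges used inside this clique
            cliques.append(clique)
    return cliques
```

Because each clique is grown greedily, the decomposition is valid (every edge is covered exactly once) but not guaranteed to use the fewest or largest cliques.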
19. Percolation in random graphs with higher-order clustering
- Author
Simon Dobson, Peter Mann, John B. O. Mitchell, and V. Anne Smith
- Subjects
Random graph; Physics - Physics and Society; Physics and Society (physics.soc-ph); Condensed Matter - Statistical Mechanics; Statistical Mechanics (cond-mat.stat-mech); FOS: Physical sciences; QA75 Electronic computers. Computer science; QD Chemistry; QH301 Biology; Complex network; Giant component; Percolation theory; Percolation; Statistical physics; Cluster analysis; Generating function (physics)
- Abstract
Percolation theory can be used to describe the structural properties of complex networks using the generating function formulation. This mapping assumes that the network is locally tree-like and does not contain short-range loops between neighbours. In this paper we use the generating function formulation to examine clustered networks that contain simple cycles and cliques of any order. We use the natural generalisation of the Molloy-Reed criterion for these networks to describe their critical properties and derive an analytical description of the size of the giant component, providing solutions for Poisson and power-law networks. We find that networks comprising larger simple cycles behave increasingly tree-like. Conversely, networks whose clustering comprises larger cliques deviate increasingly from the tree-like solution, although the behaviour is strongly dependent on the degree-assortativity.
- Published
- 2021
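For orientation, the locally tree-like baseline that the clustered-percolation paper above generalises can be sketched for a Poisson degree distribution: the Molloy-Reed quantity sum_k k(k-2)p_k is positive exactly when a giant component exists, and the giant component fraction S solves S = 1 - exp(-cS) for mean degree c. This is a minimal sketch under the tree-like assumption only; it does not implement the paper's cycle or clique corrections, and the function names are hypothetical.

```python
import math


def poisson_pk(c, kmax=60):
    """Truncated Poisson degree distribution p_k with mean degree c."""
    return {k: math.exp(-c) * c**k / math.factorial(k) for k in range(kmax + 1)}


def molloy_reed(pk):
    """Return sum_k k(k-2) p_k; a giant component exists when this is > 0."""
    return sum(k * (k - 2) * p for k, p in pk.items())


def giant_component_poisson(c, tol=1e-12, max_iter=10_000):
    """Solve S = 1 - exp(-c*S) by fixed-point iteration from S = 1.

    Valid only for a locally tree-like (unclustered) Poisson network.
    The iteration converges to the non-trivial root when c > 1 and to 0
    when c < 1 (convergence is slow right at the critical point c = 1).
    """
    s = 1.0
    for _ in range(max_iter):
        s_new = 1.0 - math.exp(-c * s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s
```

For a Poisson distribution the Molloy-Reed sum reduces to c^2 - c, so the threshold sits at c = 1; the paper's contribution is how cycles and cliques shift this picture.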
20. Random graphs with arbitrary clustering and their applications
- Author
-
John B. O. Mitchell, V. Anne Smith, Peter Mann, Simon Dobson, University of St Andrews. School of Computer Science, University of St Andrews. Sir James Mackenzie Institute for Early Diagnosis, University of St Andrews. School of Chemistry, University of St Andrews. Office of the Principal, University of St Andrews. St Andrews Centre for Exoplanet Science, University of St Andrews. Centre for Biological Diversity, University of St Andrews. Centre for Higher Education Research, University of St Andrews. Scottish Oceans Institute, University of St Andrews. Institute of Behavioural and Neural Sciences, University of St Andrews. School of Biology, University of St Andrews. EaSTCHEM, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. St Andrews Bioinformatics Unit
- Subjects
QA75 ,Physics - Physics and Society ,Computer science ,QA75 Electronic computers. Computer science ,T-NDAS ,Structure (category theory) ,Complex networks ,FOS: Physical sciences ,Clustered networks ,Physics and Society (physics.soc-ph) ,Topology ,01 natural sciences ,010305 fluids & plasmas ,0103 physical sciences ,QA Mathematics ,010306 general physics ,Cluster analysis ,QA ,Condensed Matter - Statistical Mechanics ,QC ,Generating function (physics) ,Random graphs ,Random graph ,Statistical Mechanics (cond-mat.stat-mech) ,Degree (graph theory) ,Complex network ,QC Physics ,Percolation ,Network analysis - Abstract
The structure of many real networks is not locally tree-like, and hence network analysis that assumes local tree-likeness fails to characterise their bond percolation properties. In a recent paper [P. Mann, V. A. Smith, J. B. O. Mitchell, and S. Dobson, Percolation in random graphs with higher-order clustering, arXiv e-prints, p. arXiv:2006.06744, June 2020.], we developed analytical solutions to the percolation properties of random networks with homogeneous clustering (clusters whose nodes are degree-equivalent). In this paper, we extend this model to investigate networks that contain clusters whose nodes are not degree-equivalent, including multilayer networks. Through numerical examples we show how this method can be used to investigate the properties of random complex networks with arbitrary clustering, extending the applicability of the configuration model and generating function formulation. Comment: 11 pages, 10 figures
- Published
- 2021
21. Three machine learning models for the 2019 Solubility Challenge
- Author
-
John B. O. Mitchell, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
QA75 ,Extra trees ,Chemistry(all) ,Computer science ,Solubility prediction ,QA75 Electronic computers. Computer science ,Medicine (miscellaneous) ,Machine learning ,Bagging ,Pharmacology (medical) ,QD ,General Pharmacology, Toxicology and Pharmaceutics ,Solubility ,Aqueous intrinsic solubility ,lcsh:RM1-950 ,3rd-DAS ,QD Chemistry ,Random forest ,Wisdom of crowds ,Inter-laboratory error ,lcsh:Therapeutics. Pharmacology ,Consensus classifiers ,Chemistry (miscellaneous) ,Artificial intelligence ,Computer Science(all) - Abstract
We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-based classifiers, with one model based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. On the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform slightly better than Extra Trees, and almost identically to one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.
- Published
- 2020
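The Wisdom of Crowds idea behind a consensus predictor such as Vox Machinarum can be illustrated without any machine learning library: average each compound's predictions across models and compare errors. This is a minimal sketch assuming equal-weight averaging of numeric log S predictions; the abstract does not state the exact combination rule, and the function names are hypothetical.

```python
import math
from statistics import mean


def consensus(*model_predictions):
    """Equal-weight average of several models' predictions, compound by compound.

    Assumption: each argument is a list of numeric predictions (e.g. log S)
    for the same compounds in the same order.
    """
    return [mean(values) for values in zip(*model_predictions)]


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))
```

By the triangle inequality, the RMSE of an equal-weight average can never exceed the mean RMSE of its members, and it falls further below when the members' errors partly cancel.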
22. Cooperative coinfection dynamics on clustered networks
- Author
-
Simon Dobson, John B. O. Mitchell, V. Anne Smith, Peter Mann, University of St Andrews. Office of the Principal, University of St Andrews. St Andrews Centre for Exoplanet Science, University of St Andrews. Centre for Biological Diversity, University of St Andrews. Centre for Higher Education Research, University of St Andrews. Scottish Oceans Institute, University of St Andrews. Institute of Behavioural and Neural Sciences, University of St Andrews. School of Biology, University of St Andrews. EaSTCHEM, University of St Andrews. Biomedical Sciences Research Complex, University of St Andrews. School of Chemistry, University of St Andrews. Sir James Mackenzie Institute for Early Diagnosis, University of St Andrews. School of Computer Science, and University of St Andrews. St Andrews Bioinformatics Unit
- Subjects
Physics - Physics and Society ,Population ,T-NDAS ,Complex networks ,FOS: Physical sciences ,Computational biology ,Physics and Society (physics.soc-ph) ,Biology ,Primary disease ,01 natural sciences ,010305 fluids & plasmas ,0103 physical sciences ,010306 general physics ,Cluster analysis ,Quantitative Biology - Populations and Evolution ,QC ,Percolation (cognitive psychology) ,Coinfection ,Dynamics (mechanics) ,Populations and Evolution (q-bio.PE) ,Percolation ,Complex network ,Emergent disease ,QC Physics ,Epidemic spreading ,FOS: Biological sciences - Abstract
Coinfection is the process by which a host that is infected with one pathogen later becomes infected by a second pathogen. An immunosuppressive host response to a primary disease can facilitate the spread of a subsequent emergent pathogen through the population. Social contact patterns within the underlying population can be modelled using complex networks, and it has been shown that contact patterns strongly influence the emergent disease dynamics. In this paper, we consider the effect of contact clustering on the coinfection dynamics of two pathogens spreading over a network. We use the generating function formulation to describe the expected outbreak sizes of each pathogen and numerically study the threshold criteria that permit the coexistence of each strain within the network. We find that the effects of clustering on the levels of coinfection are governed by the details of the contact topology. Comment: 9 pages, 5 figures
- Published
- 2020
- Full Text
- View/download PDF
23. 3. In Silico methods to predict solubility
- Author
-
James L. McDonagh, John B. O. Mitchell, David S. Palmer, and R. Skyner
- Subjects
Materials science ,Computational chemistry ,In silico ,Solubility
- Published
- 2019
24. Probing the average distribution of water in organic hydrate crystal structures with radial distribution functions (RDFs)
- Author
-
John B. O. Mitchell, Colin R. Groom, R. Skyner, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
Chemistry ,Solvation ,DAS ,02 engineering and technology ,General Chemistry ,Crystal structure ,QD Chemistry ,010402 general chemistry ,021001 nanoscience & nanotechnology ,Condensed Matter Physics ,Crystal engineering ,01 natural sciences ,0104 chemical sciences ,Crystallography ,Distribution function ,Polymorphism (materials science) ,Chemical physics ,Molecule ,QD ,General Materials Science ,Crystallization ,0210 nano-technology ,Hydrate - Abstract
The authors thank the University of St Andrews and EPSRC (grant EP/L505079/1). The abundance of crystal structures of solvated organic molecules reflects the common role of solvent in the crystallisation process. An understanding of solvation is therefore important for crystal engineering, as the choice of solvent often affects polymorphism and influences the crystal structure. Water is of particular importance, and a number of approaches have previously been considered for analysing large datasets of organic hydrates. In this work we attempt to develop a method suitable for application to organic hydrate crystal structures, in order to better understand the distribution of water molecules in such systems. We present a model that combines the distribution functions of multiple atom pairs from a number of crystal structures. From this, we can comment qualitatively on the average distribution of water in organic hydrates.
- Published
- 2017
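The building block of the RDF approach above is a pairwise radial distribution function g(r): a histogram of A-B distances, normalised by the count expected for uniformly distributed B atoms at the same density. This is a minimal sketch assuming a cubic periodic box and minimum-image distances; real crystal cells are generally triclinic, and the paper's model for combining several atom pairs across many structures is not reproduced here. The function name is hypothetical.

```python
import math


def rdf(pos_a, pos_b, box, r_max, n_bins):
    """Histogram-based g(r) between atom sets A and B in a cubic periodic box.

    pos_a, pos_b: sequences of (x, y, z) coordinates; box: cube edge length.
    Assumes r_max <= box / 2 so that minimum-image distances are valid.
    Zero distances are skipped, so passing the same set twice ignores self-pairs
    (a stricter A = B normalisation would use N - 1 instead of N).
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for a in pos_a:
        for b in pos_b:
            d2 = 0.0
            for i in range(3):
                delta = a[i] - b[i]
                delta -= box * round(delta / box)  # minimum-image convention
                d2 += delta * delta
            r = math.sqrt(d2)
            if 0.0 < r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    rho_b = len(pos_b) / box**3  # number density of B
    radii, g = [], []
    for k, n in enumerate(counts):
        shell_volume = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k**3) * dr**3
        radii.append((k + 0.5) * dr)
        g.append(n / (len(pos_a) * rho_b * shell_volume))
    return radii, g
```

For uncorrelated, uniformly placed atoms g(r) fluctuates around 1; peaks above 1 mark preferred separations, which is the kind of signal used to describe where water tends to sit.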
25. Are the Sublimation Thermodynamics of Organic Molecules Predictable?
- Author
-
John B. O. Mitchell, David S. Palmer, Tanja van Mourik, James L. McDonagh, University of St Andrews. School of Chemistry, University of St Andrews. EaSTCHEM, and University of St Andrews. Biomedical Sciences Research Complex
- Subjects
Models, Molecular ,Quantitative structure–activity relationship ,Informatics ,Entropy ,General Chemical Engineering ,Enthalpy ,Molecular Conformation ,Quantitative Structure-Activity Relationship ,Thermodynamics ,02 engineering and technology ,Library and Information Sciences ,010402 general chemistry ,01 natural sciences ,Phase Transition ,Enthalpy of sublimation ,Partial least squares regression ,QD ,Organic Chemicals ,Predictability ,Solubility ,Lattice energy ,Chemistry ,DAS ,General Chemistry ,QD Chemistry ,021001 nanoscience & nanotechnology ,0104 chemical sciences ,Computer Science Applications ,Gibbs free energy ,0210 nano-technology - Abstract
JMcD and JBOM would like to thank SULSA for funding. DSP thanks the University of Strathclyde for support through its Strategic Appointment and Investment Scheme. We compare a range of computational methods for the prediction of sublimation thermodynamics (enthalpy, entropy and free energy of sublimation). These include a model from theoretical chemistry that utilizes crystal lattice energy minimization (with the DMACRYS program) and QSPR models generated by both machine learning (Random Forest and Support Vector Machines) and regression (Partial Least Squares) methods. Using these methods we investigate the predictability of the enthalpy, entropy and free energy of sublimation, and consider whether such methods could improve solubility prediction schemes. Previous work has suggested that the major source of error in solubility prediction schemes involving a thermodynamic cycle via the solid state is in the modeling of the free energy change away from the solid state. Yet, contrary to this conclusion, other work has found that including terms such as the enthalpy of sublimation in QSPR methods does not improve solubility predictions. We suggest the use of theoretical chemistry terms, detailed explicitly in the methods section, as descriptors for the prediction of the enthalpy and free energy of sublimation. A dataset of 158 molecules with experimental sublimation thermodynamics values and some CSD refcodes has been collected from the literature and is provided with their original source references.
- Published
- 2016
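The three sublimation quantities studied above are not independent: at a given temperature they obey the Gibbs relation ΔG_sub = ΔH_sub - TΔS_sub, so predictions of any two fix the third. This is a minimal sketch of that relation; the units (kJ/mol for enthalpy, J/(mol K) for entropy) and the 298.15 K default are assumptions chosen for illustration, and the function name is hypothetical.

```python
def gibbs_sublimation(dh_kj_per_mol, ds_j_per_mol_k, temperature_k=298.15):
    """Return ΔG_sub = ΔH_sub - T * ΔS_sub in kJ/mol.

    dh_kj_per_mol: enthalpy of sublimation in kJ/mol.
    ds_j_per_mol_k: entropy of sublimation in J/(mol K); divided by 1000
    to convert to kJ/(mol K) before combining with the enthalpy.
    """
    return dh_kj_per_mol - temperature_k * ds_j_per_mol_k / 1000.0
```

A quick consistency check on a set of independently predicted ΔH, ΔS and ΔG values is to compare the predicted ΔG with the value this relation implies.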
26. Crystal structure evaluation: calculating relative stabilities and other criteria: general discussion
- Author
-
J. Christian Schön, Matthew R. Ryder, Jonas Nyman, Seiji Tsuzuki, Alexandre Tkatchenko, Alan Hare, John B. O. Mitchell, Marcus A. Neumann, Julian Helfferich, Samuel Alexander Jobbins, Johannes Hoja, David H. Bowskill, Ivo B. Rietveld, Luca Iuzzolino, Pablo M. Piaggi, Michael T. Ruggiero, Sharmarke Mohamed, Sarah L. Price, Rui Guo, Mihails Arhangelskis, Qiang Zhu, Artem R. Oganov, Matthew Addicoat, Jason C. Cole, Gregory J. O. Beran, Graeme M. Day, Sten O. Nilsson Lill, Doris E. Braun, Scott M. Woodley, Christopher R. Taylor, Virginia M. Burger, German Sastre, Claire S. Adjiman, Noa Marom, Aurora J. Cruz-Cabeza, David McKay, Jan Gerit Brandenburg, Susan M. Reutzel-Edens, Grahame Woollam, Joost A. van den Ende, Volker L. Deringer, Respiratory Epidemiology and Public Health, Imperial College London-Royal Brompton Hospital-National Heart and Lung Institute [UK], Mulliken Center for Theoretical Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Blackett Laboratory, Imperial College London, University of Cambridge [UK] (CAM), Brigham and Women's Hospital [Boston], Karlsruhe Institute of Technology (KIT), Institute for Computational Engineering and Sciences [Austin] (ICES), University of Texas at Austin [Austin], School of Engineering and Physical Sciences, Heriot-Watt University, Heriot-Watt University [Edinburgh] (HWU), University College of London [London] (UCL), Sciences et Méthodes Séparatives (SMS), Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Normandie Université (NU), Univ Politecnica Valencia Consejo Super Invest, Inst Tecnol Quim UPV CSIC, Valencia 46022, Spain, Max Planck Institute for Solid State Research, Max-Planck-Gesellschaft, National Institute of Advanced Industrial Science and Technology (AIST), and Department of Chemistry, University College London
- Subjects
[CHIM.THEO]Chemical Sciences/Theoretical and/or physical chemistry ,Materials science ,010304 chemical physics ,0103 physical sciences ,Thermodynamics ,02 engineering and technology ,Crystal structure ,Physical and Theoretical Chemistry ,021001 nanoscience & nanotechnology ,0210 nano-technology ,01 natural sciences ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience
- Published
- 2018
27. Artificial intelligence in pharmaceutical research and development
- Author
-
John B. O. Mitchell
- Subjects
Pharmacology ,Knowledge management ,010405 organic chemistry ,Computer science ,MEDLINE ,Pharmaceutical Research ,Quantitative Structure-Activity Relationship ,02 engineering and technology ,021001 nanoscience & nanotechnology ,01 natural sciences ,0104 chemical sciences ,Drug Development ,Artificial Intelligence ,Drug Discovery ,Molecular Medicine ,Humans ,Pharmaceutical sciences ,0210 nano-technology
- Published
- 2018
28. Applications of crystal structure prediction – inorganic and network structures: general discussion
- Author
-
Scott M. Woodley, Yi Li, John B. O. Mitchell, Peter R. Spackman, Frederik Claeyssens, Matthew S. Dyer, Graeme M. Day, Caroline Mellot-Draznieks, Daniel W. Davies, Sharmarke Mohamed, Michael T. Ruggiero, Matthew R. Ryder, J. Christian Schön, Sarah L. Price, Virginia M. Burger, Artem R. Oganov, Alan Hare, Qiang Zhu, German Sastre, Laboratoire de Chimie des Processus Biologiques (LCPB), and Collège de France (CdF (institution))-Institut de Chimie du CNRS (INC)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Materials science ,Network structure ,02 engineering and technology ,010402 general chemistry ,021001 nanoscience & nanotechnology ,01 natural sciences ,0104 chemical sciences ,Crystal structure prediction ,Data mining ,[PHYS.PHYS.PHYS-CHEM-PH]Physics [physics]/Physics [physics]/Chemical Physics [physics.chem-ph] ,Physical and Theoretical Chemistry ,0210 nano-technology ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience
- Published
- 2018
29. Is Experimental Data Quality the Limiting Factor in Predicting the Aqueous Solubility of Druglike Molecules?
- Author
-
John B. O. Mitchell, David S. Palmer, University of St Andrews. EaSTCHEM, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. School of Chemistry
- Subjects
Bioavailability ,Chemistry, Pharmaceutical ,Quantitative Structure-Activity Relationship ,Pharmaceutical Science ,Henderson–Hasselbalch equation ,Toxicology ,QSPR ,Crystal ,Drug Discovery ,QD ,Solubility ,Henderson-Hasselbalch ,R2C ,ADME ,Molecular Structure ,QSAR ,Chemistry ,Temperature ,Experimental uncertainty analysis ,Research Design ,Lipinski's rule of five ,Regression Analysis ,Thermodynamics ,Molecular Medicine ,CheqSol ,BDC ,Algorithms ,Dissolution ,Limiting factor ,RM ,Quantitative structure–activity relationship ,RS ,Machine learning ,Rule-of-five ,Random Forest ,Reproducibility of Results ,Water ,Experimental data ,DAS ,Experimental error ,Druglike ,QD Chemistry ,RM Therapeutics. Pharmacology ,Kinetics ,ADMET ,Models, Chemical ,Pharmaceutical ,Polymorph ,General solubility equation ,Noyes-Whitney ,Software - Abstract
D.S.P. is grateful for funding from the European Commission through a Marie Curie Intra-European Fellowship within the seventh European Community Framework Programme (FP7-PEOPLE-2010-IEF). D.S.P. thanks the University of Strathclyde for support through its Strategic Appointment and Investment Scheme. Computations were performed at the EPSRC funded ARCHIE-WeSt High Performance Computer (www.archie-west.ac.uk, EPSRC grant no. EP K0005861). J.B.O.M. thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support and EaStCHEM for access to the EaStCHEM Research Computing Facility. We report the results of testing quantitative structure-property relationships (QSPR) that were trained upon the same druglike molecules but two different sets of solubility data: (i) data extracted from several different sources from the published literature, for which the experimental uncertainty is estimated to be 0.6-0.7 log S units (referred to mol/L); (ii) data measured by a single accurate experimental method (CheqSol), for which experimental uncertainty is typically
- Published
- 2014
30. In Silico Target Predictions: Defining a Benchmarking Data Set and Comparison of Performance of the Multiclass Naïve Bayes and Parzen-Rosenblatt Window
- Author
-
Alexios Koutsoukas, John B. O. Mitchell, Robert C. Glen, Werner Klaffke, Yasaman KalantarMotamedi, Andreas Bender, Robert Lowe, and Hamse Y. Mussa
- Subjects
Computer science ,General Chemical Engineering ,Library and Information Sciences ,Ligands ,Machine learning ,Set (abstract data type) ,Naive Bayes classifier ,Bayes' theorem ,Drug Discovery ,Humans ,Probabilistic logic ,Computational Biology ,Proteins ,Reproducibility of Results ,Bayes Theorem ,General Chemistry ,chEMBL ,Computer Science Applications ,Data set ,Benchmarking ,ComputingMethodologies_PATTERNRECOGNITION ,Binary classification ,Cheminformatics ,Artificial intelligence ,Algorithms ,Protein Binding - Abstract
In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. This data set is also provided as a benchmark data set for future target prediction methods due to its size as well as the number of bioactivity classes it contains. In addition to evaluating the methods, different performance measures were explored. This is not as straightforward as in binary classification settings, due to the number of classes, the possibility of multiple class memberships, and the need to translate model scores into "yes/no" predictions for assessing model performance. Both algorithms achieved a recall of correct targets that exceeds 80% in the top 1% of predictions. Performance depends significantly on the underlying diversity and size of a given class of bioactive compounds, with small classes and low structural similarity affecting both algorithms to different degrees. When tested on an external test set extracted from WOMBAT covering more than 500 targets by excluding all compounds with Tanimoto similarity above 0.8 to compounds from the ChEMBL data set, the current methodologies achieved a recall of 63.3% and 66.6% among the top 1% for Naïve Bayes and Parzen-Rosenblatt Window, respectively. While those numbers seem to indicate lower performance, they are also more realistic for settings where protein targets need to be established for novel chemical substances.
- Published
- 2013
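Two evaluation ingredients in the target-prediction study above, Tanimoto similarity between fingerprints and recall among the top-ranked predictions, are easy to state precisely. This is a minimal sketch assuming fingerprints are represented as sets of "on" bit indices (the study used circular fingerprints, whose generation is not shown) and that "top 1%" means the top 1% of each compound's ranked target list; the helper names are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity |A & B| / |A | B| between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_external(test_fps, train_fps, threshold=0.8):
    """Keep test fingerprints with similarity <= threshold to every training one.

    Mirrors the idea of excluding external compounds with Tanimoto
    similarity above 0.8 to any compound in the training data.
    """
    return [t for t in test_fps
            if all(tanimoto(t, tr) <= threshold for tr in train_fps)]


def recall_at_top(ranked_targets, true_targets, fraction=0.01):
    """Fraction of a compound's true targets found in the top `fraction` of its ranking."""
    k = max(1, math.ceil(len(ranked_targets) * fraction))
    return len(set(ranked_targets[:k]) & set(true_targets)) / len(true_targets)
```

With 894 targets, the top 1% under this reading corresponds to the 9 highest-scoring targets per compound.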
31. Drug Design for CNS Diseases: Polypharmacological Profiling of Compounds Using Cheminformatic, 3D-QSAR and Virtual Screening Methodologies
- Author
-
Lazaros Mavridis, John B. O. Mitchell, Kemal Yelekçi, Teodora Djikic, Jelica Vucicevic, Danica Agbaba, Katarina Nikolic, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
0301 basic medicine ,Drug ,Virtual screening ,Quantitative structure–activity relationship ,NDAS ,Drug design ,Review ,Drug action ,Computational biology ,Pharmacology ,Biology ,01 natural sciences ,Docking ,03 medical and health sciences ,Multi-target drugs ,QD ,Rational drug design ,CNS disease ,Virtual docking ,QSAR ,General Neuroscience ,Neurodegeneration ,QD Chemistry ,chEMBL ,0104 chemical sciences ,3. Good health ,010404 medicinal & biomolecular chemistry ,030104 developmental biology ,Docking (molecular) ,Cheminformatic - Abstract
Support was kindly provided by the EU COST Action CM1103. DA, KN, and JV kindly acknowledge national project numbers 172033 and OI1612039 supported by the Ministry of the Republic of Serbia. TDj and KY kindly acknowledge the "Training in Neurodegeneration, Therapeutics, Intervention and Neurorepair" project number 608381 funded by Marie Skłodowska-Curie action, funding scheme: FP7-MC-ITN. The diverse cerebral mechanisms implicated in CNS (Central Nervous System) diseases, together with the heterogeneous and overlapping nature of their phenotypes, indicate that multitarget strategies may be appropriate for the improved treatment of complex brain diseases. Understanding how the neurotransmitter systems interact is also important in optimizing therapeutic strategies. Pharmacological intervention on one target will often influence another, as in the well-established serotonin-dopamine and dopamine-glutamate interactions. It is now accepted that drug action can involve several targets, and that polypharmacology, acting on multiple targets to address disease in more subtle and effective ways, is a key concept for the development of novel drug candidates against complex CNS diseases. A multi-target therapeutic strategy for Alzheimer's disease resulted in the development of very effective Multi-Target Designed Ligands (MTDL) that act on both the cholinergic and monoaminergic systems, and also retard the progression of neurodegeneration by inhibiting amyloid aggregation. Many compounds already in databases have been investigated as ligands for multiple targets in drug-discovery programs. A probabilistic method, the Parzen-Rosenblatt Window approach, was used to build a “predictor” model using data collected from the ChEMBL database. The model can be used to predict both the primary pharmaceutical target and off-targets of a compound based on its structure.
Several multi-target ligands were selected for further study, as compounds with possible additional beneficial pharmacological activities. Based on all these findings, it is concluded that multipotent ligands targeting AChE/MAO-A/MAO-B and also D1-R/D2-R/5-HT2A-R/H3-R are promising novel drug candidates with improved efficacy and beneficial neuroleptic and procognitive activities in treatment of Alzheimer’s and related neurodegenerative diseases. Structural information for drug targets permits docking and virtual screening and exploration of the molecular determinants of binding, hence facilitating the design of multi-targeted drugs. The crystal structures and models of enzymes of the monoaminergic and cholinergic systems have been used to investigate the structural origins of target selectivity and to identify molecular determinants, in order to direct the development of novel multifunctional ligands.
- Published
- 2016
32. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules
- Author
-
David S. Palmer, John B. O. Mitchell, Maxim V. Fedorov, Tanja van Mourik, James L. McDonagh, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
Aqueous solution ,Chemistry ,Ab initio ,Thermodynamics ,Crystal structure ,QD Chemistry ,Computer Science Applications ,Crystal ,Computational chemistry ,Solvent models ,Molecule ,QD ,Sublimation (phase transition) ,Physics::Chemical Physics ,Physical and Theoretical Chemistry ,Solubility - Abstract
We demonstrate that the intrinsic aqueous solubility of crystalline druglike molecules can be estimated with reasonable accuracy from sublimation free energies calculated using crystal lattice simulations and hydration free energies calculated using the 3D Reference Interaction Site Model (3D-RISM) of the Integral Equation Theory of Molecular Liquids (IET). The solubilities of 25 crystalline druglike molecules taken from different chemical classes are predicted by the model with a correlation coefficient of R = 0.85 and a root mean square error (RMSE) of 1.45 log10 S units, which is significantly more accurate than results obtained using implicit continuum solvent models. The method is not directly parametrized against experimental solubility data, and it offers a full computational characterization of the thermodynamics of transfer of the drug molecule from crystal phase to gas phase to dilute aqueous solution.
- Published
- 2012
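The crystal-to-gas-to-solution route described above is a thermodynamic cycle: the solution free energy is the sum of the sublimation and hydration free energies, and at equilibrium the intrinsic solubility satisfies log10 S = -ΔG_sol / (RT ln 10). This is a minimal sketch assuming both free energies are referred to a consistent 1 mol/L standard state (so S comes out in mol/L); the standard-state corrections used in the paper are not reproduced, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def log10_solubility(dg_sub_kj, dg_hyd_kj, temperature_k=298.15):
    """Estimate log10 S (mol/L) from a crystal -> gas -> solution cycle.

    Assumes dg_sub_kj (crystal -> gas) and dg_hyd_kj (gas -> aqueous solution)
    are both in kJ/mol and referred to a 1 mol/L standard state, so their sum
    is the standard free energy of solution.
    """
    dg_sol = dg_sub_kj + dg_hyd_kj
    return -dg_sol / (R_KJ * temperature_k * math.log(10))
```

At 298.15 K, each additional ~5.7 kJ/mol of solution free energy lowers the predicted solubility by one log unit, which puts the reported RMSE of 1.45 log10 S units in perspective.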
33. Classifying Molecules Using a Sparse Probabilistic Kernel Binary Classifier
- Author
-
Hamse Y. Mussa, John B. O. Mitchell, Robert Lowe, and Robert C. Glen
- Subjects
General Chemical Engineering ,Library and Information Sciences ,Machine learning ,Models, Biological ,Field (computer science) ,Relevance vector machine ,Set (abstract data type) ,Statistics::Machine Learning ,Kernel (linear algebra) ,Drug Discovery ,Enzyme Inhibitors ,Mathematics ,Probabilistic logic ,Computational Biology ,Pattern recognition ,General Chemistry ,Computer Science Applications ,Data set ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,Binary classification ,Artificial intelligence ,Algorithms - Abstract
The central idea of supervised classification in chemoinformatics is to design a classifying algorithm that accurately assigns a new molecule to one of a set of predefined classes. Tipping devised a classifying scheme, the Relevance Vector Machine (RVM), whose sparsity is comparable to that of the Support Vector Machine (SVM). However, unlike SVM classifiers, RVM classifiers are probabilistic in nature, which is crucial for decision making and risk assessment. In this work, we investigate the performance of RVM binary classifiers in classifying a subset of the MDDR data set, a standard molecular benchmark data set, into active and inactive compounds. Additionally, we present results that compare the performance of SVM and RVM binary classifiers.
- Published
- 2011
34. Ask the experts: focus on computational chemistry
- Author
-
Andreas Bender, Yvonne C. Martin, Woody Sherman, Steffen Renner, Charles A. Laughton, Richard A. Bryce, John B. O. Mitchell, Paul Selzer, Guilio Vistoli, Peter Willett, Alessandro Padova, Carlton Taft, Jürgen Bajorath, Maria Letizia Barreca, Michael C. Hutter, Tiziano Tuccinardi, Wolfgang Sippl, and Christian Laggner
- Subjects
Pharmacology ,High rate ,Computer science ,Space (commercial competition) ,Computing Methodologies ,Data science ,Focus (linguistics) ,Chemistry ,Ask price ,Chemical physics ,Drug Design ,Pharmaceutical ,Drug Discovery ,Chemogenomics ,Computer-Aided Design ,Molecular Medicine ,Chemistry, Pharmaceutical - Abstract
Q: What recent development in the field has most attracted your attention? A: Over the past few years I have paid particular attention to systematic mining of compound activity data and mapping of ligand–target interaction spaces. Many of these efforts have been covered under the ‘chemogenomics’ or ‘pharmacological space’ labels. In general, systematic mining of activity data, still growing at very high rates, can teach us important lessons about structure–activity or selectivity relationships as well as therapeutically relevant ligand preferences of target families.
- Published
- 2011
35. Development and Comparison of hERG Blocker Classifiers: Assessment on Different Datasets Yields Markedly Different Results
- Author
-
John B. O. Mitchell, Robert C. Glen, and Richard L. Marchese Robinson
- Subjects
Winnow ,Virtual screening ,Computer science ,Organic Chemistry ,hERG ,Computer Science Applications ,Random forest ,Correlation ,Support vector machine ,Binary classification ,Structural Biology ,Drug Discovery ,Molecular Medicine ,Data mining ,Applicability domain - Abstract
In recent years, considerable effort has been invested in the development of classification models for prospective hERG inhibitors, due to the implications of hERG blockade for cardiotoxicity and the low throughput of functional hERG assays. We present novel approaches for binary classification which seek to separate strong inhibitors (IC50
- Published
- 2011
36. Erratum for 'In Silico Target Predictions: Defining a Benchmarking Data Set and Comparison of Performance of the Multiclass Naïve Bayes and Parzen-Rosenblatt Window'
- Author
-
Alexios Koutsoukas, Robert Lowe, Yasaman KalantarMotamedi, Hamse Y. Mussa, Werner Klaffke, John B. O. Mitchell, Robert C. Glen, and Andreas Bender
- Subjects
General Chemical Engineering ,General Chemistry ,Library and Information Sciences ,Computer Science Applications
- Published
- 2014
37. Predicting Phospholipidosis Using Machine Learning
- Author
-
John B. O. Mitchell, Robert C. Glen, and Robert Lowe
- Subjects
Support Vector Machine ,Databases, Factual ,Computer science ,Pharmaceutical Science ,Machine learning ,Lipidoses ,01 natural sciences ,Models, Biological ,Article ,Phospholipidosis ,03 medical and health sciences ,Artificial Intelligence ,Drug Discovery ,Animals ,Humans ,Phospholipids ,030304 developmental biology ,0303 health sciences ,Random Forest ,prediction ,0104 chemical sciences ,Random forest ,Support vector machine ,010404 medicinal & biomolecular chemistry ,machine learning ,in silico ,Molecular Medicine ,Artificial intelligence ,Predictive methods - Abstract
Phospholipidosis is an adverse effect caused by numerous cationic amphiphilic drugs and can affect many cell types. It is characterized by the excess accumulation of phospholipids and is most reliably identified by electron microscopy of cells revealing the presence of lamellar inclusion bodies. The development of phospholipidosis can cause a delay in the drug development process, and the importance of computational approaches to the problem has been well documented. Previous work on predictive methods for phospholipidosis showed that state of the art machine learning methods produced the best results. Here we extend this work by looking at a larger data set mined from the literature. We find that circular fingerprints lead to better models than either E-Dragon descriptors or a combination of the two. We also observe very similar performance in general between Random Forest and Support Vector Machine models.
- Published
- 2010
38. A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking
- Author
-
Pedro J. Ballester and John B. O. Mitchell
- Subjects
Statistics and Probability ,Protein Conformation ,Computer science ,Chemical biology ,Overfitting ,Ligands ,Machine learning ,Biochemistry ,Protein structure ,Artificial Intelligence ,Cluster Analysis ,Databases, Protein ,Molecular Biology ,Models, Statistical ,Drug discovery ,Ligand ,Binding protein ,Computational Biology ,Proteins ,Reproducibility of Results ,Ligand (biochemistry) ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Structural biology ,Docking (molecular) ,Data Interpretation, Statistical ,Drug Design ,Algorithms ,Protein Binding ,Protein ligand
- Abstract
Motivation: Accurately predicting the binding affinities of large sets of diverse protein–ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex (which also include parameters fitted to experimental or simulation data) and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions. Results: We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size, and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score. Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2010
39. Toxicological relationships between proteins obtained from protein target predictions of large toxicity databases
- Author
-
John B. O. Mitchell and Florian Nigsch
- Subjects
Pharmacology ,Winnow ,Databases, Factual ,Drug-Related Side Effects and Adverse Reactions ,Database ,Computer aid ,In silico ,Monte Carlo method ,Computational Biology ,Proteins ,Computational toxicology ,Biology ,Toxicology ,Models, Biological ,Predictive factor ,Toxicity ,Humans ,Protein target ,Algorithms ,Forecasting
- Abstract
The combination of models for protein target prediction with large databases containing toxicological information for individual molecules allows the derivation of "toxicological" profiles, i.e., the extent to which molecules of known toxicity are predicted to interact with a set of protein targets. To predict protein targets of drug-like and toxic molecules, we built a computational multiclass model using the Winnow algorithm based on a dataset of protein targets derived from the MDL Drug Data Report. A 15-fold Monte Carlo cross-validation using 50% of each class for training, and the remaining 50% for testing, provided an assessment of the accuracy of that model. We retained the three top-ranking predictions and found that in 82% of all cases the correct target was predicted within these three predictions. The first prediction was the correct one in almost 70% of cases. A model built on the whole protein target dataset was then used to predict the protein targets for 150,000 molecules from the MDL Toxicity Database. We analysed the frequency of the predictions across the panel of protein targets for experimentally determined toxicity classes of all molecules. This allowed us to identify clusters of proteins related by their toxicological profiles, as well as toxicities that are related. Literature-based evidence is provided for some specific clusters to show the relevance of the relationships identified.
- Published
- 2008
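The abstract of entry 39 describes a multiclass Winnow model judged by whether the correct target appears among its three top-ranked predictions. A minimal sketch of that setup follows, assuming one common multiclass Winnow variant: one weight vector per class, with the true class's active-feature weights multiplied by `alpha` and the wrongly predicted class's weights multiplied by `beta` after a mistake. The update rule, the factor values, and the names `MulticlassWinnow` and `top_k_accuracy` are hypothetical; the abstract does not give the paper's exact formulation. Molecules are represented as sets of binary feature identifiers.

```python
from collections import defaultdict


class MulticlassWinnow:
    """Hypothetical multiclass Winnow: one multiplicative weight vector per class."""

    def __init__(self, classes, alpha=2.0, beta=0.5):
        self.classes = list(classes)
        self.alpha = alpha  # promotion factor (assumed value)
        self.beta = beta    # demotion factor (assumed value)
        # Every feature weight starts at 1.0 for every class.
        self.weights = {c: defaultdict(lambda: 1.0) for c in self.classes}

    def score(self, cls, features):
        return sum(self.weights[cls][f] for f in features)

    def rank(self, features):
        # Classes sorted by descending score; ties keep the original class order.
        return sorted(self.classes, key=lambda c: self.score(c, features), reverse=True)

    def update(self, features, label):
        predicted = self.rank(features)[0]
        if predicted != label:  # Winnow only updates after a mistake
            for f in features:
                self.weights[label][f] *= self.alpha
                self.weights[predicted][f] *= self.beta

    def fit(self, data, epochs=5):
        for _ in range(epochs):
            for features, label in data:
                self.update(features, label)
        return self


def top_k_accuracy(model, data, k=3):
    """Fraction of examples whose true class is among the k highest-ranked classes."""
    hits = sum(1 for features, label in data if label in model.rank(features)[:k])
    return hits / len(data)
```

With this helper, `k=3` mirrors the "correct target within the three top-ranking predictions" criterion and `k=1` the "first prediction was correct" figure.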
40. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P
- Author
-
Laura D. Hughes, Florian Nigsch, John B. O. Mitchell, and David S. Palmer
- Subjects
Physics ,Octanols ,Quantitative structure–activity relationship ,Coefficient of determination ,Mean squared error ,General Chemical Engineering ,Water ,Thermodynamics ,General Chemistry ,Library and Information Sciences ,Computer Science Applications ,Support vector machine ,Partition coefficient ,Models, Chemical ,Solubility ,Artificial Intelligence ,Test set ,Melting point ,Transition Temperature
- Abstract
This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol–water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol–water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r² = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r² = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r² = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in the descriptors used for melting point prediction contribute significantly to the prediction errors.
- Published
- 2008
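Entry 40 compares models by the RMSE and coefficient of determination on an external test set. As a reference for how these two statistics are usually computed, here is a minimal sketch. It uses the 1 - SS_res/SS_tot definition of r², which is an assumption: some QSPR papers instead report the squared Pearson correlation, and the two can differ on external test sets.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE carries the units of the property, so the melting point value (52.8 °C) and the Log P value (0.73 log units) cannot be compared directly; the unitless r² is what allows a cross-property comparison.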
41. The Chemistry of Protein Catalysis
- Author
-
John B. O. Mitchell, Daniel Almonacid, Gemma L. Holliday, and Janet M. Thornton
- Subjects
Protein Conformation ,Stereochemistry ,Catalysis ,Enzyme catalysis ,Protein structure ,Nucleophile ,Structural Biology ,Enzyme Stability ,Organic chemistry ,Amino Acids ,Databases, Protein ,Molecular Biology ,Histidine ,Binding Sites ,Molecular Structure ,Chemistry ,Reproducibility of Results ,Active site ,Enzyme Commission number ,Enzymes ,Protons ,Protein Processing, Post-Translational ,Cysteine
- Abstract
We report, for the first time, on the statistics of chemical mechanisms and amino acid residue functions that occur in enzyme reaction sequences using the MACiE database of 202 distinct enzyme reaction mechanisms as a knowledge base. MACiE currently holds representatives from each Enzyme Commission sub-subclass where there is an available crystal structure and sufficient evidence in the primary literature for a mechanism. Each catalytic step of every reaction sequence in MACiE is fully annotated, so that it includes the function of the catalytic residues involved in the reaction and the chemical mechanisms by which substrates are transformed into products. We show that the amino acid residues most often found in catalytic roles are histidine, cysteine and aspartate, which are also the residues whose side-chains are more likely to serve as reactants and that have the greatest versatility of function. We show that electrophilic reactions in enzymes are very rare, and the majority of enzyme reactions rely upon nucleophilic and general acid/base chemistry. However, although rare, radical (homolytic) reactions are much more common than electrophilic reactions. Thus, the majority of amino acid residues perform stabilisation roles (as spectators) or proton shuttling roles (as reactants). The analysis presented provides a better understanding of the mechanisms of enzyme catalysis and may act as an initial step in the validation and prediction of mechanism in an enzyme active site.
- Published
- 2007
42. Theoretical Study of the Reaction Mechanism of Streptomyces coelicolor Type II Dehydroquinase
- Author
-
L. M. Blomberg, Martina Mangold, Jochen Blumberger, and John B. O. Mitchell
- Subjects
Reaction mechanism ,Chemistry ,Stereochemistry ,Reaction step ,Active site ,Protonation ,Enol ,Computer Science Applications ,Deprotonation ,Computational chemistry ,Molecule ,Density functional theory ,Physical and Theoretical Chemistry
- Abstract
The reaction mechanism of a type II dehydroquinase (DHQase) from Streptomyces coelicolor was investigated using molecular dynamics simulation and density functional theory (DFT) calculations. DHQase catalyzes the elimination of a water molecule from dehydroquinate (DHQ), a key step in the biosynthesis of aromatic amino acids in bacteria, fungi, and plants. In the DFT calculations, 10 models, containing up to 230 atoms, were used to investigate different proposals for the reaction mechanism, suggested on the basis of crystal structures and kinetic data. Probing the flexibility of the active site, molecular dynamics simulation reveals that deprotonated Tyr28 can act as the base that catalyzes the first reaction step, the abstraction of the pro-S proton at C2 of DHQ, and formation of the enolate intermediate. The computed barrier for the first transition state (TS1), 13–15 kcal/mol, is only slightly affected by the active site model used and is in good agreement with the corresponding experimental barrier of 13.4 kcal/mol for the rate-determining step. The previously proposed enol form of the intermediate is found to be significantly higher in energy than the enolate form and is thus not thermodynamically competitive. In the second and final reaction step, protonation of the hydroxyl group at C1 by His106 followed by water elimination, there is a substantial buildup of dipole moment due to the net transfer of a proton from His106 to Tyr28. A barrier for the second transition state (TS2) that fits well with the corresponding experimental barrier could only be found if the buildup of dipole moment is at least partly compensated during the second reaction step. We speculate that this could be facilitated by regeneration of the Tyr28 anion or by proton transfer to the vicinity of His106 before TS2 is reached. A revised mechanism for type II DHQase is discussed in light of the results of the present calculations.
- Published
- 2015
43. Predicting melting points of organic molecules: applications to aqueous solubility prediction using the General Solubility Equation
- Author
-
John B. O. Mitchell, T. van Mourik, James L. McDonagh, University of St Andrews. School of Chemistry, University of St Andrews. EaSTCHEM, and University of St Andrews. Biomedical Sciences Research Complex
- Subjects
Quantitative structure–activity relationship ,Work (thermodynamics) ,Mean squared error ,Thermodynamics ,Organic molecules ,Machine Learning ,Structural Biology ,Prediction methods ,QSPR ,Freezing ,Drug Discovery ,Aqueous solubility ,Solubility ,Chemistry ,Organic Chemistry ,QD Chemistry ,Computer Science Applications ,Models, Chemical ,Melting points ,Molecular Medicine ,Physical chemistry ,Pharmaceuticals ,Databases, Chemical
- Abstract
In this work we make predictions of several molecular properties of academic and industrial importance to seek answers to two questions: 1) Can we apply efficient machine learning techniques, using inexpensive descriptors, to predict melting points to a reasonable level of accuracy? 2) Can values of this level of accuracy be usefully applied to predicting aqueous solubility? We present predictions of melting points made by several novel machine learning models, previously applied to solubility prediction. Additionally, we make predictions of solubility via the General Solubility Equation (GSE) and monitor the impact of varying the logP prediction model (AlogP and XlogP) on the GSE. We note that the machine learning models presented, using a modest number of 2D descriptors, can make melting point predictions in line with the current state-of-the-art prediction methods (RMSE ≥ 40 °C). We also find that predicted melting points, with an RMSE of tens of degrees Celsius, can be usefully applied to the GSE to yield accurate solubility predictions (log10 S RMSE < 1) over a small dataset of druglike molecules.
- Published
- 2015
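Entry 43 feeds predicted melting points into the General Solubility Equation. The sketch below uses the commonly quoted form of the GSE, log S = 0.5 - 0.01(MP - 25) - log P, with MP in °C, S in mol/L, and the melting-point term set to zero for compounds that are liquid at 25 °C. The function name is hypothetical, and the log P argument could come from any predictor, such as the AlogP or XlogP models mentioned in the abstract.

```python
def gse_log_s(melting_point_c, log_p):
    """General Solubility Equation: log10 of aqueous solubility (mol/L).

    melting_point_c: melting point in degrees Celsius (measured or predicted)
    log_p: octanol-water partition coefficient (measured or predicted)
    """
    # For compounds that are liquid at 25 °C, the (MP - 25) term is taken as zero.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p
```

This form also shows why melting point errors of tens of degrees can still be useful. For a solid, an error of ΔT °C in MP changes log S by 0.01·ΔT, so a 40 °C error shifts the prediction by only 0.4 log units, which is consistent with the log10 S RMSE < 1 reported in the abstract.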
44. A Random Forest Model for Predicting Allosteric and Functional Sites on Proteins
- Author
-
John B. O. Mitchell, Ava S.-Y. Chen, Lazaros Mavridis, Nicholas J. Westwood, Graeme Wayne Rogers, Paul Brear, University of St Andrews. School of Chemistry, University of St Andrews. EaSTCHEM, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. School of Biology
- Subjects
QH301 Biology ,Allosteric regulation ,Machine Learning ,Protein structure ,Structural Biology ,Drug Discovery ,Binding site ,Databases, Protein ,Protein cavities ,Random Forest ,Chemistry ,Cheminformatics ,Organic Chemistry ,Computational Biology ,Proteins ,Models, Theoretical ,Allosteric site ,QD Chemistry ,Ligand (biochemistry) ,Data science ,Computer Science Applications ,Drug Design ,Molecular Medicine ,Algorithms
- Abstract
We thank the Scottish Universities Life Sciences Alliance (SULSA) for funding to JBOM and for PB’s PhD studentship under NJW’s supervision. We created a computational method to identify allosteric sites using a machine learning method trained and tested on protein structures containing bound ligand molecules. The Random Forest machine learning approach was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model allows us to assign protein cavities as allosteric, regular or orthosteric, and hence to identify allosteric sites. Forty-three structural descriptors per complex were derived and used to characterize individual protein-ligand binding sites belonging to the three classes: allosteric, regular and orthosteric. We carried out a separate validation on a further unseen set of protein structures containing the ligand 2-(N-cyclohexylamino)ethanesulfonic acid (CHES).
- Published
- 2015
45. Verifying the fully 'Laplacianised' posterior Naïve Bayesian approach and more
- Author
-
John B. O. Mitchell, Robert C. Glen, David Marcus, Hamse Y. Mussa, BBSRC, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
Technology ,Computer science ,Chemistry, Multidisciplinary ,Tapering ,Library and Information Sciences ,Machine learning ,Naive Bayes classifier ,Classifier (linguistics) ,Feature (machine learning) ,Physical and Theoretical Chemistry ,Features ,Cheminformatics ,Science & Technology ,Computer Science, Information Systems ,PERFORMANCE ,chEMBL ,Classification ,Class (biology) ,Computer Graphics and Computer-Aided Design ,Naïve Bayes ,Computer Science Applications ,Data set ,Chemistry ,Physical Sciences ,Computer Science ,Computer Science, Interdisciplinary Applications ,Artificial intelligence ,Data mining
- Abstract
Mussa and Glen would like to thank Unilever for financial support, whereas Mussa and Mitchell thank the BBSRC for funding this research through grant BB/I00596X/1. Mitchell thanks the Scottish Universities Life Sciences Alliance (SULSA) for financial support.
Background: In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the “Laplacian Corrected Modified Naïve Bayes” (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by the absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may be valid. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG’s work and introduces a new version of the SNB classifier: “Tapered Naïve Bayes” (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.
Results: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the “optimal” number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the “optimal” number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the “optimal” number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.
Conclusions: The classification results obtained in this study concur with the mathematically based guidelines given in MMG’s paper: ignoring the role of absence of a feature out of hand does not necessarily improve the classification performance of the SNB approach and, if anything, could make it worse. The results obtained also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.
- Published
- 2015
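Entry 45 turns on whether a naïve Bayes classifier lets the absence of a feature count as evidence. The sketch below illustrates only that switch, using a Bernoulli naïve Bayes model with Laplace-smoothed per-class feature probabilities. With `use_absence=True`, absent features contribute log(1 - p); with `use_absence=False`, they are ignored. This is not MMG's LCMNB or TNB formulation, which the abstract does not spell out, and all function names are hypothetical.

```python
import math
from collections import Counter


def fit_bernoulli_nb(examples):
    """examples: list of (set_of_feature_ids, class_label)."""
    class_counts = Counter(label for _, label in examples)
    vocab = set().union(*(features for features, _ in examples))
    feature_counts = {c: Counter() for c in class_counts}
    for features, label in examples:
        feature_counts[label].update(features)
    total = len(examples)
    model = {}
    for c, n_c in class_counts.items():
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = {f: (feature_counts[c][f] + 1) / (n_c + 2) for f in vocab}
        model[c] = (n_c / total, probs)
    return model


def log_score(features, prior, probs, use_absence=True):
    score = math.log(prior)
    for f, p in probs.items():
        if f in features:
            score += math.log(p)
        elif use_absence:  # absent features count as evidence only if enabled
            score += math.log(1.0 - p)
    return score


def predict(model, features, use_absence=True):
    return max(model, key=lambda c: log_score(features, *model[c], use_absence=use_absence))
```

Comparing `predict(..., use_absence=True)` and `predict(..., use_absence=False)` on the same data is a simple way to probe the effect the paper studies, although the paper's own classifiers and feature-selection settings are not reproduced here.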
46. Predicting targets of compounds against neurological diseases using cheminformatic methodology
- Author
-
Rona R. Ramsay, John B. O. Mitchell, José Marco-Contelles, Danica Agbaba, Holger Stark, Ilaria Rossi, Katarina Nikolic, Lazaros Mavridis, Oscar M. Bautista-Aguilera, Paola Massarelli, Maria do Carmo Carreiras, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, University of St Andrews. EaSTCHEM, and University of St Andrews. School of Biology
- Subjects
Histamine N-Methyltransferase ,Databases, Factual ,Quantitative Structure-Activity Relationship ,Drug action ,ChE ,Circular fingerprints ,Histamine H3 receptor ,HMT ,MAO ,Multi-targeted ligands ,Off-target study ,Acetylcholinesterase ,Alzheimer Disease ,Drug Discovery ,Humans ,Ligands ,Monoamine Oxidase ,Nervous System Diseases ,Parkinson Disease ,Receptor, Serotonin, 5-HT2A ,Pharmaceutical Science ,Computer Vision and Pattern Recognition ,Physical and Theoretical Chemistry ,Pharmacology ,R Medicine (General) ,5-HT2A ,In vitro toxicology ,QR Microbiology ,chEMBL ,Computer Science Applications ,Receptor ,Drug ,Serotonin ,Biology ,QA76 Computer software ,DrugBank
- Abstract
Recently developed multi-targeted ligands are novel drug candidates able to interact with monoamine oxidase A and B; acetylcholinesterase and butyrylcholinesterase; or with histamine N-methyltransferase and histamine H3-receptor (H3R). These proteins are drug targets in the treatment of depression, Alzheimer's disease, obsessive disorders, and Parkinson's disease. A probabilistic method, the Parzen-Rosenblatt window approach, was used to build a “predictor” model using data collected from the ChEMBL database. The model can be used to predict both the primary pharmaceutical target and off-targets of a compound based on its structure. Molecular structures were represented based on the circular fingerprint methodology. The same approach was used to build a “predictor” model from the DrugBank dataset to determine the main pharmacological groups of the compound. The study of off-target interactions is now recognised as crucial to the understanding of both drug action and toxicology. Primary pharmaceutical targets and off-targets for the novel multi-target ligands were examined by use of the developed cheminformatic method. Several multi-target ligands were selected for further study, as compounds with possible additional beneficial pharmacological activities. The cheminformatic target identifications were in agreement with four 3D-QSAR (H3R/D1R/D2R/5-HT2aR) models and with in vitro assays for serotonin 5-HT1a and 5-HT2a receptor binding of the most promising ligand (71/MBA-VEG8).
- Published
- 2015
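Entry 46 builds its target predictor with a Parzen-Rosenblatt window over circular fingerprints. A minimal sketch of that idea follows, under stated assumptions: fingerprints are already available as sets of integer feature identifiers (their generation is not shown), the distance between two compounds is 1 - Tanimoto similarity, and each target's score is the average of a Gaussian kernel of width `h` over that target's training compounds. The kernel, the bandwidth, the target labels and the function names are hypothetical choices, not the paper's settings.

```python
import math
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as feature-ID sets."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def parzen_scores(query, training, h=0.3):
    """training: list of (fingerprint_set, target). Returns mean kernel value per target."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fp, target in training:
        d = 1.0 - tanimoto(query, fp)  # assumed distance: 1 - Tanimoto
        sums[target] += math.exp(-(d * d) / (2.0 * h * h))  # Gaussian window
        counts[target] += 1
    return {t: sums[t] / counts[t] for t in sums}


def rank_targets(query, training, h=0.3):
    scores = parzen_scores(query, training, h)
    return sorted(scores, key=scores.get, reverse=True)
```

The top-ranked entry plays the role of the predicted primary target; lower-ranked targets that still have non-negligible scores are the natural candidates for off-target interactions.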
47. Random Forest Models To Predict Aqueous Solubility
- Author
-
John B. O. Mitchell, Robert C. Glen, David S. Palmer, and Noel M. O'Boyle
- Subjects
Quantitative structure–activity relationship ,Models, Statistical ,Databases, Factual ,Artificial neural network ,Mean squared error ,Computer science ,General Chemical Engineering ,Water ,General Chemistry ,Library and Information Sciences ,Molar solubility ,Regression ,Computer Science Applications ,Random forest ,Support vector machine ,Pharmaceutical Preparations ,Solubility ,Artificial Intelligence ,Test set ,Regression Analysis ,Data mining ,Organic Chemicals ,Biological system
- Abstract
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave r² = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL Drug Data Report, on the basis of molecular descriptors selected by the regression analysis.
- Published
- 2006
48. MACiE (Mechanism, Annotation and Classification in Enzymes): novel tools for searching catalytic mechanisms
- Author
-
Noel M. O'Boyle, John B. O. Mitchell, James Torrance, Janet M. Thornton, Daniel Almonacid, Gemma L. Holliday, Peter Murray-Rust, Gail J. Bartlett, University of St Andrews. School of Chemistry, University of St Andrews. Biomedical Sciences Research Complex, and University of St Andrews. EaSTCHEM
- Subjects
Internet ,Information retrieval ,Sequence Homology, Amino Acid ,Mechanism (biology) ,Protein Conformation ,Biology ,Bioinformatics ,QD Chemistry ,Catalysis ,Enzymes ,Database ,Annotation ,User-Computer Interface ,Sequence homology ,Protein data-bank ,Genetics ,Databases, Protein ,Software
- Abstract
This work is funded by the UK EPSRC and BBSRC (grant BB/C51320X/1). MACiE (Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms and is publicly available as a web-based data resource. This paper presents the first release of a web-based search tool to explore enzyme reaction mechanisms in MACiE. We also present Version 2 of MACiE, which doubles the size of the dataset available in Version 1. MACiE can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACIE/.
- Published
- 2006
49. Melting Point Prediction Employing k-Nearest Neighbor Algorithms and Genetic Parameter Optimization
- Author
-
Bernd van Buuren, Jos Tissen, John B. O. Mitchell, Andreas Bender, Eduard A. Nigsch, and Florian Nigsch
- Subjects
Protein Folding ,Mean squared error ,Chemistry, Pharmaceutical ,General Chemical Engineering ,Monte Carlo method ,Library and Information Sciences ,Pattern Recognition, Automated ,k-nearest neighbors algorithm ,Set (abstract data type) ,Inverse distance weighting ,Technology, Pharmaceutical ,Mathematics ,Models, Statistical ,Models, Genetic ,Basis (linear algebra) ,Temperature ,General Chemistry ,Computer Science Applications ,Data set ,Models, Chemical ,Drug Design ,Test set ,Thermodynamics ,Neural Networks, Computer ,Algorithms
- Abstract
We have applied the k-nearest neighbor (kNN) modeling technique to the prediction of melting points. A data set of 4119 diverse organic molecules (data set 1) and an additional set of 277 drugs (data set 2) were used to compare performance in different regions of chemical space, and we investigated the influence of the number of nearest neighbors using different types of molecular descriptors. To compute the prediction on the basis of the melting temperatures of the nearest neighbors, we used four different methods (arithmetic and geometric average, inverse distance weighting, and exponential weighting), of which the exponential weighting scheme yielded the best results. We assessed our model via a 25-fold Monte Carlo cross-validation (with approximately 30% of the total data as a test set) and optimized it using a genetic algorithm. Predictions for drugs based on drugs (separate training and test sets each taken from data set 2) were found to be considerably better [root-mean-squared error (RMSE) = 46.3 °C, r² = 0.30] than those based on nondrugs (prediction of data set 2 based on the training set from data set 1, RMSE = 50.3 °C, r² = 0.20). The optimized model yields an average RMSE as low as 46.2 °C (r² = 0.49) for data set 1, and an average RMSE of 42.2 °C (r² = 0.42) for data set 2. It is shown that the kNN method inherently introduces a systematic error in melting point prediction. Much of the remaining error can be attributed to the lack of information about interactions in the liquid state, which are not well-captured by molecular descriptors.
- Published
- 2006
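Entry 49 compares four ways of turning the neighbours' melting points into a prediction. The sketch below implements all four for descriptor vectors compared by Euclidean distance. Several details are assumptions because the abstract does not give them: the distance metric, the exponential weight exp(-alpha·d), the small constant that guards inverse-distance weighting against a zero distance, and computing the geometric mean in kelvin so that every value is positive. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_melting_point(query, training, k=3, scheme="exponential", alpha=1.0):
    """training: list of (descriptor_vector, melting_point_in_celsius)."""
    neighbours = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    dists = [euclidean(query, x) for x, _ in neighbours]
    temps = [t for _, t in neighbours]
    if scheme == "arithmetic":
        return sum(temps) / len(temps)
    if scheme == "geometric":
        # Geometric mean taken in kelvin (assumption) so all values are positive.
        logs = [math.log(t + 273.15) for t in temps]
        return math.exp(sum(logs) / len(logs)) - 273.15
    if scheme == "inverse_distance":
        weights = [1.0 / (d + 1e-9) for d in dists]  # 1e-9 avoids division by zero
    elif scheme == "exponential":
        weights = [math.exp(-alpha * d) for d in dists]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sum(w * t for w, t in zip(weights, temps)) / sum(weights)
```

Parameters such as `k` and `alpha` are the kind of values a genetic algorithm could tune, as the abstract describes. The exponential weights stay bounded as d approaches zero, whereas inverse-distance weights grow without limit.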
50. Knowledge Based Potentials: the Reverse Boltzmann Methodology, Virtual Screening and Molecular Weight Dependence
- Author
-
John B. O. Mitchell, James A. Lumley, and Chrysi Konstantinou Kirtay
- Subjects
Virtual screening ,Computer science ,Organic Chemistry ,Binding energy ,Function (mathematics) ,Bond order ,Computer Science Applications ,Range (mathematics) ,Simple function ,Drug Discovery ,Boltzmann constant ,Statistical potential ,Algorithm
- Abstract
We discuss the rationale for using the reverse Boltzmann methodology to convert atom-atom distance distributions from a knowledge base of protein-ligand complexes into energy-like functions. We also generate an updated version of the BLEEP statistical potential, using a dataset of 196 complexes. This performs similarly to the existing BLEEP. An algorithm is implemented to allow the automatic calculation of bond orders, and hence of the appropriate numbers of hydrogen atoms present. An attempt is made to generate a potential specific to strongly bound complexes; however, we find no evidence that this improves the prediction of binding affinities. We also discuss the range of binding energies available as a function of ligand molecular weight and derive some simple functions describing this behaviour.
- Published
- 2005
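Entry 50 converts atom–atom distance distributions into energy-like functions using the reverse Boltzmann relation, E(r) = -k_B·T·ln[p_obs(r)/p_ref(r)]. A minimal sketch for one atom-type pair follows. It assumes histograms of distance counts in matching bins, a generic reference distribution (for example, counts over all atom pairs), a pseudocount to avoid taking the log of zero, and T = 300 K with energies in kcal/mol. BLEEP's actual reference state and smoothing are not described in the abstract, so these choices and the function name are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol·K), i.e. k_B per mole


def reverse_boltzmann(pair_counts, reference_counts, temperature=300.0, pseudocount=1.0):
    """Turn per-bin distance counts into an energy-like potential (kcal/mol per bin)."""
    n_pair = sum(pair_counts) + pseudocount * len(pair_counts)
    n_ref = sum(reference_counts) + pseudocount * len(reference_counts)
    energies = []
    for obs, ref in zip(pair_counts, reference_counts):
        p_obs = (obs + pseudocount) / n_pair  # smoothed observed frequency
        p_ref = (ref + pseudocount) / n_ref   # smoothed reference frequency
        energies.append(-R_KCAL * temperature * math.log(p_obs / p_ref))
    return energies
```

Distance bins where a pair is seen more often than in the reference come out negative (favourable), and under-populated bins come out positive.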