19 results on "Daniel S. Murrell"
Search Results
2. How Do Metabolites Differ from Their Parent Molecules and How Are They Excreted?
- Author
- Johannes Kirchmair, Andrew Howlett, Julio E. Peironcely, Daniel S. Murrell, Mark J. Williamson, Samuel E. Adams, Thomas Hankemeier, Leo van Buren, Guus Duchateau, Werner Klaffke, and Robert C. Glen
- Published
- 2013
3. Applications of proteochemometrics - from species extrapolation to cell line sensitivity modelling.
- Author
- Isidro Cortes-Ciriano, Gerard J. P. van Westen, Daniel S. Murrell, Eelke B. Lenselink, Andreas Bender, and Thérèse E. Malliavin
- Published
- 2015
4. Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling.
- Author
- Isidro Cortes-Ciriano, Daniel S. Murrell, Gerard J. P. van Westen, Andreas Bender, and Thérèse E. Malliavin
- Published
- 2015
5. Proteochemometric modeling in a Bayesian framework.
- Author
- Isidro Cortes-Ciriano, Gerard J. P. van Westen, Eelke B. Lenselink, Daniel S. Murrell, Andreas Bender, and Thérèse E. Malliavin
- Published
- 2014
6. Quantifying the shifts in physicochemical property space introduced by the metabolism of small organic molecules.
- Author
- Johannes Kirchmair, Andrew Howlett, Julio E. Peironcely, Daniel S. Murrell, Mark J. Williamson, Samuel E. Adams, Thomas Hankemeier, Leo van Buren, Guus Duchateau, Werner Klaffke, and Robert C. Glen
- Published
- 2013
7. Cancer Cell Line Profiler (CCLP): a webserver for the prediction of compound activity across the NCI60 panel
- Author
- B. Chetrit, Andreas Bender, Daniel S. Murrell, Thérèse E. Malliavin, Pedro J. Ballester, and Isidro Cortes-Ciriano
- Subjects
Set (abstract data type); 0303 health sciences; 03 medical and health sciences; Web server; 0302 clinical medicine; Computer science; 030220 oncology & carcinogenesis; Data mining; Cancer cell lines; computer.software_genre; computer; 030304 developmental biology
- Abstract
Summary: CCLP (Cancer Cell Line Profiler) is a webserver for the prediction of compound activity across the NCI60 panel. CCLP uses a multi-task Random Forest model trained on 941,831 data points that integrates structural information from 17,142 compounds and multi-omics data sets from 59 cancer cell lines. In addition, CCLP implements conformal prediction to provide individual prediction errors at several confidence levels. CCLP computes compound descriptors for a set of input molecules and predicts their activity across the NCI60 panel. The output of running CCLP consists of one barplot per input compound displaying the predicted activities and errors across the NCI60 panel, as well as a text file reporting the predicted activities and errors in prediction. Availability: CCLP is freely available on the web at cclp.marseille.inserm.fr
- Published
- 2017
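The CCLP abstract says the server attaches individual prediction errors at several confidence levels via conformal prediction. As a rough illustration only, the sketch below implements split (inductive) conformal regression with NumPy: absolute residuals on a held-out calibration set yield a quantile that widens each point prediction into an interval. It is not CCLP's code; the helper name `conformal_interval` is hypothetical, and a straight-line fit (`np.polyfit`) stands in for the multi-task Random Forest that CCLP actually uses.

```python
import numpy as np


def conformal_interval(cal_residuals, y_pred, confidence=0.9):
    """Split-conformal prediction interval for regression (hypothetical helper).

    cal_residuals: errors y - y_hat on a calibration set not used for training.
    y_pred: point predictions for new samples.
    Returns (lower, upper) arrays with the same shape as y_pred.
    """
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    # Finite-sample corrected rank of the nonconformity score
    k = int(np.ceil((n + 1) * confidence))
    # With too few calibration points for this confidence level, the interval is unbounded
    q = np.inf if k > n else scores[k - 1]
    y_pred = np.asarray(y_pred, dtype=float)
    return y_pred - q, y_pred + q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=300)
    y = 0.5 * x + rng.normal(scale=0.3, size=300)

    # Proper training set, calibration set and "new" compounds
    x_tr, y_tr = x[:150], y[:150]
    x_cal, y_cal = x[150:250], y[150:250]
    x_new, y_new = x[250:], y[250:]

    coef = np.polyfit(x_tr, y_tr, deg=1)  # stand-in for the real model
    cal_res = y_cal - np.polyval(coef, x_cal)

    for conf in (0.8, 0.9, 0.95):
        lo, hi = conformal_interval(cal_res, np.polyval(coef, x_new), conf)
        coverage = np.mean((y_new >= lo) & (y_new <= hi))
        print(f"confidence={conf}: half-width={(hi[0] - lo[0]) / 2:.3f}, coverage={coverage:.2f}")
```

The interval half-width grows with the requested confidence level; empirical coverage on the new points should be close to, but is not guaranteed to match exactly, the nominal level.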
8. How Do Metabolites Differ from Their Parent Molecules and How Are They Excreted?
- Author
- Andrew Howlett, Daniel S. Murrell, Leo van Buren, Johannes Kirchmair, Mark J. Williamson, Thomas Hankemeier, Robert C. Glen, Guus Duchateau, Julio E. Peironcely, Werner Klaffke, and Samuel E. Adams
- Subjects
Databases, Pharmaceutical; General Chemical Engineering; Library and Information Sciences; Organic molecules; Small Molecule Libraries; Feces; chemistry.chemical_compound; Biotransformation; Drug Discovery; Metabolome; Bile; Humans; Organic chemistry; Molecule; Drug discovery; General Chemistry; Metabolism; Computer Science Applications; Membrane; Pharmaceutical Preparations; Biochemistry; chemistry; Xenobiotic; Drugs, Chinese Herbal
- Abstract
Understanding which physicochemical properties, or property distributions, are favorable for successful design and development of drugs, nutritional supplements, cosmetics, and agrochemicals is of great importance. In this study we have analyzed molecules from three distinct chemical spaces, (i) approved drugs, (ii) human metabolites, and (iii) traditional Chinese medicine (TCM), to investigate four aspects determining the disposition of small organic molecules. First, we examined the physicochemical properties of these three classes of molecules and identified characteristic features resulting from their distinctive biological functions. For example, human metabolites and TCM molecules can be larger and more hydrophilic than drugs, which makes them less likely to cross membranes. We then quantified the shifts in physicochemical property space induced by metabolism from a holistic perspective by analyzing a data set of several thousand experimentally observed metabolic trees. Results show how the metabolic system aims to retain nutrients/micronutrients while facilitating a rapid elimination of xenobiotics. In the third part we compared these global shifts with the contributions made by individual metabolic reactions. For better resolution, all reactions were classified into phase I and phase II biotransformations. Interestingly, not all metabolic reactions lead to more hydrophilic molecules. We were able to identify biotransformations leading to an increase in logP of more than one log unit, which could be used for the design of drugs with enhanced efficacy. The study closes with the analysis of the physicochemical properties of metabolites found in the bile, faeces, and urine. Metabolites in the bile can be large and are often negatively charged. Molecules with molecular weight >500 Da are rarely found in the urine, and most of these large molecules are charged phase II conjugates. © 2013 American Chemical Society.
- Published
- 2013
9. Improving the prediction of organism-level toxicity through integration of chemical, protein target and cytotoxicity qHTS data
- Author
- Chad H. G. Allen [ORCID 0000-0001-7289-6529], Alexios Koutsoukas, Thérèse E. Malliavin, Andreas Bender, Robert C. Glen, Isidro Cortes-Ciriano, and Daniel S. Murrell
- Affiliations
- The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge (UK); Bioinformatique structurale - Structural Bioinformatics, Institut Pasteur (Paris) - Centre National de la Recherche Scientifique (CNRS); Imperial College London
- Repository
- Apollo - University of Cambridge Repository
- Subjects
SELECTION; 0301 basic medicine; Health, Toxicology and Mutagenesis; 21ST-CENTURY; MODELS; DATABASES; Toxicology; ENCODE; computer.software_genre; 0601 Biochemistry and Cell Biology; Data type; Set (abstract data type); 03 medical and health sciences; ENDOTHELIN-1; Science & Technology; [SDV.BBM.BS]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Structural Biology [q-bio.BM]; business.industry; Data domain; Pattern recognition; SCIENCE; IN-SILICO; PERFORMANCE; chEMBL; Chemical space; Random forest; [SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biophysics; Chemistry; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Generic Health Relevance; Pairwise comparison; RISK-ASSESSMENT; Artificial intelligence; Data mining; Patient Safety; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; business; Life Sciences & Biomedicine; computer
- Abstract
Using three descriptor domains, which encode complementary bioactivity data, enhances the predictive power, applicability, and interpretability of rat acute-toxicity classifiers. Prediction of compound toxicity is essential because the vast chemical space requiring safety assessment cannot be covered by traditional, experimentally based, resource-intensive techniques. However, such prediction is nontrivial due to the complex causal relationship between compound structure and in vivo harm. Protein target annotations and in vitro experimental outcomes encode relevant bioactivity information complementary to chemicals' structures. This work tests the hypothesis that utilizing three complementary types of data will afford predictive models that outperform traditional models built using fewer data types. A tripartite, heterogeneous descriptor set for 367 compounds comprised (a) chemical descriptors, (b) protein target descriptors generated using an algorithm trained on 190 000 ligand–protein interactions from ChEMBL, and (c) descriptors derived from in vitro cell cytotoxicity dose–response data from a panel of human cell lines. 100 random forest classification models for predicting rat LD50 were built using every combination of descriptors. Successive integration of data types improved predictive performance; models built using the full dataset had an average external correct classification rate of 0.82, compared to 0.73–0.80 for models built using two data types and 0.67–0.78 for models built using one. Pairwise comparisons of models trained on the same data showed that including a third data domain on top of chemistry improved the average correct classification rate by 1.4–2.4 points, with p-values
- Published
- 2016
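The toxicity study above builds models on every combination of its three descriptor domains (chemical, protein-target and cytotoxicity descriptors). A minimal Python sketch of how those combinations can be enumerated and stacked column-wise follows; the block names and feature widths are hypothetical placeholders, and the model-training step (random forests in the study) is deliberately left out.

```python
from itertools import combinations

import numpy as np


def descriptor_combinations(blocks):
    """Yield (names, matrix) for every non-empty combination of descriptor blocks.

    blocks: dict mapping a block name to an (n_compounds, n_features) array;
    all blocks must describe the same compounds in the same row order.
    """
    names = sorted(blocks)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # Concatenate the selected descriptor blocks feature-wise
            yield combo, np.hstack([blocks[name] for name in combo])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 367  # number of compounds in the study
    blocks = {
        "chemical": rng.normal(size=(n, 10)),  # hypothetical feature widths
        "target": rng.normal(size=(n, 5)),
        "cytotoxicity": rng.normal(size=(n, 3)),
    }
    for combo, X in descriptor_combinations(blocks):
        print(" + ".join(combo), X.shape)
```

With three blocks this yields three single-domain, three two-domain and one three-domain feature matrix, mirroring the one-, two- and three-data-type model families compared in the abstract.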
10. Impact of Limited Incubation of Bacterial Growth-Positive Cultures and Antibiotic Sensitivity Testing
- Author
- Daniel S. Murrell and Amanda T. Harrington
- Subjects
Infectious Diseases; Oncology; Antibiotic sensitivity; Biology; Bacterial growth; Incubation; Microbiology
- Published
- 2016
11. Clinical Outcomes of a MALDI-TOF-Tied Antimicrobial Stewardship Intervention Compared to MALDI-TOF Reporting Without Intervention: A Quasi-Experimental study
- Author
- Ellen Stolar, Scott Benken, Maria Merrick, Daniel S. Murrell, Amanda T. Harrington, Alan E. Gross, Maressa Santarossa, Amber E. Williams, and Susan C. Bleasdale
- Subjects
medicine.medical_specialty; Infectious Diseases; Oncology; business.industry; Intervention (counseling); Treatment outcome; Quasi experimental study; Physical therapy; Medicine; Antimicrobial stewardship; business
- Published
- 2016
12. Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules
- Author
- Ian P. Stott, Gerard J. P. van Westen, Daniel S. Murrell, Thérèse E. Malliavin, Robert C. Glen [ORCID 0000-0003-1759-2914], Isidro Cortes-Ciriano, and Andreas Bender [ORCID 0000-0002-6683-7546]
- Affiliations
- University of Cambridge (UK); Bioinformatique structurale - Structural Bioinformatics, Institut Pasteur (Paris) - Centre National de la Recherche Scientifique (CNRS); European Bioinformatics Institute (EMBL-EBI), Hinxton; EMBL Heidelberg; Unilever Research Port Sunlight Laboratory, Bebington, Wirral L63 3JW, UK
- Funding
- ICC thanks the Paris-Pasteur International PhD Programme and Institut Pasteur for funding. TM thanks CNRS and Institut Pasteur for funding. DSM and RCG thank Unilever for funding. GvW thanks EMBL (EIPOD) and Marie Curie (COFUND) for funding. AB thanks Unilever and the European Research Commission (Starting Grant ERC-2013-StG 336159 MIXTURE) for funding. European Project: 336159, EC:FP7:ERC, ERC-2013-StG, MIXTURE (2014).
- Repository
- Apollo - University of Cambridge Repository
- Subjects
Technology; Interface (Java); Computer science; Chemistry, Multidisciplinary; PROTEIN DESCRIPTOR SETS; computer.software_genre; 01 natural sciences; Workflow; Package; Fingerprint; QSPR; 0303 health sciences; Computer Science, Information Systems; QSAR; Computer Graphics and Computer-Aided Design; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Computer Science Applications; Chemistry; PCM; Physical Sciences; Computer Science, Interdisciplinary Applications; Data mining; Quantitative structure–activity relationship; Computation; Feature selection; Library and Information Sciences; VALIDATION; 03 medical and health sciences; ggplot2; Robustness (computer science); Learning; Physical and Theoretical Chemistry; Representation (mathematics); 030304 developmental biology; Science & Technology; 0104 chemical sciences; 010404 medicinal & biomolecular chemistry; Computer Science; BENCHMARKING; computer; Ensemble; Software
- Abstract
Background: In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process. Results: camb is an R package that provides an environment for the rapid generation of quantitative Structure-Property and Structure-Activity models for small molecules (including QSAR, QSPR, QSAM, PCM) and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one- and two-dimensional and 14 fingerprint-type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), and techniques to create model ensembles (using the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2). Conclusions: Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed up model generation and to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included, which demonstrate camb's application. Graphical abstract: From compounds and data to models: a complete model building workflow in one package. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-015-0086-2) contains supplementary material, which is available to authorized users.
- Published
- 2015
13. Applications of proteochemometrics - from species extrapolation to cell line sensitivity modelling
- Author
- Daniel S. Murrell, Thérèse E. Malliavin, Gerard J. P. van Westen, Andreas Bender, Eelke B. Lenselink, and Isidro Cortes-Ciriano
- Subjects
Computer science; Cell; 0211 other engineering and technologies; Context (language use); 02 engineering and technology; computer.software_genre; Bayesian inference; Biochemistry; 03 medical and health sciences; symbols.namesake; Structural Biology; medicine; Sensitivity (control systems); Representation (mathematics); Molecular Biology; Gaussian process; 030304 developmental biology; 0303 health sciences; Virtual screening; 021103 operations research; Ligand; Applied Mathematics; Computer Science Applications; medicine.anatomical_structure; Meeting Abstract; symbols; Data mining; DNA microarray; Biological system; computer; Applicability domain
- Abstract
Proteochemometrics (PCM) is a predictive bioactivity modelling method to simultaneously model the bioactivity of multiple ligands against multiple targets. Therefore, PCM makes it possible to explore the selectivity and promiscuity of ligands on biomolecular systems of different complexity, such as proteins or even cell-line models. In practice, each ligand-target interaction is encoded by the concatenation of ligand and target descriptors. These descriptors are then used to train a single machine learning model. This simultaneous inclusion of both chemical and target information enables extrapolation and interpolation to predict the bioactivity of compounds on targets that may not be present in the training set. In this thesis, a methodological advance in the field is first introduced, namely how Bayesian inference (Gaussian Processes) can be successfully applied in the context of PCM for (i) the prediction of compound bioactivity along with an error estimate for each prediction; (ii) the determination of the applicability domain of a PCM model; and (iii) the inclusion of the experimental uncertainty of the bioactivity measurements. Additionally, the influence of noise in bioactivity models is benchmarked across a panel of 12 machine learning algorithms, showing that noise in the input data has a marked influence on predictive power that differs across the considered algorithms. Subsequently, two R packages are presented. The first one, Chemically Aware Model Builder (camb), constitutes an open source platform for the generation of predictive bioactivity models. The functionalities of camb include: (i) normalized chemical structure representation, (ii) calculation of 905 one- and two-dimensional physicochemical descriptors, and of 14 fingerprints for small molecules, (iii) 8 types of amino acid descriptors, (iv) 13 whole protein sequence descriptors, and (v) training, validation and visualization of predictive models.
The second package, conformal, permits the calculation of confidence intervals for individual predictions in the case of regression, and P values for classification settings. The usefulness of PCM to concomitantly optimize compound selectivity and potency is subsequently illustrated in two application scenarios: (a) modelling isoform-selective cyclooxygenase inhibition; and (b) large-scale cancer cell-line drug sensitivity prediction, where the predictive signal of several types of cell-line profiling data is benchmarked, including basal gene expression, gene copy-number variation, exome sequencing, and protein abundance data. Overall, the application of PCM in these two scenarios leads us to conclude that PCM is a suitable technique to model the activity of ligands exhibiting uncorrelated bioactivity profiles across a panel of targets, which can range from protein binding sites (a) to cancer cell lines (b).
- Published
- 2015
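The core PCM encoding described in the abstract above, one feature vector per ligand-target pair formed by concatenating ligand and target descriptors and fed to a single model, can be sketched in Python as follows. The descriptor values and activities are invented, closed-form ridge regression stands in for the machine learning models used in the thesis (such as Gaussian Processes), and the helper names are hypothetical.

```python
import numpy as np


def pcm_features(pairs, ligand_desc, target_desc):
    """One row per (ligand, target) pair: ligand descriptors followed by target descriptors."""
    return np.vstack([np.concatenate([ligand_desc[lig], target_desc[tgt]]) for lig, tgt in pairs])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(weights, X):
    """Linear prediction, adding the intercept column used in fit_ridge."""
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ weights


if __name__ == "__main__":
    # Hypothetical 2-D ligand descriptors and 3-D target descriptors
    ligand_desc = {"lig_a": np.array([0.1, 1.2]), "lig_b": np.array([0.8, 0.3])}
    target_desc = {"tgt_x": np.array([1.0, 0.0, 0.3]), "tgt_y": np.array([0.2, 1.0, 0.5])}

    # Train on three of the four pairs; (lig_b, tgt_y) is never seen in training
    train_pairs = [("lig_a", "tgt_x"), ("lig_a", "tgt_y"), ("lig_b", "tgt_x")]
    y_train = np.array([6.1, 5.4, 7.0])  # invented pIC50-like values

    w = fit_ridge(pcm_features(train_pairs, ligand_desc, target_desc), y_train)
    X_new = pcm_features([("lig_b", "tgt_y")], ligand_desc, target_desc)
    print(predict(w, X_new))
```

Because the unseen pair shares its ligand with one training row and its target with another, the single model can still produce a prediction for it; this is the interpolation property the abstract attributes to PCM.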
14. Protective immunogenicity of group A streptococcal M-related proteins
- Author
- Thomas A. Penfound, Harry S. Courtney, Daniel S. Murrell, Tina Agbaosi, Claudia M. Hohn, Shannon E. Niedermeyer, James B. Dale, Nicholas D. Hysmith, Matthew F. Pullen, Lori E. Shenep, and Michael I. Bright
- Subjects
Microbiology (medical); Male; Adolescent; Streptococcus pyogenes; Clinical Biochemistry; Immunology; Virulence; Enzyme-Linked Immunosorbent Assay; medicine.disease_cause; Epitope; law.invention; Mice; Antigen; stomatognathic system; law; Streptococcal Infections; medicine; Immunology and Allergy; Animals; Humans; Amino Acid Sequence; Child; Phylogeny; Antiserum; Antigens, Bacterial; Vaccines; biology; Immunogenicity; Immune Sera; Age Factors; Immunization, Passive; Molecular biology; Antibodies, Bacterial; Recombinant Proteins; Child, Preschool; biology.protein; Recombinant DNA; Female; Rabbits; Antibody; Sequence Alignment; Bacterial Outer Membrane Proteins
- Abstract
Many previous studies have focused on the surface M proteins of group A streptococci (GAS) as virulence determinants and protective antigens. However, the majority of GAS isolates express M-related protein (Mrp) in addition to M protein, and both have been shown to be required for optimal virulence. In the current study, we evaluated the protective immunogenicity of Mrp to determine its potential as a vaccine component that may broaden the coverage of M protein-based vaccines. Sequence analyses of 33 mrp genes indicated that there are three families of structurally related Mrps (MrpI, MrpII, and MrpIII). N-terminal peptides of Mrps were cloned, expressed, and purified from M type 2 (M2) (MrpI), M4 (MrpII), and M49 (MrpIII) GAS. Rabbit antisera against the Mrps reacted at high titers with the homologous Mrp, as determined by enzyme-linked immunosorbent assay, and promoted bactericidal activity against GAS emm types expressing Mrps within the same family. Mice passively immunized with rabbit antisera against MrpII were protected against challenge infections with M28 GAS. Assays for Mrp antibodies in serum samples from 281 pediatric subjects aged 2 to 16 indicated that the Mrp immune response correlated with increasing age of the subjects. Affinity-purified human Mrp antibodies promoted bactericidal activity against a number of GAS representing different emm types that expressed an Mrp within the same family but showed no activity against emm types expressing an Mrp from a different family. Our results indicate that Mrps have semiconserved N-terminal sequences that contain bactericidal epitopes which are immunogenic in humans. These findings may have direct implications for the development of GAS vaccines.
- Published
- 2015
15. Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling
- Author
- Thérèse E. Malliavin, Gerard J. P. van Westen, Isidro Cortes-Ciriano, Andreas Bender, and Daniel S. Murrell
- Affiliations
- Bioinformatique structurale - Structural Bioinformatics, Institut Pasteur (Paris) - Centre National de la Recherche Scientifique (CNRS); The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge (UK); ChEMBL Group, European Molecular Biology Laboratory European Bioinformatics Institute
- Funding
- ICC thanks the Paris-Pasteur International PhD Programme for funding. ICC and TM thank CNRS and Institut Pasteur for funding. DM thanks Unilever for funding. GvW thanks EMBL (EIPOD) and Marie Curie (COFUND) for funding. AB thanks Unilever and the European Research Commission (Starting Grant ERC-2013-StG 336159 MIXTURE) for funding. The authors acknowledge the three anonymous reviewers, whose constructive suggestions and comments helped improve the manuscript. European Project: 336159, EC:FP7:ERC, ERC-2013-StG, MIXTURE (2014).
- Subjects
Quantitative structure–activity relationship; Stereochemistry; [SDV]Life Sciences [q-bio]; Library and Information Sciences; Bioinformatics; 03 medical and health sciences; 0302 clinical medicine; Applicability domain; Potency; Ensemble modeling; Physical and Theoretical Chemistry; 030304 developmental biology; 0303 health sciences; biology; Chemistry; QSAR; Computer Graphics and Computer-Aided Design; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Cyclooxygenases; 3. Good health; Computer Science Applications; 030220 oncology & carcinogenesis; biology.protein; Chemogenomics; Cyclooxygenase; Proteochemometrics; Research Article
- Abstract
Cyclooxygenases (COX) are present in the body in two isoforms, namely COX-1, constitutively expressed, and COX-2, induced in physiopathological conditions such as cancer or chronic inflammation. The inhibition of COX with non-steroidal anti-inflammatory drugs (NSAIDs) is the most widely used treatment for chronic inflammation despite the adverse effects associated with prolonged NSAID intake. Although selective COX-2 inhibition has been shown not to palliate all adverse effects (e.g. cardiotoxicity), there are still niche populations that can benefit from selective COX-2 inhibition. Thus, capitalizing on bioactivity data from both isoforms simultaneously would contribute to the development of COX inhibitors with better safety profiles. We applied ensemble proteochemometric modeling (PCM) to predict the potency of 3,228 distinct COX inhibitors on 11 mammalian cyclooxygenases. Ensemble PCM models ($R^{2}_{0,\mathrm{test}} = 0.65$, $\mathrm{RMSE}_{\mathrm{test}} = 0.71$) outperformed models exclusively trained on compound descriptors ($R^{2}_{0,\mathrm{test}} = 0.17$, $\mathrm{RMSE}_{\mathrm{test}} = 1.09$) or protein descriptors ($R^{2}_{0,\mathrm{test}} = 0.16$, $\mathrm{RMSE}_{\mathrm{test}} = 1.10$) on the test set. Moreover, PCM predicted COX potency for 1,086 selective and non-selective COX inhibitors with $R^{2}_{0,\mathrm{test}} = 0.59$ and $\mathrm{RMSE}_{\mathrm{test}} = 0.76$. These values are in agreement with the maximum achievable $R^{2}_{0,\mathrm{test}}$ and minimum achievable $\mathrm{RMSE}_{\mathrm{test}}$, both approximately 0.68. Confidence intervals for individual predictions were calculated from the standard deviation of the predictions from the individual models composing the ensembles. Finally, two substructure analysis pipelines singled out chemical substructures implicated in both potency and selectivity, in agreement with the literature.
- Published
- 2015
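The COX abstract derives a confidence interval for each prediction from the standard deviation of the predictions of the individual models in the ensemble. A rough Python sketch of that idea follows. Averaging the members for the point prediction and using a normal-theory multiplier `z` are assumptions made for illustration; the abstract states only that the standard deviation was used. The helper name is hypothetical and the predictions are invented.

```python
import numpy as np


def ensemble_interval(member_preds, z=1.96):
    """Point prediction and interval from an ensemble (hypothetical helper).

    member_preds: array of shape (n_models, n_compounds).
    Returns (mean, lower, upper), each of shape (n_compounds,).
    """
    preds = np.asarray(member_preds, dtype=float)
    mean = preds.mean(axis=0)
    # Spread of the individual models' predictions for each compound
    sd = preds.std(axis=0, ddof=1)
    return mean, mean - z * sd, mean + z * sd


if __name__ == "__main__":
    # Hypothetical potency predictions (e.g. pIC50) from 4 models for 3 compounds
    preds = [[6.2, 7.9, 5.1],
             [6.4, 7.1, 5.0],
             [6.1, 8.3, 5.2],
             [6.3, 7.5, 4.9]]
    mean, lo, hi = ensemble_interval(preds)
    for m, a, b in zip(mean, lo, hi):
        print(f"{m:.2f}  [{a:.2f}, {b:.2f}]")
```

Compounds on which the ensemble members disagree (the second column in this toy example) receive wider intervals, which is what makes the spread usable as a per-prediction error estimate.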
16. Proteochemometric modeling in a Bayesian framework
- Author
- Daniel S. Murrell, Eelke B. Lenselink, Thérèse E. Malliavin, Gerard J. P. van Westen, Isidro Cortes-Ciriano, and Andreas Bender
- Affiliations
- Bioinformatique structurale - Structural Bioinformatics, Institut Pasteur (Paris) - Centre National de la Recherche Scientifique (CNRS); ChEMBL Group, European Molecular Biology Laboratory European Bioinformatics Institute; Division of Medicinal Chemistry, Leiden Academic Center for Drug Research; The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge (UK)
- Funding
- ICC thanks the Paris-Pasteur International PhD Programme for funding. GvW thanks EMBL (EIPOD) and Marie Curie (COFUND) for funding. ICC and TM thank CNRS, Institut Pasteur and ANR bipbip for funding. EBL thanks the Dutch Research Council (NWO) for financial support (NWO-TOP #714.011.001). AB thanks Unilever and the European Research Commission (Starting Grant ERC-2013-StG 336159 MIXTURE) for funding. European Project: 336159, MIXTURE.
- Subjects
Computer science; Adenosine Receptors; Library and Information Sciences; computer.software_genre; Bayesian inference; 01 natural sciences; GPCRs; 03 medical and health sciences; chemistry.chemical_compound; symbols.namesake; Chemogenomics; Physical and Theoretical Chemistry; Gaussian process; 030304 developmental biology; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; 0303 health sciences; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; [SDV.BBM.BS]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Structural Biology [q-bio.BM]; Statistical model; Applicability domain; Computer Graphics and Computer-Aided Design; 0104 chemical sciences; Computer Science Applications; [SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biophysics; 010404 medicinal & biomolecular chemistry; chemistry; symbols; Bayesian framework; Data mining; computer; Proteochemometrics; Research Article
- Abstract
Proteochemometrics (PCM) is an approach for bioactivity predictive modeling which models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels on three diverse (and multispecies) PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we have gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 data points with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with $R_0^2$ values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models obtained with the normalized polynomial and radial kernels provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
- Published
- 2014
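Two Bayesian ingredients in the abstract above, a predictive variance for every prediction and experimental error supplied as model input, can be made concrete with a minimal NumPy sketch of Gaussian Process regression. It assumes a zero-mean prior on centred activities and a single radial (RBF) kernel, whereas the study compared a panel of kernels, and it models experimental error as a per-measurement noise variance added to the kernel diagonal; that is one common formulation, not necessarily the authors' exact one. Function names, hyperparameters and data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Radial basis function kernel between the rows of A and B."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, noise_var, X_test, length_scale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP.

    noise_var: per-measurement noise variance (e.g. squared experimental error).
    """
    K = rbf_kernel(X_train, X_train, length_scale, variance) + np.diag(noise_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus the part explained by data
    return mean, var


if __name__ == "__main__":
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([5.0, 6.0, 6.5, 6.2])        # hypothetical activities
    err = np.array([0.1, 0.1, 0.3, 0.3]) ** 2  # hypothetical experimental variances
    y_mean = y.mean()                          # centre so the zero-mean prior is sensible

    X_test = np.array([[1.5], [10.0]])
    mean, var = gp_predict(X, y - y_mean, err, X_test)
    # x = 10 lies far from the data: its mean reverts to y_mean and its std dev grows
    print(mean + y_mean, np.sqrt(var))
```

The growing predictive variance away from the training data is what allows a GP to flag predictions outside its applicability domain.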
17. $R^2$-equitability is satisfiable
- Author
- Daniel S. Murrell, Ben Murrell, and Hugh Murrell
- Subjects
Discrete mathematics; Multidisciplinary; Functional Relationship; Joint probability distribution; Function (mathematics); Random variable; Maximal information coefficient; Measure (mathematics); Mathematics
- Abstract
Kinney and Atwal (1) make excellent points about mutual information, the maximal information coefficient (2, 3), and “equitability.” One of their central claims, however, is that “No nontrivial dependence measure can satisfy $R^2$-equitability.” We argue that this is the result of a poorly constructed definition, which we quote: “A dependence measure $D[X;Y]$ is $R^2$-equitable if and only if, when evaluated on a joint probability distribution $p(X,Y)$ that corresponds to a noisy functional relationship between two real random variables $X$ and $Y$, the following relation holds: $D[X;Y] = g(R^2[f(X);Y])$. Here, $g$ is a function that does not depend on $p(X,Y)$ and $f$ is the function defining the noisy functional relationship, i.e., $Y = f(X) + \eta$ …
- Published
- 2014
18. Discovering general multidimensional associations
- Author
- Ben Murrell, Daniel S. Murrell, and Hugh Murrell
- Subjects
0301 basic medicine; FOS: Computer and information sciences; Statistical Noise; Coefficient of determination; Distribution Curves; Correlation coefficient; Normal Distribution; lcsh:Medicine; 01 natural sciences; Statistics - Applications; Normal distribution; 010104 statistics & probability; 03 medical and health sciences; Total variation; Ocular System; Statistics; Medicine and Health Sciences; Applications (stat.AP); 0101 mathematics; lcsh:Science; Mathematics; Multidisciplinary; Approximation Methods; lcsh:R; Linear model; Biology and Life Sciences; Explained variation; Probability Theory; Probability Distribution; Gaussian Noise; Probability Density; 030104 developmental biology; Sample size determination; Physical Sciences; Multivariate Analysis; Linear Models; Eyes; lcsh:Q; Anatomy; Maximal information coefficient; Head; Statistics (Mathematics); Research Article; Statistical Distributions
- Abstract
When two variables are related by a known function, the coefficient of determination (denoted $R^2$) measures the proportion of the total variance in the observations that is explained by that function. This quantifies the strength of the relationship between variables by describing what proportion of the variance is signal as opposed to noise. For linear relationships, this is equal to the square of the correlation coefficient, $\rho$. When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of explained variance equitably, that is, assigning similar values to equally noisy relationships. Here we demonstrate how to directly estimate a generalized $R^2$ when the form of the relationship is unknown, and we question the performance of the Maximal Information Coefficient (MIC), a recently proposed information-theoretic measure of dependence. We show that our approach behaves equitably, has more power than MIC to detect association between variables, and converges faster with increasing sample size. Most importantly, our approach generalizes to higher dimensions, which allows us to estimate the strength of multivariate relationships ($Y$ against $A, B, \ldots$) and to measure association while controlling for covariates ($Y$ against $X$ controlling for $C$). Comment: 8 pages, 4 figures. Supporting information can be found at http://www.cs.sun.ac.za/~bmurrell/Murrell_Matie_SI.pdf
- Published
- 2013
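The starting point of the abstract above, that for a known linear relationship the proportion of explained variance $R^2$ equals the squared correlation coefficient $\rho^2$, is easy to check numerically. The Python sketch below computes $R^2 = 1 - SS_{res}/SS_{tot}$ for an ordinary least-squares line on simulated data. It illustrates only this known-function case and is not the generalized $R^2$ estimator proposed in the paper, whose details the abstract does not give; the simulation settings are arbitrary.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: share of the variance of y explained by y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # residual (noise) variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = 2.0 * x + rng.normal(scale=1.0, size=500)  # signal variance 4, noise variance 1

    slope, intercept = np.polyfit(x, y, deg=1)
    r2 = r_squared(y, slope * x + intercept)
    rho = np.corrcoef(x, y)[0, 1]

    # For a least-squares line with an intercept, R^2 equals rho^2 up to rounding
    print(f"R^2 = {r2:.4f}, rho^2 = {rho**2:.4f}")
    assert np.isclose(r2, rho**2)
```

With signal variance 4 and noise variance 1, both printed values should land near the population value $4/5 = 0.8$.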
19. Quantifying the shifts in physicochemical property space introduced by the metabolism of small organic molecules
- Author
- Daniel S. Murrell, Andrew Howlett, Thomas Hankemeier, Samuel E. Adams, Werner Klaffke, Robert C. Glen [ORCID 0000-0003-1759-2914], Guus Duchateau, Leo van Buren, Johannes Kirchmair, Julio E. Peironcely, and Mark J. Williamson [ORCID 0000-0002-5295-7811]
- Repository
- Apollo - University of Cambridge Repository
- Subjects
Chemical descriptors; 34 Chemical Sciences; Chemistry; Computational biology; Metabolism; Library and Information Sciences; Pharmacology; Drug molecule; Computer Graphics and Computer-Aided Design; Approved drug; Computer Science Applications; Organic molecules; Individual analysis; Oral Presentation; 3404 Medicinal and Biomolecular Chemistry; Physical and Theoretical Chemistry; ADME
- Abstract
Understanding the metabolic fate of small organic molecules is of fundamental importance to the successful design and development of drugs, nutritional supplements, cosmetics and agrochemicals [1,2]. In the current study we investigated how the products of metabolism differ from their parent molecules by analysing a large dataset of experimentally determined metabolic transformations (Figure 1). This dataset was split into three specific chemical domains, representing approved drug molecules, human metabolites and molecules from traditional Chinese medicines, to allow individual analysis. We also quantified the impact of individual Phase I and Phase II metabolic reactions on calculated chemical descriptors using MetaPrint2D [3] and suggest new approaches to utilise metabolism for the design of drugs and cosmetics. The last section of this study investigates the properties of metabolites found in the bile, faeces and urine and analyses their commonalities and differences. Figure 1: Four important questions pertinent to the design and development of new molecules with favourable ADME properties addressed in this work. d, approved drugs; h, human metabolites; t, molecules from traditional Chinese medicines; MW, molecular weight.
- Published
- 2013