
Proteochemometric modeling in a Bayesian framework

Authors :: Daniel S. Murrell
Eelke B. Lenselink
Thérèse E. Malliavin
Gerard J. P. van Westen
Isidro Cortes-Ciriano
Andreas Bender
Bioinformatique structurale - Structural Bioinformatics
Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS)
ChEMBL Group
European Molecular Biology Laboratory European Bioinformatics Institute
Division of Medicinal Chemistry
Leiden Academic Center for Drug Research
The Unilever Centre for Molecular Science Informatics - Department of Chemistry [Cambridge, UK]
University of Cambridge [UK] (CAM)
ICC thanks the Paris-Pasteur International PhD Programme for funding. GvW thanks EMBL (EIPOD) and Marie Curie (COFUND) for funding. ICC and TM thank CNRS, Institut Pasteur and ANR bipbip for funding. EBL thanks the Dutch Research Council (NWO) for financial support (NWO-TOP #714.011.001). AB thanks Unilever and the European Research Commission (Starting Grant ERC-2013-StG 336159 MIXTURE) for funding.
European Project: 336159, MIXTURE
Source :: Journal of Cheminformatics, Chemistry Central Ltd. and BioMed Central, 2014, 6 (1), pp.35. ⟨10.1186/1758-2946-6-35⟩
Publication Year :: 2014
Publisher :: HAL CCSD, 2014.
Abstract: Proteochemometrics (PCM) is an approach to predictive bioactivity modeling that relates protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model. In this study, we apply GP implemented with a panel of kernels to three diverse (and multispecies) PCM datasets. The first dataset consisted of information from 8 human and rat adenosine receptors with 10,999 small molecule ligands and their binding affinity. The second consisted of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. Finally, we gathered bioactivity information of small molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%. GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R₀² values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models, obtained with the normalized polynomial and radial kernels, provide confidence intervals for the predictions in agreement with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, the model interpretation in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.

Subjects :: Computer science
Adenosine Receptors
Library and Information Sciences
Bayesian inference
01 natural sciences
GPCRs
03 medical and health sciences
Chemogenomics
Physical and Theoretical Chemistry
Gaussian process
030304 developmental biology
[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]
0303 health sciences
[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
[SDV.BBM.BS] Life Sciences [q-bio]/Biochemistry, Molecular Biology/Structural Biology [q-bio.BM]
Statistical model
Applicability Domain
Computer Graphics and Computer-Aided Design
0104 chemical sciences
Computer Science Applications
[SDV.BBM.BP] Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biophysics
010404 medicinal & biomolecular chemistry
Bayesian framework
Data mining
Proteochemometrics
Research Article

Details

Language :: English
ISSN :: 17582946
Database :: OpenAIRE
Journal :: Journal of Cheminformatics, 2014, 6 (1), pp.35. ⟨10.1186/1758-2946-6-35⟩
Accession number :: edsair.doi.dedup.....0f846f36df4d44f8f56d9987c3f7c35f
