12 results on '"chEMBL"'
Search Results
2. Analyzing compound activity records and promiscuity degrees in light of publication statistics [version 2; referees: 2 approved]
- Author
Ye Hu and Jürgen Bajorath
- Subjects
Research Article ,Articles ,Bioinformatics ,Biomacromolecule-Ligand Interactions ,Macromolecular Chemistry ,ChEMBL ,publications ,bioactivity ,compound ,promiscuity
- Abstract
For the generation of contemporary databases of bioactive compounds, activity information is usually extracted from the scientific literature. However, when activity data are analyzed, source publications are typically no longer taken into consideration. Therefore, compound activity data selected from ChEMBL were traced back to thousands of original publications, activity records including compound, assay, and target information were systematically generated, and their distributions across the literature were determined. In addition, publications were categorized on the basis of activity records. Furthermore, compound promiscuity, defined as the ability of small molecules to specifically interact with multiple target proteins, was analyzed in light of publication statistics, thus adding another layer of information to promiscuity assessment. It was shown that the degree of compound promiscuity was not influenced by increasing numbers of source publications. Rather, most non-promiscuous as well as promiscuous compounds, regardless of their degree of promiscuity, originated from single publications, which emerged as a characteristic feature of the medicinal chemistry literature.
- Published
- 2016
3. Understanding covariate shift in model performance [version 2; referees: 1 approved, 1 approved with reservations]
- Author
Georgia McGaughey, W. Patrick Walters, and Brian Goldman
- Subjects
Research Note ,Articles ,Macromolecular Chemistry ,Theory & Simulation ,covariate shift ,model building ,ChEMBL ,logistic regression ,k-NN
- Abstract
Three (3) different methods (logistic regression, covariate shift and k-NN) were applied to five (5) internal datasets and one (1) external, publicly available dataset where covariate shift existed. In all cases, k-NN’s performance was inferior to either logistic regression or covariate shift. Surprisingly, there was no obvious advantage for using covariate shift to reweight the training data in the examined datasets.
- Published
- 2016
4. Analyzing compound activity records and promiscuity degrees in light of publication statistics [version 1; referees: 2 approved]
- Author
Ye Hu and Jürgen Bajorath
- Subjects
Research Article ,Articles ,Bioinformatics ,Biomacromolecule-Ligand Interactions ,Macromolecular Chemistry ,ChEMBL ,publications ,bioactivity ,compound ,promiscuity
- Abstract
For the generation of contemporary databases of bioactive compounds, activity information is usually extracted from the scientific literature. However, when activity data are analyzed, source publications are typically no longer taken into consideration. Therefore, compound activity data selected from ChEMBL were traced back to thousands of original publications, activity records including compound, assay, and target information were systematically generated, and their distributions across the literature were determined. In addition, publications were categorized on the basis of activity records. Furthermore, compound promiscuity, defined as the ability of small molecules to specifically interact with multiple target proteins, was analyzed in light of publication statistics, thus adding another layer of information to promiscuity assessment. It was shown that the degree of compound promiscuity was not influenced by increasing numbers of source publications. Rather, most non-promiscuous as well as promiscuous compounds, regardless of their degree of promiscuity, originated from single publications, which emerged as a characteristic feature of the medicinal chemistry literature.
- Published
- 2016
5. Understanding covariate shift in model performance [version 1; referees: 2 approved with reservations]
- Author
Georgia McGaughey, W. Patrick Walters, and Brian Goldman
- Subjects
Research Note ,Articles ,Macromolecular Chemistry ,Theory & Simulation ,covariate shift ,model building ,ChEMBL ,logistic regression ,k-NN
- Abstract
Three (3) different methods (logistic regression, covariate shift and k-NN) were applied to five (5) internal datasets and one (1) external, publicly available dataset where covariate shift existed. In all cases, k-NN’s performance was inferior to either logistic regression or covariate shift. Surprisingly, there was no obvious advantage for using covariate shift to reweight the training data in the examined datasets.
- Published
- 2016
6. ccbmlib - a Python package for modeling Tanimoto similarity value distributions
- Author
Martin Vogt and Jürgen Bajorath
- Subjects
0301 basic medicine ,similarity value distributions ,Databases, Factual ,Nearest neighbor search ,Tanimoto coefficient ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Statistical analysis ,p-value ,General Pharmacology, Toxicology and Pharmaceutics ,Mathematics ,computer.programming_language ,030304 developmental biology ,0303 health sciences ,Bernoulli model ,General Immunology and Microbiology ,business.industry ,Software Tool Article ,Pattern recognition ,Conditional probability distribution ,General Medicine ,Articles ,fingerprints ,Python (programming language) ,chEMBL ,0104 chemical sciences ,010404 medicinal & biomolecular chemistry ,030104 developmental biology ,Reference database ,Artificial intelligence ,business ,computer ,Software
- Abstract
The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
- Published
- 2020
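As a rough illustration of the quantities named in the abstract above, the following minimal sketch computes a Tanimoto coefficient for two fingerprints and an empirical p-value against a set of reference similarity values. It does not use ccbmlib or RDKit and does not reproduce the package's statistical models: fingerprints are represented as plain sets of on-bit indices, the function names `tanimoto` and `empirical_p_value` are hypothetical, and the reference values are made up for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def empirical_p_value(observed, reference_values):
    """Fraction of reference similarity values that are >= the observed value.

    This is a simple empirical stand-in for a modeled distribution,
    not the approach implemented in ccbmlib.
    """
    if not reference_values:
        raise ValueError("reference_values must not be empty")
    hits = sum(1 for value in reference_values if value >= observed)
    return hits / len(reference_values)


# Made-up fingerprints (on-bit indices) for illustration only.
fp_query = {1, 4, 7, 9}
fp_test = {1, 4, 8, 9}
tc = tanimoto(fp_query, fp_test)

# Made-up similarity values of the query against a reference collection.
reference = [0.1, 0.2, 0.25, 0.3, 0.6, 0.15, 0.05, 0.4, 0.7, 0.35]
p = empirical_p_value(tc, reference)
print(f"Tc = {tc:.2f}, empirical p-value = {p:.2f}")
```

A small p-value here simply means that few reference values reach the observed similarity; comparing such values across fingerprint types is the kind of question the abstract describes.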
7. Functional group and diversity analysis of BIOFACQUIM: A Mexican natural product database
- Author
B. Angélica Pilón-Jiménez, José L. Medina-Franco, and Norberto Sánchez-Cruz
- Subjects
0301 basic medicine ,Databases, Factual ,Computer science ,natural products ,In silico ,computer.software_genre ,01 natural sciences ,General Biochemistry, Genetics and Molecular Biology ,diversity ,chemistry.chemical_compound ,03 medical and health sciences ,functional groups ,Drug Discovery ,General Pharmacology, Toxicology and Pharmaceutics ,Functional group (ecology) ,Mexico ,030304 developmental biology ,Biological Products ,0303 health sciences ,Natural product ,Data curation ,Database ,General Immunology and Microbiology ,Drug discovery ,010405 organic chemistry ,compound databases ,Articles ,Consensus Diversity Plot ,data mining ,General Medicine ,chEMBL ,Chemical space ,0104 chemical sciences ,030104 developmental biology ,chemistry ,in silico ,computer ,Diversity (business) ,Research Article
- Abstract
Background: Natural product databases are important in drug discovery and other research areas. Their structural contents and functional group analysis are relevant to increase their knowledge in terms of chemical diversity and chemical space coverage. BIOFACQUIM is an emerging database of natural products characterized and isolated in Mexico. Herein, we discuss the results of a first systematic functional group analysis and global diversity of an updated version of BIOFACQUIM. Methods: BIOFACQUIM was augmented through a literature search and data curation. A structural content analysis of the dataset was done. This involved a functional group analysis with a novel algorithm to identify automatically all functional groups in a molecule and an assessment of the global diversity using consensus diversity plots. To this end, BIOFACQUIM was compared to two major and large databases: ChEMBL 25, and a herein assembled collection of natural products with 169,839 unique compounds. Results: The structural content analysis showed that 16.1% of compounds, 11.3% of scaffolds, and 6.3% of functional groups present in the current version of BIOFACQUIM have not been reported in the other large reference datasets. It also gave a diversity increase in terms of scaffolds and molecular fingerprints regarding the previous version of the dataset, as well as a higher similarity to the assembled collection of natural products than to ChEMBL 25, in terms of diversity and frequent functional groups. Conclusions: A total of 148 natural products were added to BIOFACQUIM, which meant a diversity increase in terms of scaffolds and fingerprints. Regardless of its relatively small size, there are a significant number of compounds, scaffolds, and functional groups that are not present in the reference datasets, showing that curated databases of natural products, such as BIOFACQUIM, can serve as a starting point to increase the biologically relevant chemical space.
- Published
- 2019
8. Analyzing compound activity records and promiscuity degrees in light of publication statistics
- Author
Jürgen Bajorath and Ye Hu
- Subjects
0301 basic medicine ,compound ,Bioinformatics ,ChEMBL ,Scientific literature ,Biology ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,0302 clinical medicine ,promiscuity ,Statistics ,General Pharmacology, Toxicology and Pharmaceutics ,Biomacromolecule-Ligand Interactions ,publications ,General Immunology and Microbiology ,Macromolecular Chemistry ,General Medicine ,Articles ,chEMBL ,Plant biology ,Multiple target ,030104 developmental biology ,Promiscuity ,bioactivity ,030217 neurology & neurosurgery ,Research Article
- Abstract
For the generation of contemporary databases of bioactive compounds, activity information is usually extracted from the scientific literature. However, when activity data are analyzed, source publications are typically no longer taken into consideration. Therefore, compound activity data selected from ChEMBL were traced back to thousands of original publications, activity records including compound, assay, and target information were systematically generated, and their distributions across the literature were determined. In addition, publications were categorized on the basis of activity records. Furthermore, compound promiscuity, defined as the ability of small molecules to specifically interact with multiple target proteins, was analyzed in light of publication statistics, thus adding another layer of information to promiscuity assessment. It was shown that the degree of compound promiscuity was not influenced by increasing numbers of source publications. Rather, most non-promiscuous as well as promiscuous compounds, regardless of their degree of promiscuity, originated from single publications, which emerged as a characteristic feature of the medicinal chemistry literature.
- Published
- 2016
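The bookkeeping described in the abstract above (activity records traced to publications, then promiscuity viewed against publication counts) can be sketched roughly as follows. This is an illustrative assumption, not the authors' workflow: the record layout `(compound, assay, target, publication)`, the identifiers, and the function name `summarize` are hypothetical, and the promiscuity degree is taken here to be the number of distinct targets per compound.

```python
from collections import defaultdict

# Hypothetical activity records: (compound_id, assay_id, target_id, publication_id).
records = [
    ("CPD1", "A1", "T1", "PUB1"),
    ("CPD1", "A2", "T2", "PUB1"),
    ("CPD1", "A3", "T3", "PUB1"),
    ("CPD2", "A4", "T1", "PUB2"),
    ("CPD3", "A5", "T2", "PUB2"),
    ("CPD3", "A6", "T4", "PUB3"),
]


def summarize(activity_records):
    """Per compound, count distinct targets and distinct source publications."""
    targets = defaultdict(set)
    publications = defaultdict(set)
    for compound, _assay, target, publication in activity_records:
        targets[compound].add(target)
        publications[compound].add(publication)
    return {
        compound: {
            # Assumption of this sketch: degree = number of distinct targets.
            "promiscuity_degree": len(targets[compound]),
            "n_publications": len(publications[compound]),
        }
        for compound in targets
    }


summary = summarize(records)
for compound, stats in sorted(summary.items()):
    print(compound, stats)
```

In this toy data, CPD1 has three targets but a single source publication, which mirrors the kind of case the abstract reports as typical.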
9. Matched molecular pair-based data sets for computer-aided medicinal chemistry
- Author
Antonio de la Vega de León, Bijun Zhang, Ye Hu, and Jürgen Bajorath
- Subjects
General Immunology and Microbiology ,Macromolecular Chemistry ,Computer-aided ,General Medicine ,Articles ,Biology ,General Pharmacology, Toxicology and Pharmaceutics ,chEMBL ,Plant biology ,Medicinal chemistry ,General Biochemistry, Genetics and Molecular Biology ,Data Article
- Abstract
Matched molecular pairs (MMPs) are widely used in medicinal chemistry to study changes in compound properties including biological activity, which are associated with well-defined structural modifications. Herein we describe up-to-date versions of three MMP-based data sets that have originated from in-house research projects. These data sets include activity cliffs, structure-activity relationship (SAR) transfer series, and second generation MMPs based upon retrosynthetic rules. The data sets have in common that they have been derived from compounds included in the latest release of the ChEMBL database for which high-confidence activity data are available. Thus, the activity data associated with MMP-based activity cliffs, SAR transfer series, and retrosynthetic MMPs cover the entire spectrum of current pharmaceutical targets. Our data sets are made freely available to the scientific community.
- Published
- 2014
10. High quality, small molecule-activity datasets for kinase research
- Author
Rajan Sharma, Stephan C. Schürer, and Steven M. Muskal
- Subjects
0301 basic medicine ,Open science ,Computer science ,media_common.quotation_subject ,Computational biology ,Biology ,Data Note ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,0302 clinical medicine ,Structure–activity relationship ,Quality (business) ,Kinase activity ,Biomacromolecule-Ligand Interactions ,General Pharmacology, Toxicology and Pharmaceutics ,Drug Discovery & Design ,030304 developmental biology ,media_common ,0303 health sciences ,General Immunology and Microbiology ,Drug discovery ,Kinase ,Macromolecular Chemistry ,Articles ,Kinase, SAR, Bioactivity Database, Dataset, Drug Discovery, Bioactive Molecules, Kinase Knowledgebase, KKB ,General Medicine ,chEMBL ,Small molecule ,3. Good health ,Open data ,030104 developmental biology ,030220 oncology & carcinogenesis ,Neuroscience
- Abstract
Kinases regulate cell growth, movement, and death. Deregulated kinase activity is a frequent cause of disease. The therapeutic potential of kinase inhibitors has led to large amounts of published structure activity relationship (SAR) data. Bioactivity databases such as the Kinase Knowledgebase (KKB), WOMBAT, GOSTAR, and ChEMBL provide researchers with quantitative data characterizing the activity of compounds across many biological assays. The KKB, for example, contains over 1.8M kinase structure-activity data points reported in peer-reviewed journals and patents. In the spirit of fostering methods development and validation worldwide, we have extracted and have made available from the KKB 258K structure activity data points and 76K associated unique chemical structures across eight kinase targets. These data are freely available for download within this data note.
- Published
- 2016
11. Analyzing compound activity records and promiscuity degrees in light of publication statistics.
- Author
Hu Y and Bajorath J
- Abstract
For the generation of contemporary databases of bioactive compounds, activity information is usually extracted from the scientific literature. However, when activity data are analyzed, source publications are typically no longer taken into consideration. Therefore, compound activity data selected from ChEMBL were traced back to thousands of original publications, activity records including compound, assay, and target information were systematically generated, and their distributions across the literature were determined. In addition, publications were categorized on the basis of activity records. Furthermore, compound promiscuity, defined as the ability of small molecules to specifically interact with multiple target proteins, was analyzed in light of publication statistics, thus adding another layer of information to promiscuity assessment. It was shown that the degree of compound promiscuity was not influenced by increasing numbers of source publications. Rather, most non-promiscuous as well as promiscuous compounds, regardless of their degree of promiscuity, originated from single publications, which emerged as a characteristic feature of the medicinal chemistry literature.
- Published
- 2016
12. Understanding covariate shift in model performance.
- Author
McGaughey G, Walters WP, and Goldman B
- Abstract
Three (3) different methods (logistic regression, covariate shift and k-NN) were applied to five (5) internal datasets and one (1) external, publicly available dataset where covariate shift existed. In all cases, k-NN's performance was inferior to either logistic regression or covariate shift. Surprisingly, there was no obvious advantage for using covariate shift to reweight the training data in the examined datasets. Competing Interests: No competing interests were disclosed.
- Published
- 2016