ccbmlib - a Python package for modeling Tanimoto similarity value distributions
- Source :
- F1000Research
- Publication Year :
- 2020
- Abstract :
- The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
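The two quantities described in the abstract can be illustrated with a minimal sketch. It is not ccbmlib's API and does not reproduce the package's statistical models: it replaces the modeled distribution with a simple empirical count over a small, made-up reference database. The function names (`tanimoto`, `empirical_p_value`, `conditional_p_value`) and the toy fingerprints are assumptions introduced only for illustration; fingerprints are represented as sets of "on" bit indices.

```python
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def empirical_p_value(score: float, background: Sequence[float]) -> float:
    """Fraction of background similarity values at least as large as score."""
    if not background:
        raise ValueError("background must not be empty")
    return sum(1 for s in background if s >= score) / len(background)


def conditional_p_value(
    reference: Set[int], test: Set[int], database: Sequence[Set[int]]
) -> float:
    """Empirical stand-in for a conditional significance score.

    Estimates which fraction of the database would rank at or above
    the test compound in a similarity search with `reference` as query.
    """
    score = tanimoto(reference, test)
    background = [tanimoto(reference, fp) for fp in database]
    return empirical_p_value(score, background)


# Hypothetical toy fingerprints (not real RDKit output).
reference = {1, 2, 3, 4}
test = {2, 3, 4, 5}
database = [{1, 2}, {1, 2, 3, 4}, {7, 8}, {2, 3, 4, 5, 6}, {4}]

print(tanimoto(reference, test))                       # 3 shared / 5 total bits
print(conditional_p_value(reference, test, database))  # 1 of 5 entries ranks at or above
```

With a real reference set, an empirical count like this needs many pairwise comparisons to resolve small p-values, which is presumably one motivation for modeling the distribution from feature statistics as the abstract describes.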
- Subjects :
- 0301 basic medicine
similarity value distributions
Databases, Factual
Nearest neighbor search
Tanimoto coefficient
01 natural sciences
General Biochemistry, Genetics and Molecular Biology
03 medical and health sciences
Statistical analysis
p-value
General Pharmacology, Toxicology and Pharmaceutics
Mathematics
030304 developmental biology
0303 health sciences
Bernoulli model
General Immunology and Microbiology
Software Tool Article
Pattern recognition
Conditional probability distribution
General Medicine
Articles
fingerprints
Python (programming language)
ChEMBL
0104 chemical sciences
010404 medicinal & biomolecular chemistry
030104 developmental biology
Reference database
Artificial intelligence
Software
Details
- ISSN :
- 20461402
- Volume :
- 9
- Database :
- OpenAIRE
- Journal :
- F1000Research
- Accession number :
- edsair.doi.dedup.....044b08e948112b08c212a9302e960fb1