QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping
- Author
Ctibor Škuta [0000-0001-5325-4934], Isidro Cortés-Ciriano [0000-0002-2036-494X], Wim Dehaen [0000-0002-9597-0629], Pavel Kříž [0000-0003-2473-1919], G. J. P. van Westen [0000-0003-0717-1817], Igor V. Tetko [0000-0002-6855-0012], Andreas Bender [0000-0002-6683-7546], Daniel Svozil [0000-0003-2577-5163]
- Repository
Apollo - University of Cambridge Repository
- Subjects
Quantitative structure–activity relationship, Computer science, In silico, Bioactivity modeling, Library and Information Sciences, Scaffold hopping, 01 natural sciences, Biological fingerprint, lcsh:Chemistry, 03 medical and health sciences, Similarity (network science), Similarity searching, Research article, Physical and Theoretical Chemistry, 030304 developmental biology, 0303 health sciences, lcsh:T58.5-58.64, lcsh:Information technology, business.industry, QSAR, Fingerprint (computing), Pattern recognition, ChEMBL, Computer Graphics and Computer-Aided Design, 0104 chemical sciences, Computer Science Applications, Random forest, 010404 medicinal & biomolecular chemistry, lcsh:QD1-999, Big Data in Chemistry, Affinity fingerprint, Artificial intelligence, business
- Abstract
An affinity fingerprint is a vector consisting of a compound’s affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared with that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching, depending on the data set, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching, depending on the data set, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66 and EF5 of ~ 4.09 and ~ 6.41, depending on the data set; classification AUC of ~ 0.87 and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping: it retrieves 1146 of the 1749 existing scaffolds, whereas the Morgan2 fingerprint retrieves only 864.

Funder: FP7 People: Marie-Curie Actions; doi: http://dx.doi.org/10.13039/100011264; Grant(s): 238701
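The abstract scores fingerprints by ranking library compounds against a query and summarising the ranking with AUC and EF5. The following is a minimal Python sketch of that kind of evaluation, not the authors' implementation. Several choices are assumptions made only for illustration: the similarity measure (a continuous Tanimoto coefficient, which reduces to the ordinary Tanimoto coefficient on 0/1 vectors), the binarization threshold of 6.0 used to turn a real-valued vector into a binary one, the toy four-element vectors, and all function names. EF5 is taken here as the fraction of actives in the top 5% of the ranked list divided by the fraction of actives in the whole list.

```python
import math
from typing import List, Sequence


def binarize(fp: Sequence[float], threshold: float = 6.0) -> List[int]:
    """Hypothetical binarization: set a bit when the predicted potency
    (assumed to be on a pActivity-like scale) reaches the threshold."""
    return [1 if value >= threshold else 0 for value in fp]


def continuous_tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient for real-valued vectors; on 0/1 vectors it
    reduces to |A & B| / (|A| + |B| - |A & B|)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is zero only when both vectors are all zeros.
    return dot / denom if denom else 0.0


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.05) -> float:
    """EF_x = (actives in top x% / size of top x%) / (all actives / all compounds)."""
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    n_top = max(1, math.ceil(fraction * n))
    # Stable sort: compounds with tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = sum(y for _, y in ranked[:n_top])
    return (hits / n_top) / (total_actives / n)


if __name__ == "__main__":
    # Toy 4-element "affinity fingerprints" (real fingerprints would have 440 elements).
    query = [7.2, 5.1, 6.8, 4.0]
    library = [[7.0, 5.0, 6.5, 4.2], [4.1, 6.9, 4.3, 7.5],
               [6.9, 5.5, 7.1, 3.9], [5.0, 5.0, 5.0, 5.0]]
    labels = [1, 0, 1, 0]  # 1 = active, 0 = inactive

    rv_scores = [continuous_tanimoto(query, fp) for fp in library]
    b_scores = [continuous_tanimoto(binarize(query), binarize(fp)) for fp in library]
    print("real-valued AUC:", roc_auc(rv_scores, labels))   # 1.0
    print("real-valued EF5:", enrichment_factor(rv_scores, labels))  # 2.0
    print("binary AUC:", roc_auc(b_scores, labels))          # 1.0
```

With only four compounds the top 5% rounds up to a single compound, so EF5 here equals the hit rate of the top-ranked compound (1.0) divided by the overall active rate (0.5). On realistic library sizes the cut-off covers many compounds, and how ties at the cut-off are broken can shift EF values slightly.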
- Published
- 2020