161 results on '"Tetko, IV"'
Search Results
2. Calculation of molecular lipophilicity: state of the art and comparison of methods on more than 96000 compounds
- Author
-
Tetko IV, Ostermann C, Poda GI, and Mannhold M
- Subjects
Chemistry, QD1-999
- Published
- 2009
- Full Text
- View/download PDF
3. Calculation of molecular lipophilicity: state of the art and comparison of methods on more than 96000 compounds
- Author
-
Mannhold, M, Poda, GI, Ostermann, C, and Tetko, IV
- Published
- 2009
- Full Text
- View/download PDF
4. CERAPP: Collaborative estrogen receptor activity prediction project
- Author
-
Mansouri, K, Abdelaziz, A, Rybacka, A, Roncaglioni, A, Tropsha, A, Varnek, A, Zakharov, A, Worth, A, Richard, A, Grulke, C, Trisciuzzi, D, Fourches, D, Horvath, D, Benfenati, E, Muratov, E, Wedebye, E, Grisoni, F, Mangiatordi, G, Incisivo, G, Hong, H, Ng, H, Tetko, I, Balabin, I, Kancherla, J, Shen, J, Burton, J, Nicklaus, M, Cassotti, M, Nikolov, N, Nicolotti, O, Andersson, P, Zang, Q, Politi, R, Beger, R, Todeschini, R, Huang, R, Farag, S, Rosenberg, S, Slavov, S, Hu, X, Judson, R, Richard, AM, Grulke, CM, Wedebye, EB, Mangiatordi, GF, Incisivo, GM, Ng, HW, Tetko, IV, Nikolov, NG, Andersson, PL, Beger, RD, Rosenberg, SA, and Judson, RS
- Abstract
Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. Methods: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. Results: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. Conclusion: This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. This concept will be applied in future projects related to other end
- Published
- 2016
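The consensus step described in the CERAPP abstract (weighting models by scores based on their evaluated accuracies) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the normalized weighted average below, the function name `weighted_consensus`, the 0.5 threshold, and all data values are assumptions made for illustration only.

```python
from typing import Sequence


def weighted_consensus(predictions: Sequence[Sequence[float]],
                       scores: Sequence[float]) -> list:
    """Combine per-model activity predictions (0 = inactive, 1 = active)
    into one consensus value per chemical, weighting each model by its score.

    Assumption: a normalized weighted average; the project's real scheme may differ.
    """
    if len(predictions) != len(scores):
        raise ValueError("need exactly one score per model")
    total = sum(scores)
    n_chemicals = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(scores, predictions)) / total
        for i in range(n_chemicals)
    ]


# Hypothetical example: three models, four chemicals.
model_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
# Illustrative scores chosen inside the reported 0.69-0.85 range.
model_scores = [0.85, 0.69, 0.75]

consensus = weighted_consensus(model_predictions, model_scores)
# Hypothetical decision threshold of 0.5 on the consensus value.
active_calls = [value >= 0.5 for value in consensus]
```

A chemical flagged by only the weakest-scoring model falls below the threshold, while one flagged by the two higher-scoring models stays above it, which is the intended effect of accuracy-based weighting.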
5. Experimental and Theoretical Studies in the EU FP7 Marie Curie Initial Training Network Project, Environmental ChemOinformatics (ECO)
- Author
-
Tetko, I, Schramm, K, Knepper, T, Peijnenburg, W, Hendriks, A, Nicholls, I, Oberg, T, Todeschini, R, Schlosser, E, Barndmaier, S, Tetko, IV, Schramm, KW, Peijnenburg, WJGM, Hendriks, AJ, and Nicholls, IA
- Published
- 2014
6. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set
- Author
-
Sushko, I, Novotarskyi, S, Körner, R, Pandey, A, Cherkasov, A, Li, J, Gramatica, P, Hansen, K, Schroeter, T, Müller, K, Xi, L, Liu, H, Yao, X, Öberg, T, Hormozdiari, F, Dao, P, Sahinalp, C, Todeschini, R, Polishchuk, P, Artemenko, A, Kuz'Min, V, Martin, T, Young, D, Fourches, D, Tropsha, A, Baskin, I, Horbath, D, Marcou, G, Varnek, A, Prokopenko, V, Tetko, I, Pandey, AK, Müller, KR, Kuz'min, V, Martin, TM, Young, DM, Prokopenko, VV, Tetko, IV, and TODESCHINI, ROBERTO
- Abstract
The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of “distance to model” (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1
- Published
- 2010
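The ensemble-based "distance to model" (DM) from the abstract above can be sketched as follows: the DM of a compound is the standard deviation of the predictions made by the ensemble members, and accuracy is then reported for the fraction of compounds with the lowest DM. The function names, the coverage-based selection, and the data are hypothetical; the paper's actual procedure is not given in this record.

```python
from statistics import pstdev


def distance_to_model(ensemble_predictions):
    """Return one DM value per compound: the standard deviation of the
    predictions made for that compound by the members of an ensemble."""
    return [pstdev(per_compound) for per_compound in zip(*ensemble_predictions)]


def accuracy_at_coverage(dm, correct, coverage):
    """Accuracy over the `coverage` fraction of compounds with the lowest DM."""
    order = sorted(range(len(dm)), key=lambda i: dm[i])
    kept = order[:max(1, round(coverage * len(dm)))]
    return sum(correct[i] for i in kept) / len(kept)


# Hypothetical data: three ensemble members scoring four compounds
# (predicted probability of mutagenicity).
ensemble = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.3, 0.2, 0.1],
    [0.9, 0.1, 0.9, 0.2],
]
# Hypothetical flags: whether the consensus call matched the experiment.
correct = [True, True, False, True]

dm = distance_to_model(ensemble)
confident_accuracy = accuracy_at_coverage(dm, correct, 0.5)
overall_accuracy = accuracy_at_coverage(dm, correct, 1.0)
```

In this toy data the compound on which the ensemble members disagree most is also the misclassified one, so restricting to the low-DM half raises the accuracy. Real data will show a weaker, statistical version of this trend.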
7. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection
- Author
-
Tetko, I, Sushko, I, Pandey, A, Zhu, H, Tropsha, A, Papa, E, Oberg, T, Todeschini, R, Fourches, D, Varnek, A, Tetko, IV, Pandey, AK, and TODESCHINI, ROBERTO
- Abstract
The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that defines the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It could be expressed in many different ways, e.g., using Tanimoto coefficient, leverage, correlation in space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with low and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces but not by QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in the wrong estimation of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org site. © 2008 American Chemical Society.
- Published
- 2008
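One of the distance types named in the abstract above is based on the Tanimoto coefficient. A minimal sketch, assuming fingerprints are represented as sets of "on" bit indices and that the distance is taken as one minus the highest similarity to any training compound (this definition and all names are illustrative assumptions, not the paper's stated formula):

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def tanimoto_distance_to_model(query, training_set):
    """1 minus the highest Tanimoto similarity between the query
    and any training-set compound (assumed definition)."""
    return 1.0 - max(tanimoto(query, fp) for fp in training_set)


# Hypothetical fingerprints.
training = [frozenset({1, 2, 3, 4}), frozenset({2, 5, 8})]
near = frozenset({1, 2, 3})   # shares three of four bits with the first training compound
far = frozenset({7, 9})       # shares no bits with any training compound

dm_near = tanimoto_distance_to_model(near, training)
dm_far = tanimoto_distance_to_model(far, training)
```

A compound with a small distance resembles at least one training molecule, so its prediction is expected, on average, to be more reliable than that of a compound with a distance close to 1.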
8. Virtual computational chemistry laboratory - design and description
- Author
-
Tetko, I, Gasteiger, J, Todeschini, R, Mauri, A, Livingstone, D, Ertl, P, Palyulin, V, Radchenko, E, Zefirov, N, Makarenko, A, Tanchuk, V, Prokopenko, V, Tetko, IV, Zefirov, NS, Makarenko, AS, Tanchuk, VY, Prokopenko, VV, and TODESCHINI, ROBERTO
- Abstract
Internet technology offers an excellent opportunity for the development of tools by the cooperative effort of various groups and institutions. We have developed a multi-platform software system, Virtual Computational Chemistry Laboratory, http://www.vcclab.org, allowing the computational chemist to perform a comprehensive series of molecular indices/properties calculations and data analysis. The implemented software is based on a three-tier architecture that is one of the standard technologies to provide client-server services on the Internet. The developed software includes several popular programs, including the indices generation program, DRAGON, a 3D structure generator, CORINA, a program to predict lipophilicity and aqueous solubility of chemicals, ALOGPS and others. All these programs are running at the host institutes located in five countries over Europe. In this article we review the main features and statistics of the developed system that can be used as a prototype for academic and industry models.
- Published
- 2005
9. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping
- Author
-
Ctibor Škuta, Igor V. Tetko, Andreas Bender, Pavel Kříž, Daniel Svozil, G. J. P. van Westen, Wim Dehaen, Isidro Cortes-Ciriano, Škuta, C. [0000-0001-5325-4934], Cortés-Ciriano, I. [0000-0002-2036-494X], Dehaen, W. [0000-0002-9597-0629], Kříž, P. [0000-0003-2473-1919], van Westen, G. J. P. [0000-0003-0717-1817], Tetko, I. V. [0000-0002-6855-0012], Bender, A. [0000-0002-6683-7546], Svozil, D. [0000-0003-2577-5163], Apollo - University of Cambridge Repository, Škuta, C [0000-0001-5325-4934], Cortés-Ciriano, I [0000-0002-2036-494X], Dehaen, W [0000-0002-9597-0629], Kříž, P [0000-0003-2473-1919], van Westen, GJP [0000-0003-0717-1817], Tetko, IV [0000-0002-6855-0012], Bender, A [0000-0002-6683-7546], and Svozil, D [0000-0003-2577-5163]
- Subjects
Quantitative structure–activity relationship, Computer science, In silico, Bioactivity modeling, Library and Information Sciences, Scaffold hopping, 01 natural sciences, Biological fingerprint, lcsh:Chemistry, 03 medical and health sciences, Similarity (network science), Similarity searching, Research article, Physical and Theoretical Chemistry, 030304 developmental biology, 0303 health sciences, lcsh:T58.5-58.64, lcsh:Information technology, business.industry, QSAR, Fingerprint (computing), Pattern recognition, chEMBL, Computer Graphics and Computer-Aided Design, 0104 chemical sciences, Computer Science Applications, Random forest, 010404 medicinal & biomolecular chemistry, lcsh:QD1-999, Big Data in Chemistry, Affinity fingerprint, Artificial intelligence, business
- Abstract
Funder: FP7 People: Marie-Curie Actions; doi: http://dx.doi.org/10.13039/100011264; Grant(s): 238701.
An affinity fingerprint is a vector consisting of a compound's affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint, the components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping, as it is able to retrieve 1146 out of the 1749 existing scaffolds, while the Morgan2 fingerprint retrieves only 864 scaffolds.
- Published
- 2020
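The EF5 values quoted in the abstract above are enrichment factors at 5% of the ranked list. A minimal sketch of the usual definition (hit rate among the top-ranked 5% divided by the hit rate in the whole list) follows. The function name, the rounding of the cutoff, and the data are assumptions, and the paper may compute the metric with different conventions.

```python
def enrichment_factor(scores, is_active, fraction=0.05):
    """EF at the given fraction: hit rate among the top-ranked compounds
    divided by the hit rate in the whole ranked list."""
    n_total = len(scores)
    n_top = max(1, round(fraction * n_total))  # assumed rounding convention
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    hits_all = sum(is_active)
    if hits_all == 0:
        raise ValueError("no active compounds in the list")
    return (hits_top / n_top) / (hits_all / n_total)


# Hypothetical ranking of 40 compounds by similarity to a query; 4 are active.
scores = [1.0 - i / 40 for i in range(40)]
is_active = [i in (0, 1, 10, 30) for i in range(40)]

ef5 = enrichment_factor(scores, is_active)
```

With this definition a random ranking gives an EF of about 1, and the upper bound at 5% is 20, reached when every top-ranked compound is active and the list holds enough actives to fill the top 5%.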
10. Analysis of dogs’ sleep patterns using convolutional neural networks
- Author
-
Zamansky, A, Sinitca, AM, Kaplun, DI, Plazner, M, Schork, IG, Young, RJ, de Azevedo, CS, Tetko, IV, Kurkova, V, Karpov, P, and Theis, F
- Subjects
GeneralLiterature_MISCELLANEOUS
- Abstract
Video-based analysis is one of the most important tools of animal behavior and animal welfare scientists. While automatic analysis systems exist for many species, this problem has not yet been adequately addressed for one of the most studied species in animal science: dogs. In this paper we describe a system developed for analyzing sleeping patterns of kenneled dogs, which may serve as an indicator of their welfare. The system combines convolutional neural networks with classical data processing methods, and works with very low quality video from cameras installed in dog shelters.
- Published
- 2019
11. Overview on the PHRESCO Project: PHotonic REServoir COmputing
- Author
-
Jean-Pierre Locquet, Phresco Partners, Tetko, IV, Kurkova, V, Karpov, P, and Theis, F
- Subjects
Technology, Science & Technology, Computer science, Cognitive computing, Reservoir computing, Maturity (finance), Computer Science, Artificial Intelligence, Field (computer science), Engineering management, Work (electrical), Computer Science, Theory & Methods, Computer Science, Machine learning, Key (cryptography)
- Abstract
PHRESCO is an EU-H2020 funded project that was running for four years and will be ending in September 2019. PHRESCO focused on the development of efficient cognitive computing into a specific silicon-based technology by co-designing a new reservoir computing chip, including innovative electronic and photonic components that will enable a major breakthrough in the field. So far, a first-generation reservoir with 18 nodes and integrated readout has been designed, fabricated and characterized, and a training method has been developed. Additionally, large efforts of the consortium were dedicated to the design of the second-generation chip consisting of larger networks (60 nodes), with an on-chip readout and novel training approaches. This short abstract provides key information on the status of the work achieved and further discusses the potential exploitation routes and the key barriers that still need to be removed to bring the technology to a higher maturity level. A part of the exit strategy of PHRESCO is to identify potential future cooperation with interested stakeholders who are willing to co-develop the PHRESCO technology together with the PHRESCO partners for bringing it to an exploitable or marketable system. This abstract lays down the foundations for potential exploitation activities with interested stakeholders.
- Published
- 2019
- Full Text
- View/download PDF
12. The state-of-the-art machine learning model for plasma protein binding prediction: Computational modeling with OCHEM and experimental validation.
- Author
-
Han Z, Xia Z, Xia J, Tetko IV, and Wu S
- Subjects
- Computer Simulation, Humans, Models, Biological, Pharmaceutical Preparations metabolism, Pharmaceutical Preparations chemistry, Machine Learning, Protein Binding, Blood Proteins metabolism
- Abstract
Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high PPB compounds, and are most often not experimentally validated. Here, we carried out a strict data curation protocol, and applied consensus modeling to obtain a model with a coefficient of determination of 0.90 and 0.91 on the training set and the test set, respectively. This model (available on the OCHEM platform https://ochem.eu/article/29) was further retrospectively validated for a set of 63 poly-fluorinated molecules and prospectively validated for a set of 25 highly diverse compounds, and its performance for both these sets was superior to that of the other previously reported models. Furthermore, we identified the physicochemical and structural characteristics of high and low PPB molecules for further structural optimization. Finally, we provide practical and detailed recommendations for structural optimization to decrease the PPB of lead compounds. (Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.)
- Published
- 2025
- Full Text
- View/download PDF
13. Be aware of overfitting by hyperparameter optimization!
- Author
-
Tetko IV, van Deursen R, and Godin G
- Abstract
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each dataset using different data cleaning protocols and hyperparameter optimization. In our study we showed that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures. Similar results could be calculated using pre-set hyperparameters, reducing the computational effort by a factor of around 10,000. We also extended the previous analysis by adding a representation learning method based on Natural Language Processing of SMILES called Transformer CNN. We show that across all analyzed sets using exactly the same protocol, Transformer CNN provided better results than graph-based methods for 26 out of 28 pairwise comparisons while using only a tiny fraction of the time required by other methods. Last but not least, we stressed the importance of comparing calculation results using exactly the same statistical measures. Scientific Contribution: We showed that models with pre-optimized hyperparameters can suffer from overfitting and that using pre-set hyperparameters yields similar performance about four orders of magnitude faster. Transformer CNN provided significantly higher accuracy compared to other investigated methods. Competing Interests: The authors declare no competing interests. (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
14. Tox24 Challenge.
- Author
-
Tetko IV
- Published
- 2024
- Full Text
- View/download PDF
15. Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition.
- Author
-
Hartog PBR, Krüger F, Genheden S, and Tetko IV
- Abstract
Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference to compare differences between multiple representations of the same ground-truth. In this study, we investigate the robustness of eight XAI methods using test-time augmentation for a molecular-representation model in the field of computational toxicity prediction. We report significant differences between explanations for different representations of the same ground-truth, and show that randomized models have similar variance. We hypothesize that text-based molecular representations in this and past research reflect tokenization more than learned parameters. Furthermore, we see a greater variance between in-domain predictions than out-of-domain predictions, indicating XAI measures something other than learned parameters. Finally, we investigate the relative importance given to expert-derived structural alerts and find similar importance given regardless of applicability domain, randomization and varying training procedures. We therefore caution future research against validating methods by a similar comparison to human intuition without further investigation. Scientific Contribution: In this research we critically investigate XAI through test-time augmentation, contrasting previous assumptions about using expert validation and showing inconsistencies within models for identical representations. SMILES augmentation has been used to increase model accuracy, but was here adapted from the field of image test-time augmentation to be used as an independent indication of the consistency within SMILES-based molecular representation models. (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
16. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge.
- Author
-
Hunklinger A, Hartog P, Šícho M, Godin G, and Tetko IV
- Subjects
- Solubility, Consensus, Databases, Chemical, Neural Networks, Computer, Algorithms
- Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100K compounds. In total, one hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select methods, descriptors and strategy which contributed to the winning solution. In particular we show that consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both bias and variance of individual models and to calculate the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. The inclusion of information about aleatoric uncertainty would be important to help contestants better understand and use the challenge data. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported. (Copyright © 2024. Published by Elsevier Inc.)
- Published
- 2024
- Full Text
- View/download PDF
17. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges.
- Author
-
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, and Tetko IV
- Subjects
- Machine Learning, Cheminformatics
- Abstract
Machine Learning (ML) techniques face significant challenges when predicting advanced chemical properties, such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. Successfully developing a reliable predictive model not only holds the potential for optimizing high-throughput experiments but can also elevate existing retrosynthetic predictive approaches and bolster a plethora of applications within the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, shedding light on their milestones and inherent limitations. Additionally, a detailed examination of a representative case study provides insights into the prevailing issues related to data availability and transferability in the discipline.
- Published
- 2024
- Full Text
- View/download PDF
18. Therapeutic Potential of Targeting Prokineticin Receptors in Diseases.
- Author
-
Vincenzi M, Kremić A, Jouve A, Lattanzi R, Miele R, Benharouga M, Alfaidy N, Migrenne-Li S, Kanthasamy AG, Porcionatto M, Ferrara N, Tetko IV, Désaubry L, and Nebigil CG
- Subjects
- Humans, Receptors, G-Protein-Coupled metabolism, Peptides, Biomarkers, Neuropeptides metabolism, Neoplasms drug therapy
- Abstract
The prokineticins (PKs) were discovered approximately 20 years ago as small peptides inducing gut contractility. Today, they are established as angiogenic, anorectic, and proinflammatory cytokines, chemokines, hormones, and neuropeptides involved in a variety of physiological and pathophysiological pathways. Their altered expression or mutations implicated in several diseases make them potential biomarkers. Their G-protein coupled receptors, PKR1 and PKR2, have divergent roles and can be therapeutic targets for the treatment of cardiovascular, metabolic, and neural diseases as well as pain and cancer. This article reviews and summarizes our current knowledge of PK family functions from development of heart and brain to regulation of homeostasis in health and diseases. Finally, the review summarizes the established roles of the endogenous peptides, synthetic peptides and the selective ligands of PKR1 and PKR2, and nonpeptide orthosteric and allosteric modulators of the receptors in preclinical disease models. The present review emphasizes the ambiguous aspects and gaps in our knowledge of functions of PKR ligands and elucidates future perspectives for PK research. Significance Statement: This review provides an in-depth view of the prokineticin family and PK receptors that can be active without their endogenous ligand and exhibit "constitutive" activity in diseases. Their non-peptide ligands display promising effects in several preclinical disease models. PKs can be diagnostic biomarkers of several diseases. A thorough understanding of the role of the prokineticin family and their receptor types in health and diseases is critical to develop novel therapeutic strategies with safety concerns. (Copyright © 2023 by The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
19. Introduction to the Special Issue: AI Meets Toxicology.
- Author
-
Klambauer G, Clevert DA, Shah I, Benfenati E, and Tetko IV
- Published
- 2023
- Full Text
- View/download PDF
20. MEDIATE - Molecular DockIng at homE: Turning collaborative simulations into therapeutic solutions.
- Author
-
Vistoli G, Manelfi C, Talarico C, Fava A, Warshel A, Tetko IV, Apostolov R, Ye Y, Latini C, Ficarelli F, Palermo G, Gadioli D, Vitali E, Varriale G, Pisapia V, Scaturro M, Coletti S, Gregori D, Gruffat D, Leija E, Hessenauer S, Delbianco A, Allegretti M, and Beccari AR
- Subjects
- Humans, Molecular Docking Simulation, Proteins, Antiviral Agents, SARS-CoV-2, COVID-19
- Abstract
Introduction: Collaborative computing has attracted great interest in the possibility of joining the efforts of researchers worldwide. Its relevance has further increased during the pandemic crisis since it allows for the strengthening of scientific collaborations while avoiding physical interactions. Thus, the E4C consortium presents the MEDIATE initiative, which invited researchers to contribute via their virtual screening simulations that will be combined with AI-based consensus approaches to provide robust and method-independent predictions. The best compounds will be tested, and the biological results will be shared with the scientific community. Areas Covered: In this paper, the MEDIATE initiative is described. The initiative shares compound libraries and protein structures prepared to perform standardized virtual screenings. Preliminary analyses are also reported which provide encouraging results emphasizing the MEDIATE initiative's capacity to identify active compounds. Expert Opinion: Structure-based virtual screening is well-suited for collaborative projects provided that the participating researchers work on the same input file. Until now, such a strategy was rarely pursued and most initiatives in the field were organized as challenges. The MEDIATE platform is focused on SARS-CoV-2 targets but can be seen as a prototype which can be utilized to perform collaborative virtual screening campaigns in any therapeutic field by sharing the appropriate input files.
- Published
- 2023
- Full Text
- View/download PDF
21. Artificial Intelligence Meets Toxicology.
- Author
-
Tetko IV, Klambauer G, Clevert DA, Shah I, and Benfenati E
- Subjects
- Artificial Intelligence
- Published
- 2022
- Full Text
- View/download PDF
22. Theoretical and Experimental Studies of Phosphonium Ionic Liquids as Potential Antibacterials of MDR Acinetobacter baumannii.
- Author
-
Metelytsia LO, Hodyna DM, Semenyuta IV, Kovalishyn VV, Rogalsky SP, Derevianko KY, Brovarets VS, and Tetko IV
- Abstract
A previously developed model to predict antibacterial activity of ionic liquids against a resistant A. baumannii strain was used to assess activity of phosphonium ionic liquids. Their antioxidant potential was additionally evaluated with newly developed models, which were based on public data. The accuracy of the models was rigorously evaluated using cross-validation as well as test set prediction. Six alkyl triphenylphosphonium and alkyl tributylphosphonium bromides with C8, C10, and C12 alkyl chain lengths were synthesized and tested in vitro. Experimental studies confirmed their activity against A. baumannii as well as showed pronounced antioxidant properties. These results suggest that phosphonium ionic liquids could be promising lead structures against A. baumannii.
- Published
- 2022
- Full Text
- View/download PDF
23. Highly Accurate Filters to Flag Frequent Hitters in AlphaScreen Assays by Suggesting their Mechanism.
- Author
-
Ghosh D, Koch U, Hadian K, Sattler M, and Tetko IV
- Subjects
- Artificial Intelligence, Biological Assay, Drug Discovery methods, High-Throughput Screening Assays methods, Small Molecule Libraries
- Abstract
AlphaScreen is one of the most widely used assay technologies in drug discovery due to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters complicates the interpretation of measured HTS data. Although filters do exist to identify frequent hitters for AlphaScreen, they are frequently based on privileged scaffolds. The development of such filters is time consuming and requires deep domain knowledge. Recently, machine learning and artificial intelligence methods have emerged as important tools to advance drug discovery and chemoinformatics, including their application to identification of frequent hitters in screening assays. However, the relative performance and complementarity of the machine learning and scaffold-based techniques have not yet been comprehensively compared. In this study, we compared filters based on privileged scaffolds with filters built using machine learning. Our results demonstrate that machine-learning methods provide more accurate filters for identification of frequent hitters in AlphaScreen assays than scaffold-based methods and can be easily redeveloped once new data are measured. We present highly accurate models to identify frequent hitters in AlphaScreen assays. (© 2021 Wiley-VCH GmbH.)
- Published
- 2022
- Full Text
- View/download PDF
24. Deep neural network model for highly accurate prediction of BODIPYs absorption.
- Author
-
Ksenofontov AA, Lukanov MM, Bocharov PS, Berezin MB, and Tetko IV
- Subjects
- Crystallography, X-Ray, Neural Networks, Computer, Boron Compounds, Fluorescent Dyes
- Abstract
The possibility of accurately predicting the absorption maximum wavelength of BODIPYs was investigated. We found that previously reported models had a low accuracy (40-57 nm) when predicting BODIPYs due to the limited dataset sizes and/or number of BODIPYs (few hundreds). New models developed in this study were based on data of 6000-plus fluorescent dyes (including 4000-plus BODIPYs) and the deep neural network architecture. The high prediction accuracy (five-fold cross-validation root mean squared error (RMSE) of 18.4 nm) was obtained using a consensus model, which was more accurate than individual models. This model provided excellent accuracy (RMSE of 8 nm) for molecules previously synthesized in our laboratory as well as for prospective validation of three new BODIPYs. We found that solvent properties did not significantly influence the model accuracy since only few BODIPYs exhibited solvatochromism. The analysis of large prediction errors suggested that compounds able to have intermolecular interactions with solvent or salts were likely to be incorrectly predicted. The consensus model is freely available at https://ochem.eu/article/134921 and can help other researchers to accelerate design of new dyes with desired properties. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2021 Elsevier B.V. All rights reserved.)
- Published
- 2022
- Full Text
- View/download PDF
25. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins.
- Author
-
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, and Tetko IV
- Subjects
- Models, Molecular, Molecular Structure, Computer Simulation, Porphyrins chemistry, Spectrophotometry methods
- Abstract
The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The available models in the literature for absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the literature set and compared the performance of global and local models for their modelling using different machine learning methods. Interestingly, extension of the public database yielded models with lower accuracies than the models that we built using porphyrins only. The latter model achieved an acceptable RMSE of 2.61 for prediction of the absorption band of 335 porphyrins synthesized in our laboratory, but had a low accuracy (RMSE = 0.52) for extinction coefficient. Developing models using only compounds from our laboratory significantly decreased errors for these compounds (RMSE = 0.5 and 0.042 for absorption band and extinction coefficient, respectively), but limited their applicability to these homologous series. When developing models, one should clearly keep in mind their potential use and select a strategy that yields the most accurate predictions for the target application. The models and data are publicly available.
- Published
- 2022
- Full Text
- View/download PDF
26. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Published
- 2021
- Full Text
- View/download PDF
27. Anti-MRSA drug discovery by ligand-based virtual screening and biological evaluation.
- Author
-
Lian X, Xia Z, Li X, Karpov P, Jin H, Tetko IV, Xia J, and Wu S
- Subjects
- Anti-Bacterial Agents chemical synthesis, Anti-Bacterial Agents metabolism, Anti-Bacterial Agents toxicity, DNA Gyrase metabolism, Drug Evaluation, Preclinical, Hep G2 Cells, Human Umbilical Vein Endothelial Cells, Humans, Ligands, Microbial Sensitivity Tests, Molecular Docking Simulation, Molecular Dynamics Simulation, Protein Binding, Quinoxalines chemical synthesis, Quinoxalines metabolism, Quinoxalines toxicity, Topoisomerase II Inhibitors chemical synthesis, Topoisomerase II Inhibitors metabolism, Topoisomerase II Inhibitors pharmacology, Topoisomerase II Inhibitors toxicity, Anti-Bacterial Agents pharmacology, Methicillin-Resistant Staphylococcus aureus drug effects, Quinoxalines pharmacology
- Abstract
S. aureus resistant to methicillin (MRSA) is one of the multidrug-resistant bacteria of greatest concern, due to its role in life-threatening infections. There is an urgent need to develop new antibiotics against MRSA. In this study, we first compiled a data set of 2,3-diaminoquinoxalines by chemical synthesis and antibacterial screening against S. aureus, and then performed cheminformatics modeling and virtual screening. The compound with the Specs ID of AG-205/33156020 was discovered as a new antibacterial agent, and was further identified as a Gyrase B (GyrB) inhibitor. In light of the common features, we hypothesized that compound 6c, as a representative of the 2,3-diaminoquinoxalines, also inhibited GyrB and eventually proved it. Via molecular docking and molecular dynamics simulations, we identified binding modes of AG-205/33156020 and 6c to the ATPase domain of GyrB. Importantly, these GyrB inhibitors inhibited the MRSA strains and showed selectivity to HepG2 and HUVEC. Taken together, this research work provides an effective ligand-based computational workflow for scaffold hopping in anti-MRSA drug discovery, and discovers two new GyrB inhibitors that are worthy of further development. (Copyright © 2021 Elsevier Inc. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
28. CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Subjects
- Animals, Computer Simulation, Rats, Toxicity Tests, Acute, United States, United States Environmental Protection Agency, Government Agencies
- Abstract
Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. Objectives: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). Methods: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. Results: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. Discussion: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
- Published
- 2021
- Full Text
- View/download PDF
29. Introduction to Special Issue: Computational Toxicology.
- Author
-
Kleinstreuer NC, Tetko IV, and Tong W
- Subjects
- Animals, Artificial Intelligence, High-Throughput Screening Assays, Humans, Machine Learning, Structure-Activity Relationship, Computational Biology, Toxicity Tests
- Published
- 2021
- Full Text
- View/download PDF
30. Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets.
- Author
-
Wu L, Huang R, Tetko IV, Xia Z, Xu J, and Tong W
- Subjects
- Databases, Factual, Humans, Models, Molecular, Quantitative Structure-Activity Relationship, Chemical and Drug Induced Liver Injury, Machine Learning, Pharmaceutical Preparations chemistry
- Abstract
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice the model performance to gain explainability or vice versa? Here we present a comprehensive study to assess algorithm and feature influences on model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach including deep learning and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance often also is easy to interpret for the Tox21 data set, it clearly was the preferred choice due to its better explainability. Given that each data set had its own error structure both for dependent and independent variables, we strongly recommend conducting a systematic study with a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.
- Published
- 2021
- Full Text
- View/download PDF
31. Structure-Activity Relationship Modeling and Experimental Validation of the Imidazolium and Pyridinium Based Ionic Liquids as Potential Antibacterials of MDR Acinetobacter Baumannii and Staphylococcus Aureus.
- Author
-
Semenyuta IV, Trush MM, Kovalishyn VV, Rogalsky SP, Hodyna DM, Karpov P, Xia Z, Tetko IV, and Metelytsia LO
- Subjects
- Acinetobacter baumannii pathogenicity, Bacterial Infections microbiology, Drug Resistance, Multiple, Humans, Imidazoles chemical synthesis, Ionic Liquids chemical synthesis, Ionic Liquids chemistry, Pyridines chemical synthesis, Staphylococcus aureus drug effects, Staphylococcus aureus pathogenicity, Structure-Activity Relationship, Acinetobacter baumannii drug effects, Bacterial Infections drug therapy, Imidazoles chemistry, Pyridines chemistry
- Abstract
Online Chemical Modeling Environment (OCHEM) was used for QSAR analysis of a set of ionic liquids (ILs) tested against multi-drug resistant (MDR) clinical isolate Acinetobacter baumannii and Staphylococcus aureus strains. The predictive accuracy of regression models has coefficient of determination q² = 0.66-0.79 with cross-validation and independent test sets. The models were used to screen a virtual chemical library of ILs, which was designed with targeted activity against MDR Acinetobacter baumannii and Staphylococcus aureus strains. Seven most promising ILs were selected, synthesized, and tested. Three ILs showed high activity against both these MDR clinical isolates.
- Published
- 2021
- Full Text
- View/download PDF
32. From Big Data to Artificial Intelligence: chemoinformatics meets new challenges.
- Author
-
Tetko IV and Engkvist O
- Abstract
The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data. This editorial highlights the main results presented during the special session of the International Conference on Neural Networks organized by "Big Data in Chemistry" project and draws perspectives on the future progress of the field.
- Published
- 2020
- Full Text
- View/download PDF
33. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis.
- Author
-
Tetko IV, Karpov P, Van Deursen R, and Godin G
- Abstract
We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using text-like representation of chemical reactions (SMILES) and Natural Language Processing (NLP) neural network Transformer architecture. We showed that data augmentation, which is a powerful method used in image processing, eliminated the effect of data memorization by neural networks and improved their performance for prediction of new sequences. This effect was observed when augmentation was used simultaneously for the input and the target data. The top-5 accuracy was 84.8% for the prediction of the largest fragment (thus identifying principal transformation for classical retro-synthesis) for the USPTO-50k test dataset, and was achieved by a combination of SMILES augmentation and a beam search algorithm. The same approach provided significantly better results for the prediction of direct reactions from the single-step USPTO-MIT test set. Our model achieved 90.6% top-1 and 96.1% top-5 accuracy for its challenging mixed set and 97% top-5 accuracy for the USPTO-MIT separated set. It also significantly improved results for USPTO-full set single-step retrosynthesis for both top-1 and top-10 accuracies. The appearance frequency of the most abundantly generated SMILES was well correlated with the prediction outcome and can be used as a measure of the quality of reaction prediction.
- Published
- 2020
- Full Text
- View/download PDF
34. Focused Library Generator: case of Mdmx inhibitors.
- Author
-
Xia Z, Karpov P, Popowicz G, and Tetko IV
- Subjects
- Antineoplastic Agents chemistry, Antineoplastic Agents pharmacology, Binding Sites, Cell Cycle Proteins chemistry, Computer-Aided Design statistics & numerical data, Databases, Chemical statistics & numerical data, Databases, Pharmaceutical, Drug Discovery methods, Drug Discovery statistics & numerical data, Humans, Ligands, Molecular Docking Simulation, Molecular Dynamics Simulation, Neural Networks, Computer, Protein Binding, Proto-Oncogene Proteins chemistry, Quantitative Structure-Activity Relationship, Cell Cycle Proteins antagonists & inhibitors, Drug Design, Proto-Oncogene Proteins antagonists & inhibitors, Small Molecule Libraries
- Abstract
We present a Focused Library Generator that is able to create from scratch new molecules with desired properties. After training the Generator on the ChEMBL database, transfer learning was used to switch the generator to producing new Mdmx inhibitors that are a promising class of anticancer drugs. Lilly medicinal chemistry filters, molecular docking, and a QSAR IC50 model were used to refine the output of the Generator. Pharmacophore screening and molecular dynamics (MD) simulations were then used to further select putative ligands. Finally, we identified five promising hits with equivalent or even better predicted binding free energies and IC50 values than known Mdmx inhibitors. The source code of the project is available on https://github.com/bigchem/online-chem.
- Published
- 2020
- Full Text
- View/download PDF
35. Correction: QSAR without borders.
- Author
-
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, and Tropsha A
- Abstract
Correction for 'QSAR without borders' by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.
- Published
- 2020
- Full Text
- View/download PDF
36. QSAR without borders.
- Author
-
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, and Tropsha A
- Subjects
- Algorithms, Animals, Artificial Intelligence, Databases, Factual, Drug Design, History, 20th Century, History, 21st Century, Humans, Models, Molecular, Quantitative Structure-Activity Relationship, Quantum Theory, Reproducibility of Results, Chemistry, Pharmaceutical methods, Drug-Related Side Effects and Adverse Reactions metabolism, Pharmaceutical Preparations chemistry
- Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
- Published
- 2020
- Full Text
- View/download PDF
37. In silico and in vitro studies of a number PILs as new antibacterials against MDR clinical isolate Acinetobacter baumannii.
- Author
-
Trush MM, Kovalishyn V, Hodyna D, Golovchenko OV, Chumachenko S, Tetko IV, Brovarets VS, and Metelytsia L
- Subjects
- Animals, Anti-Bacterial Agents pharmacology, Computer Simulation, Crustacea drug effects, Databases, Chemical, Drug Evaluation, Preclinical, Drug Resistance, Multiple, Bacterial, Humans, Ionic Liquids pharmacology, Machine Learning, Microbial Sensitivity Tests, Quantitative Structure-Activity Relationship, Acinetobacter baumannii drug effects, Anti-Bacterial Agents chemistry, Ionic Liquids chemistry, Organophosphorus Compounds chemistry
- Abstract
QSAR analysis of a set of previously synthesized phosphonium ionic liquids (PILs) tested against Gram-negative multidrug-resistant clinical isolate Acinetobacter baumannii was done using the Online Chemical Modeling Environment (OCHEM). To overcome the problem of overfitting due to descriptor selection, fivefold cross-validation with variable selection in each step of the model development was applied. The predictive ability of the classification models was tested by cross-validation, giving balanced accuracies (BA) of 76%-82%. The validation of the models using an external test set proved that the models can be used to predict the activity of newly designed compounds with a reasonable accuracy within the applicability domain (BA = 83%-89%). The models were applied to screen a virtual chemical library with expected activity of compounds against MDR Acinetobacter baumannii. The eighteen most promising compounds were identified, synthesized, and tested. Biological testing of compounds was performed using the disk diffusion method in Mueller-Hinton agar. All tested molecules demonstrated high anti-A. baumannii activity and different toxicity levels. The developed classification SAR models are freely available online at http://ochem.eu/article/113921 and could be used by scientists for design of new more effective antibiotics. (© 2020 John Wiley & Sons A/S.)
- Published
- 2020
- Full Text
- View/download PDF
38. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping.
- Author
-
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, and Svozil D
- Abstract
An affinity fingerprint is a vector consisting of a compound's affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~0.65 and ~0.70 for similarity searching depending on data sets, and ~0.85 for classification) and EF5 (~4.67 and ~5.82 for similarity searching depending on data sets, and ~2.10 for classification), comparable to that of the Morgan2 fingerprint (similarity searching AUC of ~0.57 and ~0.66, and EF5 of ~4.09 and ~6.41, depending on data sets, classification AUC of ~0.87, and EF5 of ~2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping as it is able to retrieve 1146 of the 1749 existing scaffolds, while the Morgan2 fingerprint retrieves only 864 scaffolds.
- Published
- 2020
- Full Text
- View/download PDF
39. Water envelope has a critical impact on the design of protein-protein interaction inhibitors.
- Author
-
Ratkova EL, Dawidowski M, Napolitano V, Dubin G, Fino R, Ostertag MS, Sattler M, Popowicz G, and Tetko IV
- Subjects
- Computer Simulation, Crystallography, X-Ray, Humans, Membrane Proteins chemistry, Molecular Structure, Peroxisome-Targeting Signal 1 Receptor metabolism, Proof of Concept Study, Protein Binding drug effects, Protozoan Proteins chemistry, Structure-Activity Relationship, Trypanosoma brucei brucei chemistry, Water chemistry, Membrane Proteins metabolism, Protein Multimerization drug effects, Protozoan Proteins metabolism, Pyrazoles chemistry, Pyrrolidines chemistry, Water metabolism
- Abstract
We show that a water envelope network plays a critical role in protein-protein interactions (PPI). The potency of a PPI inhibitor is modulated by orders of magnitude on manipulation of the solvent envelope alone. The structure-activity relationship of PEX14 inhibitors was analyzed as an example using in silico and X-ray data.
- Published
- 2020
- Full Text
- View/download PDF
40. GEN: highly efficient SMILES explorer using autodidactic generative examination networks.
- Author
-
van Deursen R, Ertl P, Tetko IV, and Godin G
- Abstract
Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism, measuring the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95-98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85-90%) while generating SMILES with strong conservation of the property space (95-99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
- Published
- 2020
- Full Text
- View/download PDF
41. Joint Virtual Special Issue on Computational Toxicology.
- Author
-
Tetko IV and Tropsha A
- Subjects
- Computational Biology, Toxicology
- Published
- 2020
- Full Text
- View/download PDF
42. Transformer-CNN: Swiss knife for QSAR modeling and interpretation.
- Author
-
Karpov P, Godin G, and Tetko IV
- Abstract
We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, and thus the prognosis is based on an internal consensus. That both the augmentation and transfer learning are based on embeddings allows the method to provide good results for small datasets. We discuss the reasons for such effectiveness and draft future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available on https://github.com/bigchem/transformer-cnn. The repository also has a standalone program for QSAR prognosis which calculates individual atoms contributions, thus interpreting the model's result. OCHEM [3] environment (https://ochem.eu) hosts the on-line implementation of the method proposed.
- Published
- 2020
- Full Text
- View/download PDF
43. Computational Toxicology.
- Author
-
Kleinstreuer NC, Tong W, and Tetko IV
- Subjects
- Humans, Computational Biology, Toxicology
- Published
- 2020
- Full Text
- View/download PDF
44. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity.
- Author
-
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, and Judson RS
- Subjects
- Androgens, Databases, Factual, High-Throughput Screening Assays, Humans, Receptors, Androgen, United States, United States Environmental Protection Agency, Computer Simulation, Endocrine Disruptors
- Abstract
Background: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. Objectives: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). Methods: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. Results: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set. Discussion: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
- Published
- 2020
- Full Text
- View/download PDF
45. Structure-Activity Relationship in Pyrazolo[4,3-c]pyridines, First Inhibitors of PEX14-PEX5 Protein-Protein Interaction with Trypanocidal Activity.
- Author
-
Dawidowski M, Kalel VC, Napolitano V, Fino R, Schorpp K, Emmanouilidis L, Lenhart D, Ostertag M, Kaiser M, Kolonko M, Tippler B, Schliebs W, Dubin G, Mäser P, Tetko IV, Hadian K, Plettenburg O, Erdmann R, Sattler M, and Popowicz GM
- Subjects
- Animals, Crystallography, X-Ray, Drug Design, Humans, Magnetic Resonance Spectroscopy, Membrane Proteins biosynthesis, Models, Molecular, Molecular Docking Simulation, Molecular Dynamics Simulation, Myoblasts drug effects, Myoblasts parasitology, Protozoan Proteins biosynthesis, Rats, Structure-Activity Relationship, Trypanosoma brucei gambiense drug effects, Trypanosoma brucei gambiense metabolism, Trypanosoma brucei rhodesiense drug effects, Membrane Proteins antagonists & inhibitors, Protozoan Proteins antagonists & inhibitors, Pyridines chemical synthesis, Pyridines pharmacology, Trypanocidal Agents chemical synthesis, Trypanocidal Agents pharmacology
- Abstract
Trypanosoma protists are pathogens leading to a spectrum of devastating infectious diseases. The range of available chemotherapeutics against Trypanosoma is limited, and the existing therapies are partially ineffective and cause serious adverse effects. Formation of the PEX14-PEX5 complex is essential for protein import into the parasites' glycosomes. This transport is critical for parasite metabolism and failure leads to mislocalization of glycosomal enzymes, with fatal consequences for the parasite. Hence, inhibiting the PEX14-PEX5 protein-protein interaction (PPI) is an attractive way to affect multiple metabolic pathways. Herein, we have used structure-guided computational screening and optimization to develop the first line of compounds that inhibit PEX14-PEX5 PPI. The optimization was driven by several X-ray structures, NMR binding data, and molecular dynamics simulations. Importantly, the developed compounds show significant cellular activity against Trypanosoma, including the human pathogen Trypanosoma brucei gambiense and Trypanosoma cruzi parasites.
- Published
- 2020
- Full Text
- View/download PDF
46. A Survey of Multi-task Learning Methods in Chemoinformatics.
- Author
-
Sosnin S, Vashurina M, Withnall M, Karpov P, Fedorov M, and Tetko IV
- Subjects
- Informatics standards, Chemistry methods, Databases, Chemical, Informatics methods, Machine Learning
- Abstract
Despite the increasing volume of available data, the proportion of experimentally measured data remains small compared to the virtual chemical space of possible chemical structures. Therefore, there is a strong interest in simultaneously predicting different ADMET and biological properties of molecules, which are frequently strongly correlated with one another. Such joint data analyses can increase the accuracy of models by exploiting their common representation and identifying common features between individual properties. In this work we review the recent developments in multi-learning approaches as well as cover the freely available tools and packages that can be used to perform such studies. (© 2018 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.)
- Published
- 2019
- Full Text
- View/download PDF
47. Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space.
- Author
-
Sosnin S, Karlov D, Tetko IV, and Fedorov MV
- Subjects
- Animals, Endpoint Determination, Deep Learning, Models, Theoretical, Toxicity Tests, Acute
- Abstract
Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different end points: it can be measured for different species using different types of administration, etc., and it is questionable if the knowledge transfer between end points is possible. We performed a comparative study of prediction multitask toxicity for a broad chemical space using different descriptors and modeling algorithms and applied multitask learning for a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides significant improvement over single-output models and other machine learning methods. Our research reveals that multitask learning can be very useful to improve the quality of acute toxicity modeling and raises a discussion about the usage of multitask approaches for regulation purposes. Our MultiTox models are freely available in OCHEM platform (ochem.eu/multitox) under CC-BY-NC license.
- Published
- 2019
- Full Text
- View/download PDF
48. Chemical space exploration guided by deep neural networks.
- Author
-
Karlov DS, Sosnin S, Tetko IV, and Fedorov MV
- Abstract
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com). Competing Interests: IVT is CEO of BIGCHEM GmbH, which licenses the OCHEM (http://ochem.eu) software. The other authors declared that they have no actual or potential conflicts of interests. (This journal is © The Royal Society of Chemistry.)
- Published
- 2019
- Full Text
- View/download PDF
49. Rational design of isonicotinic acid hydrazide derivatives with antitubercular activity: Machine learning, molecular docking, synthesis and biological testing.
- Author
-
Kovalishyn V, Grouleff J, Semenyuta I, Sinenko VO, Slivchuk SR, Hodyna D, Brovarets V, Blagodatny V, Poda G, Tetko IV, and Metelytsia L
- Subjects
- Antitubercular Agents pharmacology, Antitubercular Agents therapeutic use, Bacterial Proteins chemistry, Bacterial Proteins metabolism, Binding Sites, Catalytic Domain, Humans, Isoniazid pharmacology, Isoniazid therapeutic use, Machine Learning, Microbial Sensitivity Tests, Molecular Docking Simulation, Mycobacterium tuberculosis drug effects, Mycobacterium tuberculosis metabolism, Oxidoreductases chemistry, Oxidoreductases metabolism, Tuberculosis, Multidrug-Resistant drug therapy, Tuberculosis, Multidrug-Resistant pathology, Antitubercular Agents chemical synthesis, Drug Design, Isoniazid chemistry
- Abstract
The problem of designing new antitubercular drugs against multiple drug-resistant tuberculosis (MDR-TB) was addressed using advanced machine learning methods. As there are only few published measurements against MDR-TB, we collected a large literature data set and developed models against the non-resistant H37Rv strain. The predictive accuracy of these models had a coefficient of determination q² = 0.7-0.8 (regression models) and balanced accuracies of about 80% (classification models) with cross-validation and independent test sets. The models were applied to screen a virtual chemical library, which was designed to have MDR-TB activity. The seven most promising compounds were identified, synthesized and tested. All of them showed activity against the H37Rv strain, and three molecules demonstrated activity against the MDR-TB strain. The docking analysis indicated that the discovered molecules could bind enoyl reductase, InhA, which is required in mycobacterial cell wall development. The models are freely available online (http://ochem.eu/article/103868) and can be used to predict potential anti-TB activity of new chemicals. (© 2018 John Wiley & Sons A/S.)
- Published
- 2018
- Full Text
- View/download PDF
50. Luciferase Advisor: High-Accuracy Model To Flag False Positive Hits in Luciferase HTS Assays.
- Author
-
Ghosh D, Koch U, Hadian K, Sattler M, and Tetko IV
- Subjects
- Enzyme Inhibitors chemistry, Enzyme Inhibitors metabolism, False Positive Reactions, Luciferases chemistry, Luciferases metabolism, Molecular Docking Simulation, Protein Conformation, Drug Evaluation, Preclinical methods, Enzyme Inhibitors pharmacology, High-Throughput Screening Assays methods, Luciferases antagonists & inhibitors
- Abstract
Firefly luciferase is an enzyme that has found ubiquitous use in biological assays in high-throughput screening (HTS) campaigns. The inhibition of luciferase in such assays could lead to a false positive result. This issue has been known for a long time, and there have been significant efforts to identify luciferase inhibitors in order to enhance recognition of false positives in screening assays. However, although a large amount of publicly accessible luciferase counterscreen data is available, to date little effort has been devoted to building a chemoinformatic model that can identify such molecules in a given data set. In this study we developed models to identify these molecules using various methods, such as molecular docking, SMARTS screening, pharmacophores, and machine learning methods. Among the structure-based methods, the pharmacophore-based method showed promising results, with a balanced accuracy of 74.2%. However, machine-learning approaches using associative neural networks outperformed all of the other methods explored, producing a final model with a balanced accuracy of 89.7%. The high predictive accuracy of this model is expected to be useful for advising which compounds are potential luciferase inhibitors present in luciferase HTS assays. The models developed in this work are freely available at the OCHEM platform at http://ochem.eu.
- Published
- 2018
- Full Text
- View/download PDF