171 results on '"Varnek, A."'
Search Results
2. The freedom space - a new set of commercially available molecules for hit discovery.
- Author
-
Protopopov MV, Tararina VV, Bonachera F, Dzyuba IM, Kapeliukha A, Hlotov S, Chuk O, Marcou G, Klimchuk O, Horvath D, Yeghyan E, Savych O, Tarkhanova OO, Varnek A, and Moroz YS
- Subjects
- Databases, Chemical, Small Molecule Libraries chemistry, Small Molecule Libraries pharmacology, Drug Discovery methods
- Abstract
The advent of high-performance virtual screening techniques nowadays allows drug designers to explore ultra-large sets of candidate compounds in search of molecules predicted to have desired properties. However, the success of such an endeavor heavily relies on the pertinence (drug-likeness and, foremost, chemical feasibility) of these candidates, or otherwise, virtual screening will return valueless "hits", by the garbage in/garbage out principle. The huge popularity of the judiciously enumerated Enamine REAL Space is clear proof of the strength of this Big Data trend in drug discovery. Here we describe a new dataset of make-on-demand compounds called the Freedom space. It follows the principles of Enamine REAL Space and contains highly feasible molecules (synthesis success rate over 75 percent). However, the scaffold and chemography analysis revealed significant differences to both the REAL and biologically annotated compounds from the ChEMBL database. The Freedom Space is a significant extension of the REAL Space and can be utilized for a more comprehensive exploration of the synthetically feasible chemical space in hit finding and hit-to-lead campaigns., (© 2024 Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
3. Chemography-guided analysis of a reaction path network for ethylene hydrogenation with a model Wilkinson's catalyst.
- Author
-
Gantzer P, Staub R, Harabuchi Y, Maeda S, and Varnek A
- Abstract
Visualization and analysis of large chemical reaction networks become rather challenging when conventional graph-based approaches are used. As an alternative, we propose to use the chemical cartography ("chemography") approach, describing the data distribution on a 2-dimensional map. Here, the Generative Topographic Mapping (GTM) algorithm - an advanced chemography approach - has been applied to visualize the reaction path network of a simplified Wilkinson's catalyst-catalyzed hydrogenation containing some 10^5 structures generated with the help of the Artificial Force Induced Reaction (AFIR) method using either Density Functional Theory or Neural Network Potential (NNP) for potential energy surface calculations. Using new atoms permutation invariant 3D descriptors for structure encoding, we've demonstrated that GTM possesses the abilities to cluster structures that share the same 2D representation, to visualize potential energy surface, to provide an insight on the reaction path exploration as a function of time and to compare reaction path networks obtained with different methods of energy assessment., (© 2024 Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
4. Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling.
- Author
-
Plyer L, Marcou G, Perves C, Bonachera F, and Varnek A
- Abstract
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient as some tolerance may be acceptable. In order to determine a grade, the developed workflow uses the pairwise similarity assessment of two considered reactions, each encoded by a single molecular graph with the help of the Condensed Graph of Reaction (CGR) approach. This workflow is part of the ChemMoodle project and is implemented as a Moodle Plugin. It uses the Chemdoodle engine for reaction drawing and visualization and communicates with a REST server calculating the similarity score using ISIDA fragment descriptors. The plugin is open-source, accessible in GitHub ( https://github.com/Laboratoire-de-Chemoinformatique/moodle-qtype_reacsimilarity ) and on the Moodle plugin store ( https://moodle.org/plugins/qtype_reacsimilarity?lang=en ). Both similarity measures and fragmentation can be configured. Scientific contribution: This work introduces an open-source method for evaluating chemical reaction questions within Moodle using the CGR approach. Our contribution provides a nuanced grading mechanism that accommodates acceptable tolerances in reaction assessments, enhancing the accuracy and flexibility of the grading process., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
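The similarity-based grading idea described in this abstract can be illustrated with a minimal sketch. It is not the plugin's code: the fragment labels, the Tanimoto formula on count vectors, the `threshold` parameter and the linear mapping from similarity to grade are all assumptions made for illustration, and the real workflow relies on ISIDA fragment descriptors of Condensed Graphs of Reaction computed on a REST server.

```python
from collections import Counter

def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto similarity between two fragment-count vectors."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    # Two empty descriptions are treated as identical (assumption).
    return shared / total if total else 1.0

def soft_grade(answer: Counter, expected: Counter,
               threshold: float = 0.5, max_points: float = 1.0) -> float:
    """Hypothetical soft grade: zero below the threshold, linear above it."""
    sim = tanimoto(answer, expected)
    if sim < threshold:
        return 0.0
    return max_points * (sim - threshold) / (1.0 - threshold)

# Toy fragment counts standing in for real reaction descriptors.
expected = Counter({"C-C": 2, "C=O": 1, "C-O": 1})
partial = Counter({"C-C": 2, "C=O": 1})
print(soft_grade(expected, expected), soft_grade(partial, expected))
```

A binary grader would give the partially correct answer zero points; here it keeps part of the credit because three of the four expected fragments are present.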
5. Predicting S. aureus antimicrobial resistance with interpretable genomic space maps.
- Author
-
Pikalyova K, Orlov A, Horvath D, Marcou G, and Varnek A
- Subjects
- Genome, Bacterial, Genomics methods, Humans, Staphylococcus aureus drug effects, Staphylococcus aureus genetics, Anti-Bacterial Agents pharmacology, Drug Resistance, Bacterial genetics, Drug Resistance, Bacterial drug effects, Machine Learning
- Abstract
Increasing antimicrobial resistance (AMR) represents a global healthcare threat. To decrease the spread of AMR and associated mortality, methods for rapid selection of optimal antibiotic treatment are urgently needed. Machine learning (ML) models based on genomic data to predict resistant phenotypes can serve as a fast screening tool prior to phenotypic testing. Nonetheless, many existing ML methods lack interpretability. Therefore, we present a methodology for visualization of sequence space and AMR prediction based on the non-linear dimensionality reduction method - generative topographic mapping (GTM). This approach, applied to AMR data of >5000 S. aureus isolates retrieved from the PATRIC database, yielded GTM models with reasonable accuracy for all drugs (balanced accuracy values ≥0.75). The Generative Topographic Maps (GTMs) represent data in the form of illustrative maps of the genomic space and allow for antibiotic-wise comparison of resistant phenotypes. The maps were also found to be useful for the analysis of genetic determinants responsible for drug resistance. Overall, the GTM-based methodology is a useful tool for both the illustrative exploration of the genomic sequence space and AMR prediction., (© 2024 The Authors. Molecular Informatics published by Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
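For readers unfamiliar with the metric quoted in this abstract, balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below uses made-up confusion-matrix counts, not data from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on resistant) and specificity (recall on susceptible)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0

# Hypothetical counts for one antibiotic: 50 resistant and 50 susceptible isolates.
ba = balanced_accuracy(tp=40, fn=10, tn=30, fp=20)
print(round(ba, 2))
```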
6. Benchmarking of BMDC assay and related QSAR study for identifying sensitizing chemicals.
- Author
-
Chedik L, Baybekov S, Marcou G, Cosnier F, Mourot-Bousquenaud M, Jacquenet S, Varnek A, and Battais F
- Subjects
- Humans, Animals, Support Vector Machine, Computer Simulation, Dermatitis, Allergic Contact, Allergens toxicity, Animal Testing Alternatives methods, Bone Marrow Cells drug effects, Local Lymph Node Assay, Mice, Quantitative Structure-Activity Relationship, Benchmarking, Dendritic Cells drug effects
- Abstract
The Bone-Marrow derived Dendritic Cell (BMDC) test is a promising assay for identifying sensitizing chemicals based on the 3Rs (Replace, Reduce, Refine) principle. This study expanded the BMDC benchmarking to various in vitro, in chemico, and in silico assays targeting different key events (KE) in the skin sensitization pathway, using common substances datasets. Additionally, a Quantitative Structure-Activity Relationship (QSAR) model was developed to predict the BMDC test outcomes for sensitizing or non-sensitizing chemicals. The modeling workflow involved ISIDA (In Silico Design and Data Analysis) molecular fragment descriptors and the SVM (Support Vector Machine) machine-learning method. The BMDC model's performance was at least comparable to that of all ECVAM-validated models regardless of the KE considered. Compared with other tests targeting KE3, related to dendritic cell activation, BMDC assay was shown to have higher balanced accuracy and sensitivity concerning both the Local Lymph Node Assay (LLNA) and human labels, providing additional evidence for its reliability. The consensus QSAR model exhibits promising results, correlating well with observed sensitization potential. Integrated into a publicly available web service, the BMDC-based QSAR model may serve as a cost-effective and rapid alternative to lab experiments, providing preliminary screening for sensitization potential, compound prioritization, optimization and risk assessment., Competing Interests: Declaration of competing interest The authors declare no conflicts of interest., (Copyright © 2024 Elsevier Inc. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
7. Will we ever be able to accurately predict solubility?
- Author
-
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, and Varnek A
- Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performances, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performances. We also propose a workflow to curate aqueous solubility data, aiming at producing models that are useful for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The models obtained herein and the quality-assessed datasets are publicly available., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
8. An update of skin permeability data based on a systematic review of recent research.
- Author
-
Chedik L, Baybekov S, Cosnier F, Marcou G, Varnek A, and Champmartin C
- Subjects
- Permeability, Datasets as Topic, Humans, Skin metabolism, Skin Absorption, Xenobiotics metabolism
- Abstract
The cutaneous absorption parameters of xenobiotics are crucial for the development of drugs and cosmetics, as well as for assessing environmental and occupational chemical risks. Despite the great variability in the design of experimental conditions due to uncertain international guidelines, datasets like HuskinDB have been created to report skin absorption endpoints. This review updates available skin permeability data by rigorously compiling research published between 2012 and 2021. Inclusion and exclusion criteria have been selected to build the most harmonized and reusable dataset possible. The Generative Topographic Mapping method was applied to the present dataset and compared to HuskinDB to monitor the progress in skin permeability research and locate chemotypes of particular concern. The open-source dataset (SkinPiX) includes steady-state flux, maximum flux, lag time and permeability coefficient results for the substances tested, as well as relevant information on experimental parameters that can impact the data. It can be used to extract subsets of data for comparisons and to build predictive models., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
9. Kinetic solubility: Experimental and machine-learning modeling perspectives.
- Author
-
Baybekov S, Llompart P, Marcou G, Gizzi P, Galzi JL, Ramos P, Saurel O, Bourban C, Minoletti C, and Varnek A
- Subjects
- Solubility, Reproducibility of Results, Water, Machine Learning, Drug Discovery, High-Throughput Screening Assays
- Abstract
Kinetic aqueous or buffer solubility is an important parameter measuring the suitability of compounds for high-throughput assays in early drug discovery, while thermodynamic solubility is reserved for later stages of drug discovery and development. Kinetic solubility is also considered to have low inter-laboratory reproducibility because of its sensitivity to protocol parameters [1]. Presumably, this is why little effort has been put into building QSPR models for kinetic, in comparison to thermodynamic, aqueous solubility. Here, we investigate the reproducibility and modelability of kinetic solubility assays. We first analyzed the relationship between kinetic and thermodynamic solubility data, and then examined the consistency of data from different kinetic assays. In this contribution, we report differences between kinetic and thermodynamic solubility data that are consistent with those reported by others [1, 2] and, in contrast to general expectations, good agreement between data from different kinetic solubility campaigns. The latter is confirmed by the high-performing QSPR models obtained by training on merged kinetic solubility datasets. The poor performance of a QSPR model trained on thermodynamic solubility when applied to a kinetic solubility dataset reinforces the conclusion that kinetic and thermodynamic solubilities do not correlate: one cannot be used as an ersatz for the other. This encourages building predictive models for kinetic solubility. The kinetic solubility QSPR model developed in this study is freely accessible through the Predictor web service of the Laboratory of Chemoinformatics (https://chematlas.chimie.unistra.fr/cgi-bin/predictor2.cgi)., (© 2024 Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
10. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR.
- Author
-
Tropsha A, Isayev O, Varnek A, Schneider G, and Cherkasov A
- Subjects
- Humans, Artificial Intelligence, Computing Methodologies, Quantum Theory, Drug Discovery methods, Drug Design, Quantitative Structure-Activity Relationship, Deep Learning
- Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design., (© 2023. Springer Nature Limited.)
- Published
- 2024
- Full Text
- View/download PDF
11. A community effort in SARS-CoV-2 drug discovery.
- Author
-
Schimunek J, Seidl P, Elez K, Hempel T, Le T, Noé F, Olsson S, Raich L, Winter R, Gokcan H, Gusev F, Gutkin EM, Isayev O, Kurnikova MG, Narangoda CH, Zubatyuk R, Bosko IP, Furs KV, Karpenko AD, Kornoushenko YV, Shuldau M, Yushkevich A, Benabderrahmane MB, Bousquet-Melou P, Bureau R, Charton B, Cirou BC, Gil G, Allen WJ, Sirimulla S, Watowich S, Antonopoulos N, Epitropakis N, Krasoulis A, Itsikalis V, Theodorakis S, Kozlovskii I, Maliutin A, Medvedev A, Popov P, Zaretckii M, Eghbal-Zadeh H, Halmich C, Hochreiter S, Mayr A, Ruch P, Widrich M, Berenger F, Kumar A, Yamanishi Y, Zhang KYJ, Bengio E, Bengio Y, Jain MJ, Korablyov M, Liu CH, Marcou G, Glaab E, Barnsley K, Iyengar SM, Ondrechen MJ, Haupt VJ, Kaiser F, Schroeder M, Pugliese L, Albani S, Athanasiou C, Beccari A, Carloni P, D'Arrigo G, Gianquinto E, Goßen J, Hanke A, Joseph BP, Kokh DB, Kovachka S, Manelfi C, Mukherjee G, Muñiz-Chicharro A, Musiani F, Nunes-Alves A, Paiardi G, Rossetti G, Sadiq SK, Spyrakis F, Talarico C, Tsengenes A, Wade RC, Copeland C, Gaiser J, Olson DR, Roy A, Venkatraman V, Wheeler TJ, Arthanari H, Blaschitz K, Cespugli M, Durmaz V, Fackeldey K, Fischer PD, Gorgulla C, Gruber C, Gruber K, Hetmann M, Kinney JE, Padmanabha Das KM, Pandita S, Singh A, Steinkellner G, Tesseyre G, Wagner G, Wang ZF, Yust RJ, Druzhilovskiy DS, Filimonov DA, Pogodin PV, Poroikov V, Rudik AV, Stolbov LA, Veselovsky AV, De Rosa M, De Simone G, Gulotta MR, Lombino J, Mekni N, Perricone U, Casini A, Embree A, Gordon DB, Lei D, Pratt K, Voigt CA, Chen KY, Jacob Y, Krischuns T, Lafaye P, Zettor A, Rodríguez ML, White KM, Fearon D, Von Delft F, Walsh MA, Horvath D, Brooks CL 3rd, Falsafi B, Ford B, García-Sastre A, Yup Lee S, Naffakh N, Varnek A, Klambauer G, and Hermans TM
- Subjects
- Humans, Pandemics, Biological Assay, Drug Discovery, SARS-CoV-2, COVID-19
- Abstract
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments., (© 2023 The Authors. Molecular Informatics published by Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
12. Multi-Instance Learning Approach to the Modeling of Enantioselectivity of Conformationally Flexible Organic Catalysts.
- Author
-
Zankov D, Madzhidov T, Polishchuk P, Sidorov P, and Varnek A
- Subjects
- Catalysis
- Abstract
Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for the modeling of catalysts' enantioselectivity, we propose to use the multi-instance learning approach accounting for multiple catalyst conformers and requiring neither conformer selection nor their spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction, encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in the modeling of the enantioselectivity of homogeneous and phase-transfer reactions and compared with the state-of-the-art approaches.
- Published
- 2023
- Full Text
- View/download PDF
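The encoding described in this abstract (a bag of conformers per catalyst, each concatenated with one reaction-level descriptor vector) can be pictured with the short sketch below. The function name, the toy numbers and the plain-list representation are assumptions for illustration; the actual work uses 3D pmapper descriptors for conformers and 2D fragment descriptors of the condensed graph of reaction, and the learning algorithm itself is not shown.

```python
from typing import List, Sequence

def encode_reaction_bag(conformer_descriptors: Sequence[Sequence[float]],
                        transformation_descriptors: Sequence[float]) -> List[List[float]]:
    """Build one multi-instance 'bag': one instance per catalyst conformer.

    Each instance is the conformer's 3D descriptor vector concatenated with
    the 2D descriptor vector shared by the whole transformation.
    """
    return [list(conf) + list(transformation_descriptors)
            for conf in conformer_descriptors]

# Three hypothetical conformers (2 descriptors each) and one 3-descriptor reaction.
bag = encode_reaction_bag([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [1, 0, 2])
print(len(bag), len(bag[0]))
```

A single enantioselectivity label is attached to the whole bag rather than to any one conformer, which is why no conformer selection or alignment is needed in this setting.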
13. School of cheminformatics in Latin America.
- Author
-
Gonzalez-Ponce K, Horta Andrade C, Hunter F, Kirchmair J, Martinez-Mayorga K, Medina-Franco JL, Rarey M, Tropsha A, Varnek A, and Zdrazil B
- Abstract
We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured ., (© 2023. Springer Nature Switzerland AG.)
- Published
- 2023
- Full Text
- View/download PDF
14. Meta-GTM: Visualization and Analysis of the Chemical Library Space.
- Author
-
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, and Varnek A
- Subjects
- Gene Library, Small Molecule Libraries
- Abstract
In chemical library analysis, it may be useful to describe libraries as individual items rather than collections of compounds. This is particularly true for ultra-large noncherry-pickable compound mixtures, such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for the management of a portfolio of libraries, just like chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS becomes "crowded" and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of CLS on meta-(μ)GTMs, "meta" being a reminder that these are maps of the CLS, itself based on responsibility vectors issued by regular CS GTMs. 2.5 K DELs and ChEMBL (reference) were projected on the μGTM, producing landscapes of library-specific properties. These describe both interlibrary similarity and intrinsic library characteristics in the same view, herewith facilitating the selection of the best project-specific libraries.
- Published
- 2023
- Full Text
- View/download PDF
15. GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design.
- Author
-
Lamanna G, Delre P, Marcou G, Saviano M, Varnek A, Horvath D, and Mangiatordi GF
- Subjects
- Humans, Algorithms, Drug Design, Deep Learning, COVID-19
- Abstract
This study introduces a new de novo design algorithm called GENERA that combines the capabilities of a deep-learning algorithm for automated drug-like analogue design, called DeLA-Drug, with a genetic algorithm for generating molecules with desired target-oriented properties. Specifically, GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, which is implicated in many pathological conditions, including COVID-19. The ability of GENERA to de novo design promising candidates for a specific target was assessed using two docking programs, PLANTS and GLIDE. A fitness function based on the Pareto dominance resulting from computed PLANTS and GLIDE scores was applied to demonstrate the algorithm's ability to perform multiobjective optimizations effectively. GENERA can quickly generate focused libraries that produce better scores compared to a starting set of known ACE2 binders. This study is the first to utilize a DL-based algorithm designed for analogue generation as a mutational operator within a GA framework, representing an innovative approach to target-oriented de novo design.
- Published
- 2023
- Full Text
- View/download PDF
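Pareto dominance over two docking scores, as mentioned in this abstract, can be sketched as follows. This is a generic illustration and not GENERA's fitness function: the molecule names and score values are invented, and the sketch assumes that lower (more negative) scores are better for both programs.

```python
from typing import Dict, List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every score and strictly better on one.

    Assumption: lower scores are better for every objective.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of the molecules that no other molecule dominates."""
    return [name for name, s in scores.items()
            if not any(dominates(t, s)
                       for other, t in scores.items() if other != name)]

# Hypothetical (PLANTS-like, GLIDE-like) score pairs for four molecules.
scores = {
    "m1": (-80.0, -7.0),
    "m2": (-75.0, -8.0),
    "m3": (-70.0, -6.0),
    "m4": (-80.0, -7.5),
}
front = pareto_front(scores)
print(front)
```

Molecules on the front represent different trade-offs between the two scoring functions, so a genetic algorithm can rank candidates without collapsing both scores into a single weighted sum.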
16. Conjugated Quantitative Structure-Property Relationship Models: Prediction of Kinetic Characteristics Linked by the Arrhenius Equation.
- Author
-
Varnek A, Zankov D, Madzhidov TI, and Baskin I
- Abstract
Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant log k, pre-exponential factor log A, and activation energy Ea. They were benchmarked against single-task (individual and equation-based models) and multi-task models. In individual models, all characteristics were modeled separately, while in multi-task models log k, log A and Ea were treated cooperatively. An equation-based model assessed log k using the Arrhenius equation and log A and Ea values predicted by individual models. It has been demonstrated that the conjugated QSPR models can accurately predict the reaction rate constants at extreme temperatures, at which reaction rate constants can hardly be measured experimentally. Also, in case of small training sets conjugated models are more robust than related single-task approaches., (© 2023 Wiley-VCH GmbH.)
- Published
- 2023
- Full Text
- View/download PDF
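The relation linking the three quantities named in this abstract is the Arrhenius equation, k = A exp(-Ea / RT), which in decimal logarithms reads log k = log A - Ea / (ln(10) R T). The sketch below evaluates it in the way an equation-based model would, from given log A and Ea values. The numerical inputs are invented, and the unit choice (Ea in kJ/mol, T in kelvin) is an assumption of this example rather than something stated in the abstract.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)

def log_k(log_a: float, ea_kj_mol: float, temp_k: float) -> float:
    """Decimal log of the rate constant from the Arrhenius equation."""
    return log_a - ea_kj_mol / (math.log(10) * R_KJ_PER_MOL_K * temp_k)

# Hypothetical reaction: log A = 10, Ea = 80 kJ/mol, at two temperatures.
cold = log_k(10.0, 80.0, 298.15)
hot = log_k(10.0, 80.0, 400.0)
print(round(cold, 2), round(hot, 2))
```

Because log A and Ea do not depend on temperature in this model, a predictor that outputs them can be extrapolated to temperatures where log k was never measured.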
17. Chemical Library Space: Definition and DNA-Encoded Library Comparison Study Case.
- Author
-
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, and Varnek A
- Subjects
- Gene Library, Drug Discovery methods, Small Molecule Libraries chemistry, DNA chemistry
- Abstract
The development of DNA-encoded library (DEL) technology introduced new challenges for the analysis of chemical libraries. It is often useful to consider a chemical library as a stand-alone chemoinformatic object (represented both as a collection of independent molecules and yet as an individual entity), in particular when libraries are inseparable mixtures, like DELs. Herein, we introduce the concept of chemical library space (CLS), in which resident items are individual chemical libraries. We define and compare four vectorial library representations obtained using generative topographic mapping. These allow for an effective comparison of libraries, with the ability to tune and chemically interpret the similarity relationships. In particular, property-tuned CLS encodings make it possible to simultaneously compare libraries with respect to both property and chemotype distributions. We apply the various CLS encodings to the problem of selecting DELs that optimally "match" a reference collection (here ChEMBL28), showing how the choice of the CLS descriptors may help to fine-tune the "matching" (overlap) criteria. Hence, the proposed CLS may represent a new efficient way for polyvalent analysis of thousands of chemical libraries. Selection of an easily accessible compound collection for drug discovery, as a substitute for a difficult to produce reference library, can be tuned for either primary or target-focused screening, also considering property distributions of compounds. Alternatively, selection of libraries covering novel regions of the chemical space with respect to a reference compound subspace may serve for library portfolio enrichment.
- Published
- 2023
- Full Text
- View/download PDF
18. Challenges for Kinetics Predictions via Neural Network Potentials: A Wilkinson's Catalyst Case.
- Author
-
Staub R, Gantzer P, Harabuchi Y, Maeda S, and Varnek A
- Subjects
- Kinetics, Hydrogenation, Neural Networks, Computer
- Abstract
Ab initio kinetic studies are important to understand and design novel chemical reactions. While the Artificial Force Induced Reaction (AFIR) method provides a convenient and efficient framework for kinetic studies, accurate explorations of reaction path networks incur high computational costs. In this article, we are investigating the applicability of Neural Network Potentials (NNP) to accelerate such studies. For this purpose, we are reporting a novel theoretical study of ethylene hydrogenation with a transition metal complex inspired by Wilkinson's catalyst, using the AFIR method. The resulting reaction path network was analyzed by the Generative Topographic Mapping method. The network's geometries were then used to train a state-of-the-art NNP model, to replace expensive ab initio calculations with fast NNP predictions during the search. This procedure was applied to run the first NNP-powered reaction path network exploration using the AFIR method. We discovered that such explorations are particularly challenging for general purpose NNP models, and we identified the underlying limitations. In addition, we are proposing to overcome these challenges by complementing NNP models with fast semiempirical predictions. The proposed solution offers a generally applicable framework, laying the foundations to further accelerate ab initio kinetic studies with Machine Learning Force Fields, and ultimately explore larger systems that are currently inaccessible.
- Published
- 2023
- Full Text
- View/download PDF
19. French dispatch: GTM-based analysis of the Chimiothèque Nationale Chemical Space.
- Author
-
Oleneva P, Zabolotna Y, Horvath D, Marcou G, Bonachera F, and Varnek A
- Subjects
- Databases, Chemical
- Abstract
In order to analyze the Chimiothèque Nationale (CN) - The French National Compound Library - in the context of screening and biologically relevant compounds, the library was compared with the ZINC in-stock collection and ChEMBL. This includes the study of chemical space coverage, physicochemical properties and Bemis-Murcko (BM) scaffold populations. More than 5 K CN-unique scaffolds (relative to ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM («zooming») was applied to generate an ensemble of maps at various resolution levels, from global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7% of CN compounds can be fully synthesized using commercially available building blocks., (© 2023 The Authors. Electroanalysis published by Wiley-VCH Verlag GmbH.)
- Published
- 2023
- Full Text
- View/download PDF
20. Predicting Highly Enantioselective Catalysts Using Tunable Fragment Descriptors.
- Author
-
Tsuji N, Sidorov P, Zhu C, Nagata Y, Gimadiev T, Varnek A, and List B
- Abstract
Catalyst optimization processes typically rely on inductive and qualitative assumptions of chemists based on screening data. While machine learning models using molecular properties or calculated 3D structures enable quantitative data evaluation, costly quantum chemical calculations are often required. In contrast, readily available binary fingerprint descriptors are time- and cost-efficient, but their predictive performance remains insufficient. Here, we describe a machine learning model based on fragment descriptors, which are fine-tuned for asymmetric catalysis and represent cyclic or polyaromatic hydrocarbons, enabling robust and efficient virtual screening. Using training data with only moderate selectivities, we designed theoretically and validated experimentally new catalysts showing higher selectivities in a challenging asymmetric tetrahydropyran synthesis., (© 2023 The Authors. Angewandte Chemie International Edition published by Wiley-VCH GmbH.)
- Published
- 2023
- Full Text
- View/download PDF
21. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder.
- Author
-
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, and Varnek A
- Subjects
- Molecular Docking Simulation, Quantitative Structure-Activity Relationship
- Abstract
In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.
- Published
- 2022
- Full Text
- View/download PDF
22. Implementation of a soft grading system for chemistry in a Moodle plugin.
- Author
-
Plyer L, Marcou G, Perves C, Schurhammer R, and Varnek A
- Abstract
We report a novel approach for grading chemical structure drawings for remote teaching, integrated into the Moodle platform. Typically, existing online platforms use a binary grading system, which often fails to give a nuanced evaluation of the answers given by the students. Therefore, such platforms are unevenly adapted to different disciplines. This is particularly true in the case of chemical structures, where most questions simply cannot be evaluated on a true/false basis. Specifically, a strict comparison of candidate and expected chemical structures is not sufficient when some tolerance is deemed acceptable. To overcome this limitation, we have developed a grading workflow based on the pairwise similarity score of two considered chemical structures. This workflow is implemented as a Moodle plugin, using the Chemdoodle engine for drawing structures and communicating with a REST server to compute the similarity score using molecular descriptors. The plugin ( https://github.com/Laboratoire-de-Chemoinformatique/moodle-qtype_molsimilarity ) is easily adaptable to any academic user; both embedding and similarity measures can be configured., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
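The soft grading idea in entry 22 (a pairwise similarity score in place of a true/false comparison) can be illustrated with a minimal sketch. It assumes that each drawn structure has already been reduced to a set of descriptor labels and that the Tanimoto coefficient is the similarity measure; the abstract only says that the plugin uses molecular descriptors and that the similarity measure is configurable, so the function names, the threshold, and the point scale here are hypothetical and not taken from the plugin.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of descriptor labels."""
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


def soft_grade(candidate: set, expected: set,
               max_points: float = 10.0, threshold: float = 0.5) -> float:
    """Hypothetical grading rule: no points below the threshold,
    otherwise points proportional to the similarity score."""
    score = tanimoto(candidate, expected)
    if score < threshold:
        return 0.0
    return round(max_points * score, 2)


# Toy descriptor sets (illustrative labels, not real fingerprints).
expected = {"C-C", "C-O", "C=O", "O-H"}
candidate = {"C-C", "C-O", "O-H"}  # the student forgot the carbonyl

print(tanimoto(candidate, expected))    # 3 shared / 4 total = 0.75
print(soft_grade(candidate, expected))  # 7.5
```

A binary grader would give this answer zero; the similarity-based rule gives partial credit in proportion to how much of the expected structure was reproduced.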
23. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery.
- Author
-
Zabolotna Y, Bonachera F, Horvath D, Lin A, Marcou G, Klimchuk O, and Varnek A
- Subjects
- DNA chemistry, Gene Library, Zinc, Drug Discovery, Small Molecule Libraries chemistry, Small Molecule Libraries pharmacology
- Abstract
Nowadays, drug discovery is inevitably intertwined with the usage of large compound collections. Understanding of their chemotype composition and physicochemical property profiles is of the highest importance for successful hit identification. Efficient polyfunctional tools allowing multifaceted analysis of constantly growing chemical libraries must be Big Data-compatible. Here, we present the freely accessible ChemSpace Atlas (https://chematlas.chimie.unistra.fr), which includes almost 40K hierarchically organized Generative Topographic Maps (GTM) accommodating up to 500 M compounds covering fragment-like, lead-like, drug-like, PPI-like, and NP-like chemical subspaces. They allow users to navigate and analyze ZINC, ChEMBL, and COCONUT from multiple perspectives on different scales: from a bird's eye view of the entire library to structural pattern detection in small clusters. Around 20 physicochemical properties and almost 750 biological activities can be visualized (associated with map zones), supporting activity profiling and analogue search. Moreover, ChemSpace Atlas will be extended toward new chemical subspaces (e.g., DNA-encoded libraries and synthons) and functionalities (ADMETox profiling and property-guided de novo compound generation).
- Published
- 2022
- Full Text
- View/download PDF
24. Editorial: Chemical Reactions Mining.
- Author
-
Madzhidov TI and Varnek A
- Published
- 2022
- Full Text
- View/download PDF
25. In Vitro Evaluation of In Silico Screening Approaches in Search for Selective ACE2 Binding Chemical Probes.
- Author
-
Rayevsky AV, Poturai AS, Kravets IO, Pashenko AE, Borisova TA, Tolstanova GM, Volochnyuk DM, Borysko PO, Vadzyuk OB, Alieksieieva DO, Zabolotna Y, Klimchuk O, Horvath D, Marcou G, Ryabukhin SV, and Varnek A
- Subjects
- Humans, Molecular Docking Simulation, Peptidyl-Dipeptidase A metabolism, RNA, Viral, SARS-CoV-2, Angiotensin-Converting Enzyme 2, COVID-19
- Abstract
New models for ACE2 receptor binding, based on QSAR and docking algorithms were developed, using XRD structural data and ChEMBL 26 database hits as training sets. The selectivity of the potential ACE2-binding ligands towards Neprilysin (NEP) and ACE was evaluated. The Enamine screening collection (3.2 million compounds) was virtually screened according to the above models, in order to find possible ACE2-chemical probes, useful for the study of SARS-CoV2-induced neurological disorders. An enzymology inhibition assay for ACE2 was optimized, and the combined diversified set of predicted selective ACE2-binding molecules from QSAR modeling, docking, and ultrafast docking was screened in vitro. The in vitro hits included two novel chemotypes suitable for further optimization.
- Published
- 2022
- Full Text
- View/download PDF
26. HyFactor: A Novel Open-Source, Graph-Based Architecture for Chemical Structure Generation.
- Author
-
Akhmetshin T, Lin A, Mazitov D, Zabolotna Y, Ziaikin E, Madzhidov T, and Varnek A
- Subjects
- Software
- Abstract
Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce novel open-source architecture HyFactor in which, similar to the InChI linear notation, the number of hydrogens attached to the heavy atoms was considered instead of the bond types. HyFactor was benchmarked on the ZINC 250K, MOSES, and ChEMBL data sets against conventional graph-based architecture ReFactor, representing our implementation of the reported DEFactor architecture in the literature. On average, HyFactor models contain some 20% less fitting parameters than those of ReFactor. The two architectures display similar validity, uniqueness, and reconstruction rates. Compared to the training set compounds, HyFactor generates more similar structures than ReFactor. This could be explained by the fact that the latter generates many open-chain analogues of cyclic structures in the training set. It has been demonstrated that the reconstruction error of heavy molecules can be significantly reduced using the data augmentation technique. The codes of HyFactor and ReFactor as well as all models obtained in this study are publicly available from our GitHub repository: https://github.com/Laboratoire-de-Chemoinformatique/HyFactor.
- Published
- 2022
- Full Text
- View/download PDF
27. Toward in Silico Modeling of Dynamic Combinatorial Libraries.
- Author
-
Casciuc I, Osypenko A, Kozibroda B, Horvath D, Marcou G, Bonachera F, Varnek A, and Lehn JM
- Abstract
Dynamic combinatorial libraries (DCLs) display adaptive behavior, enabled by the reversible generation of their molecular constituents from building blocks, in response to external effectors, e.g., protein receptors. So far, chemoinformatics has not yet been used for the design of DCLs, which comprise a radically different set of challenges compared to classical library design. Here, we propose a chemoinformatic model for theoretically assessing the composition of DCLs in the presence and the absence of an effector. An imine-based DCL in interaction with the effector human carbonic anhydrase II (CA II) served as a case study. Support vector regression models for the imine formation constants and imine-CA II binding were derived from, respectively, a set of 276 imines synthesized and experimentally studied in this work and 4350 inhibitors of CA II from ChEMBL. These models predict constants for all DCL constituents, to feed software assessing equilibrium concentrations. They are publicly available on the dedicated website. Models rationally selected two amines and two aldehydes predicted to yield stable imines with high affinity for CA II and provided a virtual illustration on how effector affinity regulates DCL members., Competing Interests: The authors declare no competing financial interest., (© 2022 The Authors. Published by American Chemical Society.)
- Published
- 2022
- Full Text
- View/download PDF
28. Exploration of the Chemical Space of DNA-encoded Libraries.
- Author
-
Pikalyova R, Zabolotna Y, Volochnyuk DM, Horvath D, Marcou G, and Varnek A
- Subjects
- Cheminformatics, Chemistry, Pharmaceutical, DNA chemistry, Drug Discovery methods, Small Molecule Libraries chemistry
- Abstract
DNA-Encoded Library (DEL) technology has emerged as an alternative method for bioactive molecules discovery in medicinal chemistry. It enables the simple synthesis and screening of compound libraries of enormous size. Even though it gains more and more popularity each day, there are almost no reports of chemoinformatics analysis of DEL chemical space. Therefore, in this project, we aimed to generate and analyze the ultra-large chemical space of DEL. Around 2500 DELs were designed using commercially available building blocks resulting in 2.5B DEL compounds that were compared to biologically relevant compounds from ChEMBL using Generative Topographic Mapping. This allowed us to choose several optimal DELs covering the chemical space of ChEMBL to the highest extent and thus containing the maximum possible percentage of biologically relevant chemotypes. Different combinations of DELs were also analyzed to identify a set of mutually complementary libraries that attain even higher coverage of ChEMBL than is possible with one single DEL., (© 2022 Wiley-VCH GmbH.)
- Published
- 2022
- Full Text
- View/download PDF
29. Molecular Similarity Perception Based on Machine-Learning Models.
- Author
-
Gandini E, Marcou G, Bonachera F, Varnek A, Pieraccini S, and Sironi M
- Subjects
- Humans, Perception, Structure-Activity Relationship, Machine Learning, Receptors, Drug
- Abstract
Molecular similarity is an impressively broad topic with many implications in several areas of chemistry. Its roots lie in the paradigm that 'similar molecules have similar properties'. For this reason, methods for determining molecular similarity find wide application in pharmaceutical companies, e.g., in the context of structure-activity relationships. The similarity evaluation is also used in the field of chemical legislation, specifically in the procedure to judge if a new molecule can obtain the status of orphan drug with the consequent financial benefits. For this procedure, the European Medicines Agency uses experts' judgments. It is clear that the perception of the similarity depends on the observer, so the development of models to reproduce the human perception is useful. In this paper, we built models using both 2D fingerprints and 3D descriptors, i.e., molecular shape and pharmacophore descriptors. The proposed models were also evaluated by constructing a dataset of pairs of molecules which was submitted to a group of experts for the similarity judgment. The proposed machine-learning models can be useful to reduce or assist human efforts in future evaluations. For this reason, the new molecules dataset and an online tool for molecular similarity estimation have been made freely available.
- Published
- 2022
- Full Text
- View/download PDF
30. SynthI: A New Open-Source Tool for Synthon-Based Library Design.
- Author
-
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Gavrylenko K, Horvath D, Klimchuk O, Oksiuta O, Marcou G, and Varnek A
- Subjects
- Indicators and Reagents
- Abstract
Most of the existing computational tools for de novo library design are focused on the generation, rational selection, and combination of promising structural motifs to form members of the new library. However, the absence of a direct link between the chemical space of the retrosynthetically generated fragments and the pool of available reagents makes such approaches appear as rather theoretical and reality-disconnected. In this context, here we present Synthons Interpreter (SynthI), a new open-source toolkit for de novo library design that allows merging those two chemical spaces into a single synthons space. Here synthons are defined as actual fragments with valid valences and special labels, specifying the position and the nature of reactive centers. They can be issued from either the "breakup" of reference compounds according to 38 retrosynthetic rules or real reagents, after leaving group withdrawal or transformation. Such an approach not only enables the design of synthetically accessible libraries and analog generation but also facilitates reagents (building blocks) analysis in the medicinal chemistry context. SynthI code is publicly available at https://github.com/Laboratoire-de-Chemoinformatique/SynthI.
- Published
- 2022
- Full Text
- View/download PDF
31. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry.
- Author
-
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Horvath D, Gavrilenko KS, Marcou G, Moroz YS, Oksiuta O, and Varnek A
- Subjects
- Indicators and Reagents, Chemistry, Pharmaceutical, Drug Discovery methods
- Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by corresponding synthons, i.e., fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapped and library-specific regions.
- Published
- 2022
- Full Text
- View/download PDF
32. CGRdb2.0: A Python Database Management System for Molecules, Reactions, and Chemical Data.
- Author
-
Gimadiev T, Nugmanov R, Khakimova A, Fatykhova A, Madzhidov T, Sidorov P, and Varnek A
- Subjects
- Databases, Factual, Benchmarking, Database Management Systems
- Abstract
This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.
- Published
- 2022
- Full Text
- View/download PDF
33. HIV-1 drug resistance profiling using amino acid sequence space cartography.
- Author
-
Pikalyova K, Orlov A, Lin A, Tarasova O, Marcou M, Horvath D, Poroikov V, and Varnek A
- Subjects
- Humans, Amino Acid Sequence, Retrospective Studies, HIV Reverse Transcriptase chemistry, HIV Reverse Transcriptase genetics, HIV Reverse Transcriptase metabolism, Mutation, HIV Protease genetics, HIV Protease metabolism, Drug Resistance, Drug Resistance, Viral genetics, Genotype, HIV-1 genetics, HIV-1 metabolism, HIV Infections drug therapy
- Abstract
Motivation: Human immunodeficiency virus (HIV) drug resistance is a global healthcare issue. The emergence of drug resistance influenced the efficacy of treatment regimens, thus stressing the importance of treatment adaptation. Computational methods predicting the drug resistance profile from genomic data of HIV isolates are advantageous for monitoring drug resistance in patients. However, existing computational methods for drug resistance prediction are either not suitable for emerging HIV strains with complex mutational patterns or lack interpretability, which is of paramount importance in clinical practice. The approach reported here overcomes these limitations and combines high accuracy of predictions and interpretability of the models., Results: In this work, a new methodology based on generative topographic mapping (GTM) for biological sequence space representation and quantitative genotype-phenotype relationships prediction purposes was introduced. The GTM-based resistance landscapes allowed us to predict the resistance of HIV strains based on sequencing and drug resistance data for three viral proteins [integrase (IN), protease (PR) and reverse transcriptase (RT)] from Stanford HIV drug resistance database. The average balanced accuracy for PR inhibitors was 0.89 ± 0.01, for IN inhibitors 0.85 ± 0.01, for non-nucleoside RT inhibitors 0.73 ± 0.01 and for nucleoside RT inhibitors 0.84 ± 0.01. We have demonstrated in several case studies that GTM-based resistance landscapes are useful for visualization and analysis of sequence space as well as for treatment optimization purposes. Here, GTMs were applied for the in-depth analysis of the relationships between mutation pattern and drug resistance using mutation landscapes. This allowed us to predict retrospectively the importance of the presence of particular mutations (e.g. V32I, L10F and L33F in HIV PR) for the resistance development. 
This study highlights some perspectives of GTM applications in clinical informatics and particularly in the field of sequence space exploration., Availability and Implementation: https://github.com/karinapikalyova/ISIDASeq., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2022
- Full Text
- View/download PDF
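Entry 33 reports its results as balanced accuracy per drug class. As a reminder of what that metric measures, the sketch below computes it for a binary resistant/susceptible classification as the mean of sensitivity and specificity. This is the standard definition, not code from the paper; the label encoding (1 = resistant, 0 = susceptible) and the toy data are assumptions made for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of resistant strains recovered
    specificity = tn / (tn + fp)  # fraction of susceptible strains recovered
    return (sensitivity + specificity) / 2


# Toy example: 4 resistant and 2 susceptible isolates (made-up labels).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # (0.75 + 0.5) / 2 = 0.625
```

Unlike plain accuracy, the metric weights both classes equally, which matters when resistant and susceptible isolates are unevenly represented.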
34. Atom-to-atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies.
- Author
-
Lin A, Dyubankova N, Madzhidov TI, Nugmanov RI, Verhoeven J, Gimadiev TR, Afonina VA, Ibragimova Z, Rakhimbekova A, Sidorov P, Gedich A, Suleymanov R, Mukhametgaleev R, Wegner J, Ceulemans H, and Varnek A
- Subjects
- Algorithms, Databases, Factual, Benchmarking, Biochemical Phenomena
- Abstract
In this paper, we compare the most popular Atom-to-Atom Mapping (AAM) tools: ChemAxon,[1] Indigo,[2] RDTool,[3] NameRXN (NextMove),[4] and RXNMapper[5] which implement different AAM algorithms. An open-source RDTool program was optimized, and its modified version ("new RDTool") was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and develop the "AAM fixer" algorithm for an automatized correction of erroneous mapping. The benchmarking calculations were performed on a Golden dataset containing 1851 manually mapped and curated reactions. The best performing RXNMapper program together with the AAM Fixer was applied to map the USPTO database. The Golden dataset, mapped USPTO and optimized RDTool are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique., (© 2021 Wiley-VCH GmbH.)
- Published
- 2022
- Full Text
- View/download PDF
35. Computational screening methodology identifies effective solvents for CO2 capture.
- Author
-
Orlov AA, Valtz A, Coquelet C, Rozanska X, Wimmer E, Marcou G, Horvath D, Poulain B, Varnek A, and de Meyer F
- Abstract
Carbon capture and storage technologies are projected to increasingly contribute to cleaner energy transitions by significantly reducing CO2 emissions from fossil fuel-driven power and industrial plants. The industry standard technology for CO2 capture is chemical absorption with aqueous alkanolamines, which are often being mixed with an activator, piperazine, to increase the overall CO2 absorption rate. Inefficiency of the process due to the parasitic energy required for thermal regeneration of the solvent drives the search for new tertiary amines with better kinetics. Improving the efficiency of experimental screening using computational tools is challenging due to the complex nature of chemical absorption. We have developed a novel computational approach that combines kinetic experiments, molecular simulations and machine learning for the in silico screening of hundreds of prospective candidates and identify a class of tertiary amines that absorbs CO2 faster than a typical commercial solvent when mixed with piperazine, which was confirmed experimentally., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
36. Rapid Discrimination of Neuromyelitis Optica Spectrum Disorder and Multiple Sclerosis Using Machine Learning on Infrared Spectra of Sera.
- Author
-
El Khoury Y, Gebelin M, de Sèze J, Patte-Mensah C, Marcou G, Varnek A, Mensah-Nyagan AG, Hellwig P, and Collongues N
- Subjects
- Aquaporin 4, Autoantibodies, Humans, Machine Learning, Myelin-Oligodendrocyte Glycoprotein, Multiple Sclerosis diagnosis, Neuromyelitis Optica
- Abstract
Neuromyelitis optica spectrum disorder (NMOSD) and multiple sclerosis (MS) are both autoimmune inflammatory and demyelinating diseases of the central nervous system. NMOSD is a highly disabling disease and rapid introduction of the appropriate treatment at the acute phase is crucial to prevent sequelae. Specific criteria were established in 2015 and provide keys to distinguish NMOSD and MS. One of the most reliable criteria for NMOSD diagnosis is detection in patient's serum of an antibody that attacks the water channel aquaporin-4 (AQP-4). Another target in NMOSD is myelin oligodendrocyte glycoprotein (MOG), delineating a new spectrum of diseases called MOG-associated diseases. Lastly, patients with NMOSD can be negative for both AQP-4 and MOG antibodies. At disease onset, NMOSD symptoms are very similar to MS symptoms from a clinical and radiological perspective. Thus, at first episode, given the urgency of starting the anti-inflammatory treatment, there is an unmet need to differentiate NMOSD subtypes from MS. Here, we used Fourier transform infrared spectroscopy in combination with a machine learning algorithm with the aim of distinguishing the infrared signatures of sera of a first episode of NMOSD from those of a first episode of relapsing-remitting MS, as well as from those of healthy subjects and patients with chronic inflammatory demyelinating polyneuropathy. Our results showed that NMOSD patients were distinguished from MS patients and healthy subjects with a sensitivity of 100% and a specificity of 100%. We also discuss the distinction between the different NMOSD serostatuses. The coupling of infrared spectroscopy of sera to machine learning is a promising cost-effective, rapid and reliable differential diagnosis tool capable of helping to gain valuable time in patients' treatment.
- Published
- 2022
- Full Text
- View/download PDF
37. Prediction of Optimal Conditions of Hydrogenation Reaction Using the Likelihood Ranking Approach.
- Author
-
Afonina VA, Mazitov DA, Nurmukhametova A, Shevelev MD, Khasanova DA, Nugmanov RI, Burilov VA, Madzhidov TI, and Varnek A
- Subjects
- Hydrogenation, Likelihood Functions, Stereoisomerism, Models, Chemical
- Abstract
The selection of experimental conditions leading to a reasonable yield is an important and essential element for the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model representing an artificial neural network that outputs a list of different conditions ranked according to their suitability to a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors, and a recurrent artificial neural network performance prediction of condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking model trained on a hydrogenation reactions dataset (~42,000 reactions) from the Reaxys® database, to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.
- Published
- 2021
- Full Text
- View/download PDF
38. Pre-Steady-State Kinetics of the SARS-CoV-2 Main Protease as a Powerful Tool for Antiviral Drug Discovery.
- Author
-
Zakharova MY, Kuznetsova AA, Uvarova VI, Fomina AD, Kozlovskaya LI, Kaliberda EN, Kurbatskaia IN, Smirnov IV, Bulygin AA, Knorre VD, Fedorova OS, Varnek A, Osolodkin DI, Ishmukhametov AA, Egorov AM, Gabibov AG, and Kuznetsov NA
- Abstract
The design of effective target-specific drugs for COVID-19 treatment has become an intriguing challenge for modern science. The SARS-CoV-2 main protease, Mpro, responsible for the processing of SARS-CoV-2 polyproteins and production of individual components of viral replication machinery, is an attractive candidate target for drug discovery. Specific Mpro inhibitors have turned out to be promising anticoronaviral agents. Thus, an effective platform for quantitative screening of Mpro-targeting molecules is urgently needed. Here, we propose a pre-steady-state kinetic analysis of the interaction of Mpro with inhibitors as a basis for such a platform. We examined the kinetic mechanism of peptide substrate binding and cleavage by wild-type Mpro and by its catalytically inactive mutant C145A. The enzyme induces conformational changes of the peptide during the reaction. The inhibition of Mpro by boceprevir, telaprevir, GC-376, PF-00835231, or thimerosal was investigated. Detailed pre-steady-state kinetics of the interaction of the wild-type enzyme with the most potent inhibitor, PF-00835231, revealed a two-step binding mechanism, followed by covalent complex formation. The C145A Mpro mutant interacts with PF-00835231 approximately 100-fold less effectively. Nevertheless, the binding constant of PF-00835231 toward C145A Mpro is still good enough to inhibit the enzyme. Therefore, our results suggest that even noncovalent inhibitor binding due to a fine conformational fit into the active site is sufficient for efficient inhibition. A structure-based virtual screening and a subsequent detailed assessment of inhibition efficacy allowed us to select two compounds as promising noncovalent inhibitor leads of SARS-CoV-2 Mpro., Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest., (Copyright © 2021 Zakharova, Kuznetsova, Uvarova, Fomina, Kozlovskaya, Kaliberda, Kurbatskaia, Smirnov, Bulygin, Knorre, Fedorova, Varnek, Osolodkin, Ishmukhametov, Egorov, Gabibov and Kuznetsov.)
- Published
- 2021
- Full Text
- View/download PDF
39. Reaction Data Curation I: Chemical Structures and Transformations Standardization.
- Author
-
Gimadiev TR, Lin A, Afonina VA, Batyrshin D, Nugmanov RI, Akhmetshin T, Sidorov P, Duybankova N, Verhoeven J, Wegner J, Ceulemans H, Gedich A, Madzhidov TI, and Varnek A
- Subjects
- Databases, Factual, Reference Standards, Data Curation
- Abstract
The quality of experimental data for chemical reactions is a critical consideration for any reaction-driven study. However, the curation of reaction data has not been extensively discussed in the literature so far. Here, we suggest a four-step protocol that includes the curation of individual structures (reactants and products), chemical transformations, reaction conditions and endpoints. Its implementation in Python3 using the CGRTools toolkit has been used to clean three popular reaction databases: Reaxys, USPTO and Pistachio. The curated USPTO database is available in the GitHub repository (Laboratoire-de-Chemoinformatique/Reaction_Data_Cleaning)., (© 2021 Wiley-VCH GmbH.)
- Published
- 2021
- Full Text
- View/download PDF
40. Chemoinformatics-Driven Design of New Physical Solvents for Selective CO2 Absorption.
- Author
-
Orlov AA, Demenko DY, Bignaud C, Valtz A, Marcou G, Horvath D, Coquelet C, Varnek A, and de Meyer F
- Subjects
- Gases, Solubility, Solvents, Carbon Dioxide, Cheminformatics
- Abstract
The removal of CO2 from gases is an important industrial process in the transition to a low-carbon economy. The use of selective physical (co-)solvents is especially promising in cases when the amount of CO2 is large as it enables one to lower the energy requirements for solvent regeneration. However, only a few physical solvents have found industrial application and the design of new ones can pave the way to more efficient gas treatment techniques. Experimental screening of gas solubility is a labor-intensive process, and solubility modeling is a viable strategy to reduce the number of solvents subject to experimental measurements. In this paper, a chemoinformatics-based modeling workflow was applied to build a predictive model for the solubility of CO2 and four other industrially important gases (CO, CH4, H2, and N2). A dataset containing solubilities of gases in 280 solvents was collected from literature sources and supplemented with the new data for six solvents measured in the present study. A modeling workflow based on the usage of several state-of-the-art machine learning algorithms was applied to establish quantitative structure-solubility relationships. The best models were used to perform virtual screening of the industrially produced chemicals. It enabled the identification of compounds with high predicted CO2 solubility and selectivity toward other gases. The prediction for one of the compounds, 4-methylmorpholine, was confirmed experimentally.
- Published
- 2021
- Full Text
- View/download PDF
41. QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach.
- Author
-
Zankov DV, Matveieva M, Nikonenko AV, Nugmanov RI, Baskin II, Varnek A, Polishchuk P, and Madzhidov TI
- Subjects
- Databases, Factual, Drug Discovery, Molecular Conformation, Algorithms, Quantitative Structure-Activity Relationship
- Abstract
Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors issued for a single lowest energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
- Published
- 2021
- Full Text
- View/download PDF
42. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Published
- 2021
- Full Text
- View/download PDF
43. NP Navigator: A New Look at the Natural Product Chemical Space.
- Author
-
Zabolotna Y, Ertl P, Horvath D, Bonachera F, Marcou G, and Varnek A
- Subjects
- Combinatorial Chemistry Techniques, Macromolecular Substances analysis, Zinc chemistry, Biological Products
- Abstract
Natural products (NPs), having been evolutionarily selected over millions of years to bind to biological macromolecules, remained an important source of inspiration for medicinal chemists even after the advent of efficient drug discovery technologies such as combinatorial chemistry and high-throughput screening. Thus, there is a strong demand for efficient and user-friendly computational tools for analyzing large libraries of NPs. In this context, we introduce NP Navigator - a freely available intuitive online tool for visualization and navigation through the chemical space of NPs and NP-like molecules. It is based on the hierarchical ensemble of generative topographic maps, featuring NPs from the COlleCtion of Open NatUral producTs (COCONUT), bioactive compounds from ChEMBL and commercially available molecules from ZINC. NP Navigator allows efficient analysis of different aspects of NPs - chemotype distribution, physicochemical properties, biological activity and commercial availability. The latter concerns not only purchasable NPs but also their close analogs that can be considered as synthetic mimetics of NPs or pseudo-NPs. (© 2021 Wiley-VCH GmbH.)
- Published
- 2021
- Full Text
- View/download PDF
44. A critical overview of computational approaches employed for COVID-19 drug discovery.
- Author
-
Muratov EN, Amaro R, Andrade CH, Brown N, Ekins S, Fourches D, Isayev O, Kozakov D, Medina-Franco JL, Merz KM, Oprea TI, Poroikov V, Schneider G, Todd MH, Varnek A, Winkler DA, Zakharov AV, Cherkasov A, and Tropsha A
- Subjects
- Antiviral Agents therapeutic use, COVID-19 virology, Clinical Trials as Topic, Humans, Pandemics, SARS-CoV-2 drug effects, Computer Simulation, Drug Design, Drug Discovery methods, Drug Repositioning, COVID-19 Drug Treatment
- Abstract
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. Massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this devastating pandemic and more than 130 000 COVID-19-related research papers have been published in peer-reviewed journals or deposited in preprint servers. Much of the research effort has focused on the discovery of novel drug candidates or repurposing of existing drugs against COVID-19, and many such projects have been either exclusively computational or computer-aided experimental studies. Herein, we provide an expert overview of the key computational methods and their applications for the discovery of COVID-19 small-molecule therapeutics that have been reported in the research literature. We further outline that, after the first year of the COVID-19 pandemic, it appears that drug repurposing has not produced rapid and global solutions. However, several known drugs have been used in the clinic to cure COVID-19 patients, and a few repurposed drugs continue to be considered in clinical trials, along with several novel clinical candidates. We posit that truly impactful computational tools must deliver actionable, experimentally testable hypotheses enabling the discovery of novel drugs and drug combinations, and that open science and rapid sharing of research results are critical to accelerate the development of novel, much needed therapeutics for COVID-19.
- Published
- 2021
- Full Text
- View/download PDF
45. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus A, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash A, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo D, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Published
- 2021
- Full Text
- View/download PDF
46. DMSO Solubility Assessment for Fragment-Based Screening.
- Author
-
Baybekov S, Marcou G, Ramos P, Saurel O, Galzi JL, and Varnek A
- Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
- Published
- 2021
- Full Text
- View/download PDF
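Note on the metric quoted in entry 46: balanced accuracy is the mean of the per-class recalls, so it is not inflated when one class (for example, soluble fragments) dominates the data. The sketch below is a minimal illustration in plain Python; the function name and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (illustrative helper, not the authors' code)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples that truly belong to this class
        members = [i for i, y in enumerate(y_true) if y == cls]
        # Fraction of those samples that the model labelled correctly
        hits = sum(1 for i in members if y_pred[i] == cls)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = soluble in DMSO, 0 = insoluble
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

With these toy labels the recall is 3/4 for the majority class and 1/2 for the minority class, so the balanced accuracy is lower than the plain accuracy of 4/6.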
47. CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Subjects
- Animals, Computer Simulation, Rats, Toxicity Tests, Acute, United States, United States Environmental Protection Agency, Government Agencies
- Abstract
Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. Objectives: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). Methods: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. Results: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. Discussion: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
- Published
- 2021
- Full Text
- View/download PDF
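Note on the binary end points in entry 47: the abstract gives two explicit thresholds, very toxic (LD50 ≤ 50 mg/kg) and nontoxic (LD50 > 2,000 mg/kg). The sketch below only encodes those two cut-offs; the function name is hypothetical, it is not part of CATMoS or OPERA, and the EPA and GHS category boundaries are deliberately left out because the abstract does not state them.

```python
def binary_endpoints(ld50_mg_per_kg):
    """Flag the two binary end points named in the abstract (illustrative only)."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be a positive dose in mg/kg")
    return {
        # Very toxic: LD50 at or below 50 mg/kg
        "very_toxic": ld50_mg_per_kg <= 50.0,
        # Nontoxic: LD50 strictly above 2,000 mg/kg
        "nontoxic": ld50_mg_per_kg > 2000.0,
    }


# Hypothetical LD50 values in mg/kg, chosen to sit on and around the thresholds
flags = {dose: binary_endpoints(dose) for dose in (50.0, 300.0, 2000.0, 2500.0)}
```

A compound with an LD50 between the two thresholds is flagged as neither very toxic nor nontoxic, which is why the two end points are modeled separately.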
48. Visualization and Analysis of the REACH-chemical Space with Generative Topographic Mapping.
- Author
-
Lunghini F, Gilles M, Azam P, Enrici MH, Van Miert E, and Varnek A
- Subjects
- Algorithms, Animals, Databases, Factual, Humans, Models, Molecular, Molecular Structure, Organic Chemicals toxicity, Rats, Organic Chemicals analysis
- Abstract
In the framework of the REACH (Registration, Evaluation, Authorization and Restriction of Chemicals) regulation, industries have generated and reported a huge amount of (eco)toxicological data on substances produced or imported in Europe. The registration procedure initiated the creation of a large REACH database of well-defined (eco)toxicological properties. Here, the data distribution in the REACH chemical space was analyzed with the help of the Generative Topographic Mapping (GTM) approach. GTM generates 2-dimensional maps on which each compound is represented as a data point. The 3rd dimension can be used to display the distribution of a given (eco)toxicological property, which can further be used for property assessment of new compounds projected on the map. We report the "Universal REACH map", which accommodates 11 endpoints covering environmental fate and (eco)toxicological properties. This map demonstrates acceptable predictive performance: in cross-validation, balanced accuracy ranges from 0.60 to 0.78. The 11-endpoint profile has been computed for each REACH-registered substance. Some concerns related to acute aquatic toxicity have been identified, whereas for environmental fate and human health endpoints the number of compounds predicted to be of concern was much smaller. It has been demonstrated that superposition of several class landscapes allows selection of the zones in the chemical space populated by compounds with a given (eco)toxicological profile. (© 2020 Wiley-VCH GmbH.)
- Published
- 2021
- Full Text
- View/download PDF
49. Cross-validation strategies in QSPR modelling of chemical reactions.
- Author
-
Rakhimbekova A, Akhmetshin TN, Minibaeva GI, Nugmanov RI, Gimadiev TR, Madzhidov TI, Baskin II, and Varnek A
- Subjects
- Software, Validation Studies as Topic, Models, Chemical, Quantitative Structure-Activity Relationship
- Abstract
In this article, we consider cross-validation of the quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation, 'transformation-out' CV and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach, which does not consider the nature of objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and for reactions carried out under new conditions. Both suggested strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and Diels-Alder cycloaddition. All suggested cross-validation methodologies and a tutorial are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
- Published
- 2021
- Full Text
- View/download PDF
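Note on entry 49: one way to read 'solvent-out' and 'transformation-out' CV is as grouped cross-validation, where every reaction sharing a group label (a solvent, or a transformation type) is placed in the same fold so that the test fold only contains groups unseen during training. The sketch below illustrates that idea in plain Python under this assumption; it is not the CIMtools implementation, and the function name, fold-balancing rule and toy solvent list are hypothetical.

```python
from collections import defaultdict


def group_out_folds(groups, n_folds):
    """Yield (train_idx, test_idx) so that no group label occurs on both sides."""
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    if n_folds < 2 or n_folds > len(by_group):
        raise ValueError("n_folds must be between 2 and the number of distinct groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])

    for i, fold in enumerate(folds):
        test_idx = sorted(fold)
        train_idx = sorted(j for k, other in enumerate(folds) if k != i for j in other)
        yield train_idx, test_idx


# Hypothetical example: one solvent label per reaction
solvents = ["water", "water", "ethanol", "ethanol", "DMSO", "acetone"]
splits = list(group_out_folds(solvents, n_folds=3))
```

Passing transformation labels instead of solvent labels gives the analogous 'transformation-out' split; a conventional k-fold split would instead scatter reactions in the same solvent across training and test folds.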
50. Combined Graph/Relational Database Management System for Calculated Chemical Reaction Pathway Data.
- Author
-
Gimadiev T, Nugmanov R, Batyrshin D, Madzhidov T, Maeda S, Sidorov P, and Varnek A
- Subjects
- Databases, Factual, Algorithms, Database Management Systems
- Abstract
Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines the relational database CGR DB, which handles compounds and reactions as molecular graphs, with a graph database architecture for pathway analysis by graph algorithms. The original condensed graph of reaction technology is used to store any chemical reaction as a single graph.
- Published
- 2021
- Full Text
- View/download PDF