176 results on "A. Varnek"
Search Results
2. Meta-GTM: Visualization and Analysis of the Chemical Library Space
- Author
-
Pikalyova, Regina, Zabolotna, Yuliana, Horvath, Dragos, Marcou, Gilles, and Varnek, Alexandre
- Abstract
In chemical library analysis, it may be useful to describe libraries as individual items rather than collections of compounds. This is particularly true for ultra-large noncherry-pickable compound mixtures, such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for the management of a portfolio of libraries, just like chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS becomes “crowded” and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of CLS on meta-(μ)GTMs, “meta” serving as a reminder that these are maps of the CLS, itself based on responsibility vectors issued by regular CS GTMs. 2.5 K DELs and ChEMBL (reference) were projected on the μGTM, producing landscapes of library-specific properties. These describe both interlibrary similarity and intrinsic library characteristics in the same view, thereby facilitating the selection of the best project-specific libraries.
- Published
- 2023
- Full Text
- View/download PDF
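The idea of treating a whole library as a single vector built from GTM responsibility vectors can be illustrated with a minimal sketch. It assumes that each compound already has a responsibility vector over the map nodes, that a library is summarized by the mean of those vectors, and that two libraries are compared with a continuous Tanimoto coefficient. The function names, the toy numbers, and the choice of similarity measure are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def library_vector(responsibilities: Sequence[Sequence[float]]) -> List[float]:
    """Average per-compound responsibility vectors into one library-level vector.

    Assumption: every compound has a responsibility vector of the same length
    (one entry per GTM node).
    """
    n_compounds = len(responsibilities)
    n_nodes = len(responsibilities[0])
    return [
        sum(r[node] for r in responsibilities) / n_compounds
        for node in range(n_nodes)
    ]


def tanimoto(u: Sequence[float], v: Sequence[float]) -> float:
    """Continuous Tanimoto coefficient between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = sum(a * a for a in u) + sum(b * b for b in v) - dot
    return dot / denom if denom else 0.0


# Hypothetical toy data: two libraries of two compounds each on a 3-node map.
lib_a = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
lib_b = [[0.0, 0.2, 0.8], [0.0, 0.0, 1.0]]

vec_a = library_vector(lib_a)
vec_b = library_vector(lib_b)
similarity_ab = tanimoto(vec_a, vec_b)
```

Library-level vectors of this kind are what a second-level map would take as input, with one point per library instead of one point per compound.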
3. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR
- Author
-
Tropsha, Alexander, Isayev, Olexandr, Varnek, Alexandre, Schneider, Gisbert, and Cherkasov, Artem
- Abstract
Quantitative structure–activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term ‘deep QSAR’. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
- Published
- 2023
- Full Text
- View/download PDF
4. The freedom space – a new set of commercially available molecules for hit discovery
- Author
-
Protopopov, Mykola V., Tararina, Valentyna V., Bonachera, Fanny, Dzyuba, Igor M., Kapeliukha, Anna, Hlotov, Serhii, Chuk, Oleksii, Marcou, Gilles, Klimchuk, Olga, Horvath, Dragos, Yeghyan, Erik, Savych, Olena, Tarkhanova, Olga O., Varnek, Alexandre, and Moroz, Yurii S.
- Abstract
The advent of high‐performance virtual screening techniques nowadays allows drug designers to explore ultra‐large sets of candidate compounds in search of molecules predicted to have desired properties. However, the success of such an endeavor heavily relies on the pertinence (drug‐likeness and, foremost, chemical feasibility) of these candidates, or otherwise, virtual screening will return valueless “hits”, by the garbage in/garbage out principle. The huge popularity of the judiciously enumerated Enamine REAL Space is clear proof of the strength of this Big Data trend in drug discovery. Here we describe a new dataset of make‐on‐demand compounds called the Freedom space. It follows the principles of Enamine REAL Space and contains highly feasible molecules (synthesis success rate over 75 percent). However, the scaffold and chemography analysis revealed significant differences to both the REAL and biologically annotated compounds from the ChEMBL database. The Freedom Space is a significant extension of the REAL Space and can be utilized for a more comprehensive exploration of the synthetically feasible chemical space in hit finding and hit‐to‐lead campaigns.
- Published
- 2024
- Full Text
- View/download PDF
5. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery
- Author
-
Zabolotna, Yuliana, Bonachera, Fanny, Horvath, Dragos, Lin, Arkadii, Marcou, Gilles, Klimchuk, Olga, and Varnek, Alexandre
- Abstract
Nowadays, drug discovery is inevitably intertwined with the usage of large compound collections. Understanding of their chemotype composition and physicochemical property profiles is of the highest importance for successful hit identification. Efficient polyfunctional tools allowing multifaceted analysis of constantly growing chemical libraries must be Big Data-compatible. Here, we present the freely accessible ChemSpace Atlas (https://chematlas.chimie.unistra.fr), which includes almost 40K hierarchically organized Generative Topographic Maps (GTM) accommodating up to 500 M compounds covering fragment-like, lead-like, drug-like, PPI-like, and NP-like chemical subspaces. They allow users to navigate and analyze ZINC, ChEMBL, and COCONUT from multiple perspectives on different scales: from a bird’s eye view of the entire library to structural pattern detection in small clusters. Around 20 physicochemical properties and almost 750 biological activities can be visualized (associated with map zones), supporting activity profiling and analogue search. Moreover, ChemSpace Atlas will be extended toward new chemical subspaces (e.g., DNA-encoded libraries and synthons) and functionalities (ADMETox profiling and property-guided de novo compound generation).
- Published
- 2022
- Full Text
- View/download PDF
6. HyFactor: A Novel Open-Source, Graph-Based Architecture for Chemical Structure Generation
- Author
-
Akhmetshin, Tagir, Lin, Arkadii, Mazitov, Daniyar, Zabolotna, Yuliana, Ziaikin, Evgenii, Madzhidov, Timur, and Varnek, Alexandre
- Abstract
Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source architecture, HyFactor, in which, similar to the InChI linear notation, the number of hydrogens attached to the heavy atoms is considered instead of the bond types. HyFactor was benchmarked on the ZINC 250K, MOSES, and ChEMBL data sets against the conventional graph-based architecture ReFactor, our implementation of the DEFactor architecture reported in the literature. On average, HyFactor models contain some 20% fewer fitting parameters than those of ReFactor. The two architectures display similar validity, uniqueness, and reconstruction rates. HyFactor generates structures more similar to the training set compounds than ReFactor does. This could be explained by the fact that ReFactor generates many open-chain analogues of cyclic structures in the training set. It has been demonstrated that the reconstruction error of heavy molecules can be significantly reduced using the data augmentation technique. The codes of HyFactor and ReFactor as well as all models obtained in this study are publicly available from our GitHub repository: https://github.com/Laboratoire-de-Chemoinformatique/HyFactor.
- Published
- 2022
- Full Text
- View/download PDF
7. Toward in Silico Modeling of Dynamic Combinatorial Libraries
- Author
-
Casciuc, Iuri, Osypenko, Artem, Kozibroda, Bohdan, Horvath, Dragos, Marcou, Gilles, Bonachera, Fanny, Varnek, Alexandre, and Lehn, Jean-Marie
- Abstract
Dynamic combinatorial libraries (DCLs) display adaptive behavior, enabled by the reversible generation of their molecular constituents from building blocks, in response to external effectors, e.g., protein receptors. So far, chemoinformatics has not yet been used for the design of DCLs, which comprise a radically different set of challenges compared to classical library design. Here, we propose a chemoinformatic model for theoretically assessing the composition of DCLs in the presence and the absence of an effector. An imine-based DCL in interaction with the effector human carbonic anhydrase II (CA II) served as a case study. Support vector regression models for the imine formation constants and imine-CA II binding were derived from, respectively, a set of 276 imines synthesized and experimentally studied in this work and 4350 inhibitors of CA II from ChEMBL. These models predict constants for all DCL constituents, to feed software assessing equilibrium concentrations. They are publicly available on the dedicated website. Models rationally selected two amines and two aldehydes predicted to yield stable imines with high affinity for CA II and provided a virtual illustration on how effector affinity regulates DCL members.
- Published
- 2022
- Full Text
- View/download PDF
8. SynthI: A New Open-Source Tool for Synthon-Based Library Design
- Author
-
Zabolotna, Yuliana, Volochnyuk, Dmitriy M., Ryabukhin, Sergey V., Gavrylenko, Kostiantyn, Horvath, Dragos, Klimchuk, Olga, Oksiuta, Oleksandr, Marcou, Gilles, and Varnek, Alexandre
- Abstract
Most of the existing computational tools for de novo library design are focused on the generation, rational selection, and combination of promising structural motifs to form members of the new library. However, the absence of a direct link between the chemical space of the retrosynthetically generated fragments and the pool of available reagents makes such approaches appear as rather theoretical and reality-disconnected. In this context, here we present Synthons Interpreter (SynthI), a new open-source toolkit for de novo library design that allows merging those two chemical spaces into a single synthons space. Here synthons are defined as actual fragments with valid valences and special labels, specifying the position and the nature of reactive centers. They can be issued from either the “breakup” of reference compounds according to 38 retrosynthetic rules or real reagents, after leaving group withdrawal or transformation. Such an approach not only enables the design of synthetically accessible libraries and analog generation but also facilitates reagents (building blocks) analysis in the medicinal chemistry context. SynthI code is publicly available at https://github.com/Laboratoire-de-Chemoinformatique/SynthI.
- Published
- 2022
- Full Text
- View/download PDF
9. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry
- Author
-
Zabolotna, Yuliana, Volochnyuk, Dmitriy M., Ryabukhin, Sergey V., Horvath, Dragos, Gavrilenko, Konstantin S., Marcou, Gilles, Moroz, Yurii S., Oksiuta, Oleksandr, and Varnek, Alexandre
- Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by corresponding synthons, i.e., fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapped and library-specific regions.
- Published
- 2022
- Full Text
- View/download PDF
10. CGRdb2.0: A Python Database Management System for Molecules, Reactions, and Chemical Data
- Author
-
Gimadiev, Timur, Nugmanov, Ramil, Khakimova, Aigul, Fatykhova, Adeliya, Madzhidov, Timur, Sidorov, Pavel, and Varnek, Alexandre
- Abstract
This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.
- Published
- 2022
- Full Text
- View/download PDF
11. Chemoinformatics Approaches to Virtual Screening. Edited by Alexandre Varnek and Alexander Tropsha.
- Author
-
Wegscheid‐Gerlach, Christof
- Abstract
no abstract
- Published
- 2009
- Full Text
- View/download PDF
12. QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach
- Author
-
Zankov, Dmitry V., Matveieva, Mariia, Nikonenko, Aleksandra V., Nugmanov, Ramil I., Baskin, Igor I., Varnek, Alexandre, Polishchuk, Pavel, and Madzhidov, Timur I.
- Abstract
Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors issued for a single lowest energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
- Published
- 2021
- Full Text
- View/download PDF
13. Predicting S. aureus antimicrobial resistance with interpretable genomic space maps
- Author
-
Pikalyova, Karina, Orlov, Alexey, Horvath, Dragos, Marcou, Gilles, and Varnek, Alexandre
- Abstract
Increasing antimicrobial resistance (AMR) represents a global healthcare threat. To decrease the spread of AMR and associated mortality, methods for rapid selection of optimal antibiotic treatment are urgently needed. Machine learning (ML) models based on genomic data to predict resistant phenotypes can serve as a fast screening tool prior to phenotypic testing. Nonetheless, many existing ML methods lack interpretability. Therefore, we present a methodology for visualization of sequence space and AMR prediction based on the non‐linear dimensionality reduction method – generative topographic mapping (GTM). This approach, applied to AMR data of >5000 S. aureus isolates retrieved from the PATRIC database, yielded GTM models with reasonable accuracy for all drugs (balanced accuracy values ≥0.75). The Generative Topographic Maps (GTMs) represent data in the form of illustrative maps of the genomic space and allow for antibiotic‐wise comparison of resistant phenotypes. The maps were also found to be useful for the analysis of genetic determinants responsible for drug resistance. Overall, the GTM‐based methodology is a useful tool for both the illustrative exploration of the genomic sequence space and AMR prediction.
- Published
- 2024
- Full Text
- View/download PDF
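The balanced accuracy quoted in the abstract above is the mean of sensitivity and specificity, which keeps the score meaningful when resistant and susceptible isolates are unevenly represented. The sketch below shows the arithmetic on hypothetical toy labels (1 = resistant, 0 = susceptible); the label encoding and the data are assumptions made for illustration and do not come from the study.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 positive, 0 negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of resistant isolates recovered
    specificity = tn / (tn + fp)  # fraction of susceptible isolates recovered
    return (sensitivity + specificity) / 2


# Hypothetical toy labels: 3 resistant and 5 susceptible isolates.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
```

In this toy case sensitivity is 2/3 and specificity is 4/5, so the score falls just below the 0.75 level mentioned in the abstract.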
14. Computer-Aided Design of New Physical Solvents for Hydrogen Sulfide Absorption
- Author
-
Orlov, Alexey A., Marcou, Gilles, Horvath, Dragos, Cabodevilla, Alvaro Echeverria, Varnek, Alexandre, and Meyer, Frédérick de
- Abstract
Treatment of hydrogen sulfide (H2S) is important in many industrial processes including oil refineries, natural and biogas processing, and coal gasification. The most mature technology for the selective capture of H2S is based on its absorption by chemical or physical solvents. However, only several compounds are currently used as physical (co)solvents in industry, and the search for new ones is an important task. The experimental screening of physical (co)solvents requires much time and many resources, while solubility modeling might enable one to reduce the number of solvents for the experimental evaluation. In this study, a workflow for the in silico discovery of new physical solvents for H2S absorption was suggested and experimentally validated. A data set composed of 99 H2S physical solvents was collected and predictive quantitative structure–property relationships for H2S solubility were built using a random forest algorithm and two types of molecular descriptors: ISIDA fragments and quantum-chemical descriptors. Virtual screening of industrially produced chemicals and their structural analogues enabled identification of the ones with predicted high solubility values. They can be suggested as starting points for further exploration of the H2S physical solvents chemical space. The predicted solubility value for one of the compounds found in virtual screening, 1,3-dimethyl-2-imidazolidinone, was confirmed experimentally.
- Published
- 2021
- Full Text
- View/download PDF
15. Combined Graph/Relational Database Management System for Calculated Chemical Reaction Pathway Data
- Author
-
Gimadiev, Timur, Nugmanov, Ramil, Batyrshin, Dinar, Madzhidov, Timur, Maeda, Satoshi, Sidorov, Pavel, and Varnek, Alexandre
- Abstract
Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines relational database CGR DB for handling compounds and reactions as molecular graphs with a graph database architecture for pathway analysis by graph algorithms. Original condensed graph of reaction technology is used to store any chemical reaction as a single graph.
- Published
- 2021
- Full Text
- View/download PDF
16. Chemography: Searching for Hidden Treasures
- Author
-
Zabolotna, Yuliana, Lin, Arkadii, Horvath, Dragos, Marcou, Gilles, Volochnyuk, Dmitriy M., and Varnek, Alexandre
- Abstract
The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Nowadays, no human may succeed to acquire a complete overview of more than a billion existing or feasible compounds within which the potential “blockbuster drugs” are well hidden and yet only a few mouse clicks away. To reach these “hidden treasures”, we adapted the generative topographic mapping method to enable efficient navigation through the chemical space, from a global overview to a structural pattern detection, covering, for the first time, the complete ZINC library of purchasable compounds, relative to 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed. Structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Alternatively, 125 000 ZINC-specific compound classes, absent in structure–activity bases, are novel paths to explore in medicinal chemistry. The complete list of these chemotypes can be downloaded using the link https://forms.gle/B6bUJj82t9EfmttV6.
- Published
- 2021
- Full Text
- View/download PDF
17. Trustworthiness, the Key to Grid-Based Map-Driven Predictive Model Enhancement and Applicability Domain Control
- Author
-
Horvath, Dragos, Marcou, Gilles, and Varnek, Alexandre
- Abstract
In chemography, grid-based maps sample molecular descriptor space by injecting a set of nodes, and then linking them to some regular 2D grid representing the map. They include self-organizing maps (SOMs) and generative topographic maps (GTMs). Grid-based maps are predictive because any compound thereupon projected can “inherit” the properties of its residence node(s), node properties being themselves “inherited” from node-neighboring training set compounds. This Article proposes a formalism to define the trustworthiness of these nodes as “providers” of structure–activity information captured from training compounds. An empirical four-parameter node trustworthiness (NT) function of density (sparsely populated nodes are less trustworthy) and coherence (nodes with training set residents of divergent properties are less trustworthy) is proposed. Based upon it, a trustworthiness score T is used to delimit the applicability domain (AD) by means of a trustworthiness threshold TT. For each parameter setup, success of ensuing inside-AD predictions is monitored. It is seen that setup-specific success levels (averaged over large pools of prediction challenges) are highly covariant, irrespectively of the targets of prediction challenges, of the (classification or regression) type of problems, of the specific parametrization, and even of the nature (GTM or SOM) of underlying maps. Thus, success levels determined on the basis of regression problems (445 target-specific affinity QSAR sets) on GTMs and levels returned by completely unrelated classification problems (319 target-specific active-/inactive-labeled sets) on SOMs were seen to correlate to a degree of 70%. Therefore, a common, general-purpose setup of the herein proposed parametric AD definition was shown to generally apply to grid-based map-driven property prediction problems.
- Published
- 2020
- Full Text
- View/download PDF
18. “Big Data” Fast Chemoinformatics Model to Predict Generalized Born Radius and Solvent Accessibility as a Function of Geometry
- Author
-
Horvath, Dragos, Marcou, Gilles, and Varnek, Alexandre
- Abstract
The Generalized Born (GB) solvent model offers the best accuracy/computing effort ratio yet requires drastic simplifications to estimate the Effective Born Radii (EBR) by bypassing a too expensive volume integration step. EBRs are a measure of the degree of burial of an atom and not very sensitive to small changes of geometry: in molecular dynamics, the costly EBR update procedure is not mandatory at every step. This work however aims at implementing a GB model into the Sampler for Multiple Protein–Ligand Entities (S4MPLE) evolutionary algorithm with mandatory EBR updates at each step triggering arbitrarily large geometric changes. Therefore, a quantitative structure–property relationship has been developed in order to express the EBRs as a linear function of both the topological neighborhood and geometric occupancy of the space around atoms. A training set of 810 molecular systems, starting from fragment-like to drug-like compounds, proteins, host–guest systems, and ligand–protein complexes, has been compiled. For each species, S4MPLE generated several hundreds of random conformers. For each atom in each geometry of each species, its “standard” EBR was calculated by numeric integration and associated to topological and geometric descriptors of the atom neighborhood. This training set (EBR, atom descriptors) involving >5 M entries was subjected to a boot-strapping multilinear regression process with descriptor selection. In parallel, the strategy was repurposed to also learn atomic solvent-accessible areas (SA) based on the same descriptors. Resulting linear equations were challenged to predict EBR and SA values for a similarly compiled external set of >2000 new molecular systems. Solvation energies calculated with estimated EBR and SA match “standard” energies within the typical error of a force-field-based approach (a few kilocalories per mole). Given the extreme diversity of molecular systems covered by the model, this simple EBR/SA estimator covers a vast applicability domain.
- Published
- 2020
- Full Text
- View/download PDF
19. Kinetic solubility: Experimental and machine‐learning modeling perspectives
- Author
-
Baybekov, Shamkhal, Llompart, Pierre, Marcou, Gilles, Gizzi, Patrick, Galzi, Jean‐Luc, Ramos, Pascal, Saurel, Olivier, Bourban, Claire, Minoletti, Claire, and Varnek, Alexandre
- Abstract
Kinetic aqueous or buffer solubility is an important parameter measuring the suitability of compounds for high throughput assays in early drug discovery, while thermodynamic solubility is reserved for later stages of drug discovery and development. Kinetic solubility is also considered to have low inter‐laboratory reproducibility because of its sensitivity to protocol parameters [1]. Presumably, this is why little effort has been put into building QSPR models for kinetic, in comparison to thermodynamic, aqueous solubility. Here, we investigate the reproducibility and modelability of kinetic solubility assays. We first analyzed the relationship between kinetic and thermodynamic solubility data, and then examined the consistency of data from different kinetic assays. In this contribution, we report differences between kinetic and thermodynamic solubility data that are consistent with those reported by others [1, 2] and good agreement between data from different kinetic solubility campaigns, in contrast to general expectations. The latter is confirmed by achieving high performing QSPR models trained on merged kinetic solubility datasets. The poor performance of a QSPR model trained on thermodynamic solubility when applied to a kinetic solubility dataset reinforces the conclusion that kinetic and thermodynamic solubilities do not correlate: one cannot be used as an ersatz for the other. This encourages building predictive models for kinetic solubility. The kinetic solubility QSPR model developed in this study is freely accessible through the Predictor web service of the Laboratory of Chemoinformatics (https://chematlas.chimie.unistra.fr/cgi‐bin/predictor2.cgi). This contribution presents a new publicly available dataset of kinetic solubility for 56k compounds, a comparison of kinetic and thermodynamic measurements and new publicly available QSPR models.
- Published
- 2024
- Full Text
- View/download PDF
20. A community effort in SARS‐CoV‐2 drug discovery
- Author
-
Schimunek, Johannes, Seidl, Philipp, Elez, Katarina, Hempel, Tim, Le, Tuan, Noé, Frank, Olsson, Simon, Raich, Lluís, Winter, Robin, Gokcan, Hatice, Gusev, Filipp, Gutkin, Evgeny M., Isayev, Olexandr, Kurnikova, Maria G., Narangoda, Chamali H., Zubatyuk, Roman, Bosko, Ivan P., Furs, Konstantin V., Karpenko, Anna D., Kornoushenko, Yury V., Shuldau, Mikita, Yushkevich, Artsemi, Benabderrahmane, Mohammed B., Bousquet‐Melou, Patrick, Bureau, Ronan, Charton, Beatrice, Cirou, Bertrand C., Gil, Gérard, Allen, William J., Sirimulla, Suman, Watowich, Stanley, Antonopoulos, Nick, Epitropakis, Nikolaos, Krasoulis, Agamemnon, Itsikalis, Vassilis, Theodorakis, Stavros, Kozlovskii, Igor, Maliutin, Anton, Medvedev, Alexander, Popov, Petr, Zaretckii, Mark, Eghbal‐Zadeh, Hamid, Halmich, Christina, Hochreiter, Sepp, Mayr, Andreas, Ruch, Peter, Widrich, Michael, Berenger, Francois, Kumar, Ashutosh, Yamanishi, Yoshihiro, Zhang, Kam Y. J., Bengio, Emmanuel, Bengio, Yoshua, Jain, Moksh J., Korablyov, Maksym, Liu, Cheng‐Hao, Marcou, Gilles, Glaab, Enrico, Barnsley, Kelly, Iyengar, Suhasini M., Ondrechen, Mary Jo, Haupt, V. Joachim, Kaiser, Florian, Schroeder, Michael, Pugliese, Luisa, Albani, Simone, Athanasiou, Christina, Beccari, Andrea, Carloni, Paolo, D'Arrigo, Giulia, Gianquinto, Eleonora, Goßen, Jonas, Hanke, Anton, Joseph, Benjamin P., Kokh, Daria B., Kovachka, Sandra, Manelfi, Candida, Mukherjee, Goutam, Muñiz‐Chicharro, Abraham, Musiani, Francesco, Nunes‐Alves, Ariane, Paiardi, Giulia, Rossetti, Giulia, Sadiq, S. Kashif, Spyrakis, Francesca, Talarico, Carmine, Tsengenes, Alexandros, Wade, Rebecca C., Copeland, Conner, Gaiser, Jeremiah, Olson, Daniel R., Roy, Amitava, Venkatraman, Vishwesh, Wheeler, Travis J., Arthanari, Haribabu, Blaschitz, Klara, Cespugli, Marco, Durmaz, Vedat, Fackeldey, Konstantin, Fischer, Patrick D., Gorgulla, Christoph, Gruber, Christian, Gruber, Karl, Hetmann, Michael, Kinney, Jamie E., Padmanabha Das, Krishna M., Pandita, Shreya, Singh, Amit, Steinkellner, Georg, Tesseyre, Guilhem, Wagner, Gerhard, Wang, Zi‐Fu, Yust, Ryan J., Druzhilovskiy, Dmitry S., Filimonov, Dmitry A., Pogodin, Pavel V., Poroikov, Vladimir, Rudik, Anastassia V., Stolbov, Leonid A., Veselovsky, Alexander V., De Rosa, Maria, De Simone, Giada, Gulotta, Maria R., Lombino, Jessica, Mekni, Nedra, Perricone, Ugo, Casini, Arturo, Embree, Amanda, Gordon, D. Benjamin, Lei, David, Pratt, Katelin, Voigt, Christopher A., Chen, Kuang‐Yu, Jacob, Yves, Krischuns, Tim, Lafaye, Pierre, Zettor, Agnès, Rodríguez, M. Luis, White, Kris M., Fearon, Daren, Von Delft, Frank, Walsh, Martin A., Horvath, Dragos, Brooks, Charles L., Falsafi, Babak, Ford, Bryan, García‐Sastre, Adolfo, Yup Lee, Sang, Naffakh, Nadia, Varnek, Alexandre, Klambauer, Günter, and Hermans, Thomas M.
- Abstract
The COVID‐19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small‐molecule drugs that are widely available, including in low‐ and middle‐income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the “Billion molecules against COVID‐19 challenge”, to identify small‐molecule inhibitors against SARS‐CoV‐2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find ‘consensus compounds’. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding‐, cleavage‐, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS‐CoV‐2 treatments.
- Published
- 2024
- Full Text
- View/download PDF
21. Conjugated Quantitative Structure–Property Relationship Models: Application to Simultaneous Prediction of Tautomeric Equilibrium Constants and Acidity of Molecules
- Author
-
Zankov, Dmitry V., Madzhidov, Timur I., Rakhimbekova, Assima, Gimadiev, Timur R., Nugmanov, Ramil I., Kazymova, Marina A., Baskin, Igor I., and Varnek, Alexandre
- Abstract
Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of “twin” neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.
- Published
- 2019
- Full Text
- View/download PDF
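The relationship behind the conjugated models in entry 21 can be written out explicitly. The short derivation below is a sketch under two assumptions that the abstract does not state: the two tautomers T1 and T2 share a common conjugate base A⁻, and K_T is defined as [T2]/[T1]. The superscript labels (1) and (2) are notation introduced here for illustration.

```latex
\begin{align}
K_{a}^{(1)} &= \frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{T_1}]}, \qquad
K_{a}^{(2)} = \frac{[\mathrm{A^-}][\mathrm{H^+}]}{[\mathrm{T_2}]}, \\
K_T &= \frac{[\mathrm{T_2}]}{[\mathrm{T_1}]} = \frac{K_{a}^{(1)}}{K_{a}^{(2)}}, \\
\log K_T &= \mathrm{p}K_{a}^{(2)} - \mathrm{p}K_{a}^{(1)}.
\end{align}
```

Under these assumptions, two independently fitted pKa models are not guaranteed to reproduce a separately fitted logKT, which is consistent with the abstract's remark that only the conjugated models respect the relationship.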
22. CGRtools: Python Library for Molecule, Reaction, and Condensed Graph of Reaction Processing
- Author
-
Nugmanov, Ramil I., Mukhametgaleev, Ravil N., Akhmetshin, Tagir, Gimadiev, Timur R., Afonina, Valentina A., Madzhidov, Timur I., and Varnek, Alexandre
- Abstract
CGRtools is an open-source Python library for handling molecular and reaction information. It is the only library developed so far that can process condensed graphs of reaction (CGRs). CGR provides the possibility for advanced operations with reaction information and could be used for reaction descriptor calculation, structure–reactivity modeling, atom-to-atom mapping comparison and correction, reaction center extraction, reaction balancing, and some other related tasks. Unlike other popular libraries, CGRtools is written entirely in Python, has only minor dependencies on other libraries, and is cross-platform. Reaction, molecule, and CGR objects in CGRtools support native Python methods and are comparable with the help of operations “equal to”, “less than”, and “bigger than”. CGRtools supports common structural formats. CGRtools is distributed under an LGPL license and is available at https://github.com/cimm-kzn/CGRtools.
- Published
- 2019
- Full Text
- View/download PDF
23. CovaDOTS: In Silico Chemistry-Driven Tool to Design Covalent Inhibitors Using a Linking Strategy
- Author
-
Hoffer, Laurent, Saez-Ayala, Magali, Horvath, Dragos, Varnek, Alexandre, Morelli, Xavier, and Roche, Philippe
- Abstract
We recently reported an integrated fragment-based optimization strategy called DOTS (Diversity Oriented Target-focused Synthesis) that combines automated virtual screening (VS) with semirobotized organic synthesis coupled to in vitro evaluation. The molecular modeling part consists of hit-to-lead chemistry, based on the growing paradigm. Here, we have extended the applicability of the DOTS strategy by adding new functionalities, allowing a generic chemistry-driven linking approach with a particular emphasis on covalent drugs. Indeed, the covalent mode of action can be described as a specific case of linking, where suitable linkers are sought to fuse a bound organic compound with a nucleophilic protein side chain. The proof of concept is established using three retrospective study cases in which known noncovalent inhibitors have been converted to covalent inhibitors. Our method is able to automatically design reference covalent inhibitors (and/or analogs) from an initial activated substructure and predict their binding mode. More importantly, the reference compounds are ranked high among several hundred putative adducts, demonstrating the utility of the approach to design covalent inhibitors.
- Published
- 2019
- Full Text
- View/download PDF
24. Virtual Screening with Generative Topographic Maps: How Many Maps Are Required?
- Author
-
Casciuc, Iuri, Zabolotna, Yuliana, Horvath, Dragos, Marcou, Gilles, Bajorath, Jürgen, and Varnek, Alexandre
- Abstract
Universal generative topographic maps (GTMs) provide two-dimensional representations of chemical space selected for their “polypharmacological competence”, that is, the ability to simultaneously represent meaningful activity and property landscapes, associated with many distinct targets and properties. Several such GTMs can be generated, each based on a different initial descriptor vector, encoding distinct structural features. While their average polypharmacological competence may indeed be equivalent, they nevertheless significantly diverge with respect to the quality of each property-specific landscape. In this work, we show that distinct universal maps represent complementary and strongly synergistic views of biologically relevant chemical space. Eight universal GTMs were employed as support for predictive classification landscapes, using more than 600 active/inactive ligand series associated with as many targets from the ChEMBL database (v.23). For nine of these targets, it was possible to extract, from the Directory of Useful Decoys (DUD), truly external sets featuring sufficient “actives” and “decoys” not present in the landscape-defining ChEMBL ligand sets. For each such molecule, projected on every class landscape of a particular universal map, a probability of activity was estimated, in analogy to a virtual screening (VS) experiment. Cross-validated (CV) balanced accuracy on landscape-defining ChEMBL data was unable to predict the success of that landscape in VS. Thus, the universal map with best CV results for a given property should not be prioritized as the implicitly best predictor. For a given map, predictions for many DUD compounds are not trustworthy, according to applicability domain considerations. By contrast, simultaneous application of all universal maps, and rating of the likelihood of activity as the mean returned by all applicable maps, significantly improved prediction results. 
Performance measures in consensus VS using multiple maps were always superior or similar to those of the best individual map.
- Published
- 2019
- Full Text
- View/download PDF
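The consensus rule described in entry 24 (the mean probability of activity over all applicable maps) can be illustrated with a minimal sketch. The function name `consensus_probability` and the convention that `None` marks a map whose applicability domain excludes the compound are assumptions made here for illustration; they are not taken from the paper or from any GTM software.

```python
from statistics import mean
from typing import Optional, Sequence


def consensus_probability(per_map: Sequence[Optional[float]]) -> Optional[float]:
    """Average the activity probabilities returned by the applicable maps.

    per_map holds one entry per universal map: a probability of activity,
    or None when the compound falls outside that map's applicability
    domain (hypothetical convention used only in this sketch).
    """
    applicable = [p for p in per_map if p is not None]
    # With no applicable map there is no trustworthy prediction to report.
    return mean(applicable) if applicable else None


if __name__ == "__main__":
    # Three maps, the second one does not cover the compound.
    print(consensus_probability([0.8, None, 0.4]))
```

Returning `None` when no map applies keeps untrustworthy compounds out of the ranking instead of assigning them an arbitrary score.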
25. Conjugated quantitative structure‐property relationship models: Prediction of kinetic characteristics linked by the Arrhenius equation
- Author
-
Zankov, Dmitry, Madzhidov, Timur, Baskin, Igor, and Varnek, Alexandre
- Abstract
Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant $\log k$, pre‐exponential factor $\log A$, and activation energy $E_{\rm a}$. They were benchmarked against single‐task (individual and equation‐based models) and multi‐task models. In individual models, all characteristics were modeled separately, while in multi‐task models $\log k$, $\log A$ and $E_{\rm a}$ were treated cooperatively. An equation‐based model assessed $\log k$ using the Arrhenius equation and $\log A$ and $E_{\rm a}$ values predicted by individual models. It has been demonstrated that the conjugated QSPR models can accurately predict the reaction rate constants at extreme temperatures, at which reaction rate constants hardly can be measured experimentally. Also, in the case of small training sets conjugated models are more robust than related single‐task approaches.
- Published
- 2023
- Full Text
- View/download PDF
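For reference, the Arrhenius relation that links the three quantities in entry 25 can be written in the decimal-logarithm form used there. The symbols R (gas constant) and T (absolute temperature) are the standard ones and are not spelled out in the abstract.

```latex
\begin{align}
k &= A \exp\!\left(-\frac{E_{\rm a}}{RT}\right), \\
\log k &= \log A - \frac{E_{\rm a}}{RT \ln 10}.
\end{align}
```

Because $\log k$ is fully determined by $\log A$, $E_{\rm a}$ and $T$, a model that predicts the last two consistently can in principle be evaluated at temperatures outside the measured range, which is the extrapolation scenario the abstract mentions.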
26. Anti-HIV Activity of HEPT, TIBO, and Cyclic Urea Derivatives: Structure−Property Studies, Focused Combinatorial Library Generation, and Hits Selection Using Substructural Molecular Fragments Method
- Author
-
Solov'ev, V. P. and Varnek, A.
- Abstract
The substructural molecular fragments (SMF) method (Solov'ev, V. P.; Varnek, A.; Wipff, G. J. Chem. Inf. Comput. Sci. 2000, 40, 847−858) was applied to assess anti-HIV activity for large data sets for three families of compounds: 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT) derivatives, tetrahydroimidazobenzodiazepinone (TIBO) derivatives, and cyclic urea (CU) derivatives. The SMF method uses 49 types of topological descriptors (atom/bond sequences and “augmented atoms”) which, coupled with 3 linear and nonlinear fitting equations, allow the user to generate up to 147 structure−property models. For each family of compounds, the modeling was performed on several training sets followed by the validation calculations where the three best-fit models were applied. Calculated activities reproduce the available experimental data well. On the basis of the “optimal” molecular fragments, a focused combinatorial library containing 252 virtual HEPT derivatives has been generated. Its filtering led to several hits potentially possessing anti-HIV activity.
- Published
- 2003
- Full Text
- View/download PDF
27. Assessment of the Macrocyclic Effect for the Complexation of Crown-Ethers with Alkali Cations Using the Substructural Molecular Fragments Method
- Author
-
Varnek, A., Wipff, G., Solov'ev, V. P., and Solotnov, A. F.
- Abstract
The Substructural Molecular Fragments method (Solov'ev, V. P.; Varnek, A. A.; Wipff, G. J. Chem. Inf. Comput. Sci. 2000, 40, 847−858) was applied to assess stability constants (logK) of the complexes of crown-ethers, polyethers, and glymes with Na+, K+, and Cs+ in methanol. One hundred forty-seven computational models including different fragment sets coupled with linear or nonlinear fitting equations were applied for the data sets containing 69 (Na+), 123 (K+), and 31 (Cs+) compounds. To account for the “macrocyclic effect” for crown-ethers, an additional “cyclicity” descriptor was used. “Predicted” stability constants both for macrocyclic compounds and for their open-chain analogues are in good agreement with the experimental data reported earlier and with those studied experimentally in this work. The macrocyclic effect as a function of cation and ligand is quantitatively estimated for all studied crown-ethers.
- Published
- 2002
- Full Text
- View/download PDF
28. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping
- Author
-
Sattarov, Boris, Baskin, Igor I., Horvath, Dragos, Marcou, Gilles, Bjerrum, Esben Jannik, and Varnek, Alexandre
- Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties “on the fly”. The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
- Published
- 2018
- Full Text
- View/download PDF
29. Prediction of Aromatic Hydroxylation Sites for Human CYP1A2 Substrates Using Condensed Graph of Reactions
- Author
-
Madzhidov, T., Khakimova, A., Nugmanov, R., Muller, C., Marcou, G., and Varnek, A.
- Abstract
In this paper, support vector machine and condensed graph of reaction (CGR) approaches have been used to predict the regioselectivity of aromatic hydroxylation for human CYP1A2 substrates. Experimental data on aromatic hydroxylation for human cytochrome CYP1A2 (observed molecular or “real” transformations) used in the modeling were extracted from the Metabolite database and the XenoSite database. In addition, all potential but unobserved (“unreal”) transformations were generated. The dataset containing “real” and “unreal” transformations was converted into an ensemble of CGRs representing pseudomolecules with conventional (single, double, aromatic, etc.) bonds and dynamic bonds characterizing chemical transformations. ISIDA fragment descriptors generated for CGRs were used for the modeling. The models have been validated in three times repeated fivefold cross-validation on the training set and then on an external set. The final model was constructed by consensus over models built on different descriptors sets. Predictive performance of our model on the external test set was similar to that of XenoSite and Way2Drug tools. Unlike previously used atom labeling-based approaches, the proposed CGR-based representation of metabolic transformations could be applied to different types of reactions catalyzed by the same enzyme and therefore, it is more suitable for automatized handling of metabolic data.
- Published
- 2018
- Full Text
- View/download PDF
30. Artificial intelligence in synthetic chemistry: achievements and prospects
- Author
-
Baskin, I I, Madzhidov, T I, Antipin, I S, and Varnek, A A
- Abstract
The review is devoted to the achievements in analysis of information on chemical reactions using machine learning methods. Four large areas that actively use these methods are outlined: computer-assisted planning of synthesis, analysis and visualization of chemical reaction data, prediction of the quantitative characteristics of reactions and computer-aided design of catalysts. The bibliography includes 346 references.
- Published
- 2017
31. French dispatch: GTM‐based analysis of the Chimiothèque Nationale Chemical Space
- Author
-
Oleneva, Polina, Zabolotna, Yuliana, Horvath, Dragos, Marcou, Gilles, Bonachera, Fanny, and Varnek, Alexandre
- Abstract
In order to analyze the Chimiothèque Nationale (CN) – the French National Compound Library – in the context of screening and biologically relevant compounds, the library was compared with the ZINC in‐stock collection and ChEMBL. This includes the study of chemical space coverage, physicochemical properties and Bemis‐Murcko (BM) scaffold populations. More than 5 K CN‐unique scaffolds (relative to ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM («zooming») was applied to generate an ensemble of maps at various resolution levels, from global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7 % of CN compounds can be fully synthesized using commercially available building blocks.
- Published
- 2023
- Full Text
- View/download PDF
32. Multi-Instance Learning Approach to the Modeling of Enantioselectivity of Conformationally Flexible Organic Catalysts
- Author
-
Zankov, Dmitry, Madzhidov, Timur, Polishchuk, Pavel, Sidorov, Pavel, and Varnek, Alexandre
- Abstract
Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for the modeling of catalysts’ enantioselectivity, we propose to use the multi-instance learning approach accounting for multiple catalyst conformers and requiring neither conformer selection nor their spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction, encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in the modeling of the enantioselectivity of homogeneous and phase-transfer reactions and compared with the state-of-the-art approaches.
- Published
- 2023
- Full Text
- View/download PDF
33. Chemical Library Space: Definition and DNA-Encoded Library Comparison Study Case
- Author
-
Pikalyova, Regina, Zabolotna, Yuliana, Horvath, Dragos, Marcou, Gilles, and Varnek, Alexandre
- Abstract
The development of DNA-encoded library (DEL) technology introduced new challenges for the analysis of chemical libraries. It is often useful to consider a chemical library as a stand-alone chemoinformatic object─represented both as a collection of independent molecules, and yet an individual entity─in particular, when they are inseparable mixtures, like DELs. Herein, we introduce the concept of chemical library space (CLS), in which resident items are individual chemical libraries. We define and compare four vectorial library representations obtained using generative topographic mapping. These allow for an effective comparison of libraries, with the ability to tune and chemically interpret the similarity relationships. In particular, property-tuned CLS encodings enable the simultaneous comparison of libraries with respect to both property and chemotype distributions. We apply the various CLS encodings to the problem of selecting DELs that optimally “match” a reference collection (here ChEMBL28), showing how the choice of the CLS descriptors may help to fine-tune the “matching” (overlap) criteria. Hence, the proposed CLS may represent a new efficient way for polyvalent analysis of thousands of chemical libraries. Selection of an easily accessible compound collection for drug discovery, as a substitute for a difficult to produce reference library, can be tuned for either primary or target-focused screening, also considering property distributions of compounds. Alternatively, selection of libraries covering novel regions of the chemical space with respect to a reference compound subspace may serve for library portfolio enrichment.
- Published
- 2023
- Full Text
- View/download PDF
34. GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design
- Author
-
Lamanna, Giuseppe, Delre, Pietro, Marcou, Gilles, Saviano, Michele, Varnek, Alexandre, Horvath, Dragos, and Mangiatordi, Giuseppe Felice
- Abstract
This study introduces a new de novo design algorithm called GENERA that combines the capabilities of a deep-learning algorithm for automated drug-like analogue design, called DeLA-Drug, with a genetic algorithm for generating molecules with desired target-oriented properties. Specifically, GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, which is implicated in many pathological conditions, including COVID-19. The ability of GENERA to de novo design promising candidates for a specific target was assessed using two docking programs, PLANTS and GLIDE. A fitness function based on the Pareto dominance resulting from computed PLANTS and GLIDE scores was applied to demonstrate the algorithm’s ability to perform multiobjective optimizations effectively. GENERA can quickly generate focused libraries that produce better scores compared to a starting set of known ACE-2 binders. This study is the first to utilize a DL-based algorithm designed for analogue generation as a mutational operator within a GA framework, representing an innovative approach to target-oriented de novo design.
- Published
- 2023
- Full Text
- View/download PDF
35. Predictive Models for the Free Energy of Hydrogen Bonded Complexes with Single and Cooperative Hydrogen Bonds
- Author
-
Glavatskikh, Marta, Madzhidov, Timur, Solov'ev, Vitaly, Marcou, Gilles, Horvath, Dragos, and Varnek, Alexandre
- Abstract
In this work, we report QSPR modeling of the free energy ΔG of 1 : 1 hydrogen bond complexes of different H‐bond acceptors and donors. The modeling was performed on a large and structurally diverse set of 3373 complexes featuring a single hydrogen bond, for which ΔG was measured at 298 K in CCl4. The models were prepared using Support Vector Machine and Multiple Linear Regression, with ISIDA fragment descriptors. The marked atoms strategy was applied at fragmentation stage, in order to capture the location of H‐bond donor and acceptor centers. Different strategies of model validation have been suggested, including the targeted omission of individual H‐bond acceptors and donors from the training set, in order to check whether the predictive ability of the model is not limited to the interpolation of H‐bond strength between two already encountered partners. Successfully cross‐validating individual models were combined into a consensus model, and challenged to predict external test sets of 629 and 12 complexes, in which donor and acceptor formed single and cooperative H‐bonds, respectively. In all cases, SVM models outperform MLR. The SVM consensus model performs well both in 3‐fold cross‐validation (RMSE=1.50 kJ/mol), and on the external test sets containing complexes with single (RMSE=3.20 kJ/mol) and cooperative H‐bonds (RMSE=1.63 kJ/mol).
- Published
- 2016
- Full Text
- View/download PDF
36. Generative Topographic Mapping Approach to Modeling and Chemical Space Visualization of Human Intestinal Transporters
- Author
-
Gimadiev, Timur, Madzhidov, Timur, Marcou, Gilles, and Varnek, Alexandre
- Abstract
The generative topographic mapping (GTM) approach has been used both to build predictive models linking chemical structure of molecules and their ability to bind some membrane transport proteins (transporters) and to visualize a chemical space of transporters’ binders on two-dimensional maps. For this purpose, experimental data on 2958 molecules active against up to 11 transporters have been used. It has been shown that GTM-based classification (active/inactive) models display reasonable predictive performance, comparable with that of such popular machine-learning methods as Random Forest, SVM, or k-NN. Moreover, GTM offers its own models applicability domain definition which may significantly improve the models performance. GTM maps themselves represent an interesting tool of the chemical space analysis of the transporters’ ligands. Thus, with the help of class landscapes, they identify distinct zones populated by active or inactive molecules with respect to a given transporter. As demonstrated in this paper, the superposition of class landscapes describing different activities delineates the areas mostly populated by the compounds of desired pharmacological profile.
- Published
- 2016
- Full Text
- View/download PDF
37. Predictive Models for Halogen‐bond Basicity of Binding Sites of Polyfunctional Molecules
- Author
-
Glavatskikh, Marta, Madzhidov, Timur, Solov'ev, Vitaly, Marcou, Gilles, Horvath, Dragos, Graton, Jérôme, Le Questel, Jean‐Yves, and Varnek, Alexandre
- Abstract
Halogen bonding (XB) strength assesses the ability of an electron‐enriched group to be involved in complexes with polarizable electrophilic halogenated or diatomic halogen molecules. Here, we report QSPR models of XB of particular relevance for an efficient screening of large sets of compounds. The basicity is described by pKBI2, the decimal logarithm of the experimental 1 : 1 (B : I2) complexation constant K of organic compounds (B) with diiodine (I2) as a reference halogen‐bond donor in alkanes at 298 K. Modeling involved ISIDA fragment descriptors, using SVM and MLR methods on a set of 598 organic compounds. Developed models were then challenged to make predictions for an external test set of 11 polyfunctional compounds for which unambiguous assignment of the measured effective complexation constant to specific groups out of the putative acceptor sites is not granted. At this stage, developed models were used to predict pKBI2 of all putative acceptor sites, followed by an estimation of the predicted effective complexation constant using the ChemEqui program. The best consensus models perform well both in cross‐validation (root mean squared error RMSE=0.39–0.47 logKBI2 units) and external predictions (RMSE=0.49). The SVM models are implemented on our website (http://infochim.u‐strasbg.fr/webserv/VSEngine.html) together with the estimation of their applicability domain and an automatic detection of potential halogen‐bond acceptor atoms.
- Published
- 2016
- Full Text
- View/download PDF
38. Prediction of Optimal Salinities for Surfactant Formulations Using a Quantitative Structure-Property Relationships Approach.
- Author
-
Muller, Christophe, Maldonado, Ana G., Varnek, Alexandre, and Creton, Benoit
- Published
- 2015
- Full Text
- View/download PDF
39. GTM-Based QSAR Models and Their Applicability Domains
- Author
-
Gaspar, H. A., Baskin, I. I., Marcou, G., Horvath, D., and Varnek, A.
- Abstract
In this paper we demonstrate that Generative Topographic Mapping (GTM), a machine learning method traditionally used for data visualisation, can be efficiently applied to QSAR modelling using probability distribution functions (PDF) computed in the latent 2-dimensional space. Several different scenarios of the activity assessment were considered: (i) the “activity landscape” approach based on direct use of PDF, (ii) QSAR models involving GTM-generated descriptors derived from PDF, and (iii) the k-Nearest Neighbours approach in 2D latent space. Benchmarking calculations were performed on five different datasets: stability constants of metal cations Ca2+, Gd3+ and Lu3+ complexes with organic ligands in water, aqueous solubility and activity of thrombin inhibitors. It has been shown that the performance of GTM-based regression models is similar to that obtained with some popular machine-learning methods (random forest, k-NN, M5P regression tree and PLS) and ISIDA fragment descriptors. By comparing GTM activity landscapes built both on predicted and experimental activities, we may visually assess the model’s performance and identify the areas in the chemical space corresponding to reliable predictions. The applicability domain used in this work is based on data likelihood. Its application has significantly improved the model performances for 4 out of 5 datasets.
- Published
- 2015
- Full Text
- View/download PDF
40. Prediction of Drug Induced Liver Injury Using Molecular and Biological Descriptors
- Author
-
Muller, Christophe, Pekthong, Dumrongsak, Alexandre, Eliane, Marcou, Gilles, Horvath, Dragos, Richert, Lysiane, and Varnek, Alexandre
- Abstract
In this paper we report quantitative structure-activity models linking in vivo Drug-Induced Liver Injury (DILI) of organic molecules with some parameters both measured experimentally in vitro and calculated theoretically from the molecular structure. At the first step, a small database containing information on DILI in humans was created and annotated with experimentally observed information concerning hepatotoxic effects. Thus, for each compound a binary annotation “yes/no” was applied to DILI and seven endpoints causing different liver pathologies in humans: Cholestasis (CH), Oxidative Stress (OS), Mitochondrial injury (MT), Cirrhosis and Steatosis (CS), Hepatitis (HS), Hepatocellular (HC), and Reactive Metabolite (RM). Different machine-learning methods were used to build classification models linking DILI with molecular structure: Support Vector Machines, Artificial Neural Networks and Random Forests. Three types of models were developed: (i) involving molecular descriptors calculated directly from chemical structure, (ii) involving selected endpoints as “biological” descriptors, and (iii) involving both types of descriptors. It has been found that the models based solely on molecular descriptors have much weaker prediction performance than those involving in vivo measured endpoints. Taking into account the difficulty of obtaining in vivo data, at the validation stage we used instead five endpoints (CH, CS, HC, MT and OS) measured in vitro in human hepatocyte cultures. The models involving either some of the experimental in vitro endpoints or their combination with theoretically calculated ones correctly predict DILI for 9 out of 10 reference compounds of the external test set. This opens an interesting perspective of using a combination of theoretically calculated parameters and measured in vitro biological data for DILI predictions.
- Published
- 2015
41. On the nature of the isomer shift in tin(IV) tetrachloride complexes with organic ligands from quantum chemical data
- Author
-
Varnek, V. A. and Varnek, A. A.
- Published
- 1997
- Full Text
- View/download PDF
42. Editorial: Chemical Reactions Mining
- Author
-
Madzhidov, Timur I. and Varnek, Alexandre
- Published
- 2022
- Full Text
- View/download PDF
43. Individual Hydrogen‐Bond Strength QSPR Modelling with ISIDA Local Descriptors: a Step Towards Polyfunctional Molecules
- Author
-
Ruggiu, Fiorella, Solov'ev, Vitaly, Marcou, Gilles, Horvath, Dragos, Graton, Jérôme, Le Questel, Jean‐Yves, and Varnek, Alexandre
- Abstract
Here, we introduce new ISIDA fragment descriptors able to describe “local” properties related to selected atoms or molecular fragments. These descriptors have been applied for QSPR modelling of the H‐bond basicity scale pKBHX, measured by the 1 : 1 complexation constant of a series of organic acceptors (H‐bond bases) with 4‐fluorophenol as the reference H‐bond donor in CCl4 at 298 K. Unlike previous QSPR studies of H‐bond complexation, the models based on these new descriptors are able to predict the H‐bond basicity of different acceptor centres on the same polyfunctional molecule. QSPR models were obtained using support vector machine and ensemble multiple linear regression methods on a set of 537 organic compounds including 5 bifunctional molecules. They were validated with cross‐validation procedures and with two external test sets. The best model displays good predictive performance on a large test set of 451 mono‐ and bifunctional molecules: a root‐mean squared error RMSE=0.26 and a determination coefficient R2=0.91. It is implemented on our website (http://infochim.u‐strasbg.fr/webserv/VSEngine.html) together with the estimation of its applicability domain and an automatic detection of potential H‐bond acceptors.
- Published
- 2014
- Full Text
- View/download PDF
44. Exploration of the Chemical Space of DNA‐encoded Libraries
- Author
-
Pikalyova, Regina, Zabolotna, Yuliana, Volochnyuk, Dmitriy M., Horvath, Dragos, Marcou, Gilles, and Varnek, Alexandre
- Abstract
DNA‐Encoded Library (DEL) technology has emerged as an alternative method for the discovery of bioactive molecules in medicinal chemistry. It enables the simple synthesis and screening of compound libraries of enormous size. Even though it is steadily gaining popularity, there are almost no reports of chemoinformatics analysis of DEL chemical space. Therefore, in this project, we aimed to generate and analyze the ultra‐large chemical space of DEL. Around 2500 DELs were designed using commercially available building blocks, resulting in 2.5B DEL compounds that were compared to biologically relevant compounds from ChEMBL using Generative Topographic Mapping. This allowed us to choose several optimal DELs covering the chemical space of ChEMBL to the highest extent and thus containing the maximum possible percentage of biologically relevant chemotypes. Different combinations of DELs were also analyzed to identify a set of mutually complementary libraries that attain even higher coverage of ChEMBL than is possible with a single DEL.
- Published
- 2022
- Full Text
- View/download PDF
45. Atom‐to‐atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies
- Author
-
Lin, Arkadii, Dyubankova, Natalia, Madzhidov, Timur I., Nugmanov, Ramil I., Verhoeven, Jonas, Gimadiev, Timur R., Afonina, Valentina A., Ibragimova, Zarina, Rakhimbekova, Assima, Sidorov, Pavel, Gedich, Andrei, Suleymanov, Rail, Mukhametgaleev, Ravil, Wegner, Joerg, Ceulemans, Hugo, and Varnek, Alexandre
- Abstract
In this paper, we compare the most popular Atom‐to‐Atom Mapping (AAM) tools: ChemAxon,[1] Indigo,[2] RDTool,[3] NameRXN (NextMove),[4] and RXNMapper,[5] which implement different AAM algorithms. An open‐source RDTool program was optimized, and its modified version (“new RDTool”) was considered together with several consensus mapping strategies. The Condensed Graph of Reaction approach was used to calculate chemical distances and develop the “AAM fixer” algorithm for automated correction of erroneous mapping. The benchmarking calculations were performed on a Golden dataset containing 1851 manually mapped and curated reactions. The best performing RXNMapper program together with the AAM Fixer was applied to map the USPTO database. The Golden dataset, mapped USPTO and optimized RDTool are available in the GitHub repository https://github.com/Laboratoire-de-Chemoinformatique.
- Published
- 2022
- Full Text
- View/download PDF
46. Transductive Support Vector Machines: Promising Approach to Model Small and Unbalanced Datasets
- Author
-
Kondratovich, Evgeny, Baskin, Igor I., and Varnek, Alexandre
- Abstract
Semi‐supervised methods, which deal with a combination of labeled and unlabeled data, are becoming increasingly popular in machine learning but are still not used in chemoinformatics. Here, we demonstrate that Transductive Support Vector Machines (TSVM), a semi‐supervised large‐margin classification method, can be particularly useful for building models on small and unbalanced datasets, which often represent a difficult problem in QSAR. Both TSVM and ordinary SVM have been applied to build classification models on 10 DUD datasets. The “transductive effect” (the difference in predictive performance between transductive and ordinary support vector machines) was investigated as a function of: (a) active/inactive ratio, (b) descriptor weighting, and (c) the size and composition of the training and test sets.
- Published
- 2013
- Full Text
- View/download PDF
47. Interpretability of SAR/QSAR Models of any Complexity by Atomic Contributions
- Author
-
Marcou, G., Horvath, D., Solov'ev, V., Arrault, A., Vayer, P., and Varnek, A.
- Published
- 2012
- Full Text
- View/download PDF
48. QSPR Approach to Predict Nonadditive Properties of Mixtures. Application to Bubble Point Temperatures of Binary Mixtures of Liquids
- Author
-
Oprisiu, I., Varlamova, E., Muratov, E., Artemenko, A., Marcou, G., Polishchuk, P., Kuz'min, V., and Varnek, A.
- Abstract
This paper is devoted to the development of a methodology for QSPR modeling of mixtures and its application to vapor/liquid equilibrium diagrams for bubble point temperatures of binary liquid mixtures. Two types of special mixture descriptors based on SiRMS and ISIDA approaches were developed. SiRMS‐based fragment descriptors involve atoms belonging to both components of the mixture, whereas the ISIDA fragments belong only to one of these components. The models were built on a data set containing the phase diagrams for 167 mixtures represented by different combinations of 67 pure liquids. Consensus models were developed using nonlinear Support Vector Machine (SVM), Associative Neural Networks (ASNN), and Random Forest (RF) approaches. For SVM and ASNN calculations, the ISIDA fragment descriptors were used, whereas Simplex descriptors were employed in RF models. The models have been validated using three different protocols: “Points out”, “Mixtures out” and “Compounds out”, based on the specific rules to form training/test sets in each fold of cross‐validation. A final validation of the models has been performed on an additional set of 94 mixtures represented by combinations of 34 novel compounds and modeling set chemicals with each other. The root mean squared error of predictions for new mixtures of already known liquids does not exceed 5.7 K, which outperforms COSMO‐RS models. The developed QSAR methodology can be applied to the modeling of any nonadditive property of binary mixtures (antiviral activities, drug formulation, etc.).
- Published
- 2012
- Full Text
- View/download PDF
49. Generative Topographic Mapping (GTM): Universal Tool for Data Visualization, Structure‐Activity Modeling and Dataset Comparison
- Author
-
Kireeva, N., Baskin, I. I., Gaspar, H. A., Horvath, D., Marcou, G., and Varnek, A.
- Abstract
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure‐activity modeling and database comparison is evaluated using subsets of the Database of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches like Principal Component Analysis, Sammon Mapping or Self‐Organizing Maps, the great advantage of GTMs is that they provide data probability distribution functions (PDF), both in the high‐dimensional space defined by molecular descriptors and in 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries.
- Published
- 2012
- Full Text
- View/download PDF
50. Chemoinformatics as a Theoretical Chemistry Discipline
- Author
-
Varnek, Alexandre and Baskin, Igor I.
- Abstract
Here, chemoinformatics is considered as a theoretical chemistry discipline complementary to quantum chemistry and force‐field molecular modeling. These three fields are compared with respect to molecular representation, inference mechanisms, basic concepts and application areas. A chemical space, a fundamental concept of chemoinformatics, is considered with respect to complex relations between chemical objects (graphs or descriptor vectors). Statistical Learning Theory, one of the main mathematical approaches in structure‐property modeling, is briefly reviewed. Links between chemoinformatics and its “sister” fields (machine learning, chemometrics and bioinformatics) are discussed.
- Published
- 2011
- Full Text
- View/download PDF