117 results on '"Marcou, G"'
2. The freedom space - a new set of commercially available molecules for hit discovery.
- Author
Protopopov MV, Tararina VV, Bonachera F, Dzyuba IM, Kapeliukha A, Hlotov S, Chuk O, Marcou G, Klimchuk O, Horvath D, Yeghyan E, Savych O, Tarkhanova OO, Varnek A, and Moroz YS
- Subjects
- Databases, Chemical, Small Molecule Libraries chemistry, Small Molecule Libraries pharmacology, Drug Discovery methods
- Abstract
The advent of high-performance virtual screening techniques now allows drug designers to explore ultra-large sets of candidate compounds in search of molecules predicted to have desired properties. However, the success of such an endeavor relies heavily on the pertinence (drug-likeness and, foremost, chemical feasibility) of these candidates; otherwise, by the garbage in/garbage out principle, virtual screening will return valueless "hits". The huge popularity of the judiciously enumerated Enamine REAL Space is clear proof of the strength of this Big Data trend in drug discovery. Here we describe a new dataset of make-on-demand compounds called the Freedom Space. It follows the principles of Enamine REAL Space and contains highly feasible molecules (synthesis success rate over 75 percent). However, scaffold and chemography analyses revealed significant differences from both REAL Space and the biologically annotated compounds of the ChEMBL database. The Freedom Space is a significant extension of REAL Space and can be used for a more comprehensive exploration of the synthetically feasible chemical space in hit-finding and hit-to-lead campaigns. (© 2024 Wiley-VCH GmbH.)
- Published
- 2024
3. Assessment of the concomitant action of XBD173 and interferon β in a mouse model of multiple sclerosis using infrared marker bands.
- Author
Sirinukunwattana K, Klein C, Clarke PFA, Marcou G, Meyer L, Collongues N, de Sèze J, Hellwig P, Patte-Mensah C, El Khoury Y, and Mensah-Nyagan AG
- Subjects
- Animals, Mice, Spectroscopy, Fourier Transform Infrared, Female, Multiple Sclerosis drug therapy, Biomarkers blood, Multiple Sclerosis, Relapsing-Remitting drug therapy, Multiple Sclerosis, Relapsing-Remitting blood, Receptors, GABA, Interferon-beta metabolism, Disease Models, Animal
- Abstract
Disease-modifying therapies, including interferon-β (IFNβ), effectively counteract the inflammatory component of relapsing-remitting multiple sclerosis (RRMS), but this action, generally associated with severe side effects, does not prevent axonal/neuronal damage. Hence, axonal neuroprotection, which is pivotal for effective MS treatment, remains a difficult clinical challenge. Growing evidence has suggested Emapunil (AC-5216 or XBD173), a ligand of the mitochondrial translocator protein highly expressed in glial cells and neurons, as a promising candidate for neuroprotection. Indeed, elegant studies previously showed that low, well-tolerated doses of XBD173 efficiently improved clinical symptoms and neuropathological markers in MS mice. Here we combined in vivo clinical scoring with Fourier transform infrared spectroscopy of serum samples to investigate the hypothesis that concomitant treatment of RRMS mice with low doses of IFNβ and XBD173 may increase their beneficial effects against MS symptoms and, additionally, decrease IFNβ-induced side effects. Our results show a significant alteration of serum protein and lipid composition in the spectra of sera from RRMS mice. While the protein signature remains altered upon treatment, the lipid signature is recovered comparatively well with 20 kIU IFNβ and upon concomitant treatment with a low dose of XBD173 (10 mg/kg) and IFNβ (10 kIU), but not with 10 kIU IFNβ alone. The concomitant therapy with XBD173 (10 mg/kg) and IFNβ (10 kIU), devoid of side effects, exhibited efficacy against RRMS symptoms at least equal to, or even better than, that of IFNβ (20 kIU) treatment. Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.)
- Published
- 2025
4. Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling.
- Author
Plyer L, Marcou G, Perves C, Bonachera F, and Varnek A
- Abstract
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient because some tolerance may be acceptable. To determine a grade, the developed workflow uses the pairwise similarity assessment of two reactions, each encoded as a single molecular graph with the help of the Condensed Graph of Reaction (CGR) approach. This workflow is part of the ChemMoodle project and is implemented as a Moodle plugin. It uses the Chemdoodle engine for reaction drawing and visualization and communicates with a REST server that calculates the similarity score using ISIDA fragment descriptors. The plugin is open-source and accessible on GitHub (https://github.com/Laboratoire-de-Chemoinformatique/moodle-qtype_reacsimilarity) and on the Moodle plugin store (https://moodle.org/plugins/qtype_reacsimilarity?lang=en). Both similarity measures and fragmentation can be configured. Scientific contribution: This work introduces an open-source method for evaluating chemical reaction questions within Moodle using the CGR approach. Our contribution provides a nuanced grading mechanism that accommodates acceptable tolerances in reaction assessments, enhancing the accuracy and flexibility of the grading process. (© 2024. The Author(s).)
- Published
- 2024
5. Predicting S. aureus antimicrobial resistance with interpretable genomic space maps.
- Author
Pikalyova K, Orlov A, Horvath D, Marcou G, and Varnek A
- Subjects
- Genome, Bacterial, Genomics methods, Humans, Staphylococcus aureus drug effects, Staphylococcus aureus genetics, Anti-Bacterial Agents pharmacology, Drug Resistance, Bacterial genetics, Drug Resistance, Bacterial drug effects, Machine Learning
- Abstract
Increasing antimicrobial resistance (AMR) represents a global healthcare threat. To decrease the spread of AMR and associated mortality, methods for rapid selection of optimal antibiotic treatment are urgently needed. Machine learning (ML) models based on genomic data to predict resistant phenotypes can serve as a fast screening tool prior to phenotypic testing. Nonetheless, many existing ML methods lack interpretability. Therefore, we present a methodology for visualization of sequence space and AMR prediction based on the non-linear dimensionality reduction method generative topographic mapping (GTM). This approach, applied to AMR data of >5000 S. aureus isolates retrieved from the PATRIC database, yielded GTM models with reasonable accuracy for all drugs (balanced accuracy values ≥0.75). The Generative Topographic Maps (GTMs) represent data in the form of illustrative maps of the genomic space and allow for antibiotic-wise comparison of resistant phenotypes. The maps were also found to be useful for the analysis of genetic determinants responsible for drug resistance. Overall, the GTM-based methodology is a useful tool for both the illustrative exploration of the genomic sequence space and AMR prediction. (© 2024 The Authors. Molecular Informatics published by Wiley-VCH GmbH.)
- Published
- 2024
6. Benchmarking of BMDC assay and related QSAR study for identifying sensitizing chemicals.
- Author
Chedik L, Baybekov S, Marcou G, Cosnier F, Mourot-Bousquenaud M, Jacquenet S, Varnek A, and Battais F
- Subjects
- Humans, Animals, Support Vector Machine, Computer Simulation, Dermatitis, Allergic Contact, Allergens toxicity, Animal Testing Alternatives methods, Bone Marrow Cells drug effects, Local Lymph Node Assay, Mice, Quantitative Structure-Activity Relationship, Benchmarking, Dendritic Cells drug effects
- Abstract
The Bone-Marrow derived Dendritic Cell (BMDC) test is a promising assay for identifying sensitizing chemicals based on the 3Rs (Replace, Reduce, Refine) principle. This study extended the BMDC benchmarking to various in vitro, in chemico, and in silico assays targeting different key events (KE) in the skin sensitization pathway, using datasets of common substances. Additionally, a Quantitative Structure-Activity Relationship (QSAR) model was developed to predict BMDC test outcomes for sensitizing or non-sensitizing chemicals. The modeling workflow involved ISIDA (In Silico Design and Data Analysis) molecular fragment descriptors and the SVM (Support Vector Machine) machine-learning method. The BMDC model's performance was at least comparable to that of all ECVAM-validated models, regardless of the KE considered. Compared with other tests targeting KE3, related to dendritic cell activation, the BMDC assay was shown to have higher balanced accuracy and sensitivity with respect to both Local Lymph Node Assay (LLNA) and human labels, providing additional evidence for its reliability. The consensus QSAR model exhibits promising results, correlating well with observed sensitization potential. Integrated into a publicly available web service, the BMDC-based QSAR model may serve as a cost-effective and rapid alternative to lab experiments, providing preliminary screening for sensitization potential, compound prioritization, optimization and risk assessment. Competing interests: The authors declare no conflicts of interest. (Copyright © 2024 Elsevier Inc. All rights reserved.)
- Published
- 2024
7. Will we ever be able to accurately predict solubility?
- Author
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, and Varnek A
- Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aimed at producing models useful to bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public use because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained herein and the quality-assessed datasets are publicly available. (© 2024. The Author(s).)
- Published
- 2024
8. An update of skin permeability data based on a systematic review of recent research.
- Author
Chedik L, Baybekov S, Cosnier F, Marcou G, Varnek A, and Champmartin C
- Subjects
- Permeability, Datasets as Topic, Humans, Skin metabolism, Skin Absorption, Xenobiotics metabolism
- Abstract
The cutaneous absorption parameters of xenobiotics are crucial for the development of drugs and cosmetics, as well as for assessing environmental and occupational chemical risks. Despite great variability in experimental design owing to unclear international guidelines, datasets such as HuskinDB have been created to report skin absorption endpoints. This review updates available skin permeability data by rigorously compiling research published between 2012 and 2021. Inclusion and exclusion criteria were selected to build the most harmonized and reusable dataset possible. The Generative Topographic Mapping method was applied to the present dataset and compared to HuskinDB to monitor progress in skin permeability research and locate chemotypes of particular concern. The open-source dataset (SkinPiX) includes steady-state flux, maximum flux, lag time and permeability coefficient results for the substances tested, as well as relevant information on experimental parameters that can impact the data. It can be used to extract subsets of data for comparisons and to build predictive models. (© 2024. The Author(s).)
- Published
- 2024
9. Kinetic solubility: Experimental and machine-learning modeling perspectives.
- Author
Baybekov S, Llompart P, Marcou G, Gizzi P, Galzi JL, Ramos P, Saurel O, Bourban C, Minoletti C, and Varnek A
- Subjects
- Solubility, Reproducibility of Results, Water, Machine Learning, Drug Discovery, High-Throughput Screening Assays
- Abstract
Kinetic aqueous or buffer solubility is an important parameter for assessing the suitability of compounds for high-throughput assays in early drug discovery, whereas thermodynamic solubility is reserved for later stages of drug discovery and development. Kinetic solubility is also considered to have low inter-laboratory reproducibility because of its sensitivity to protocol parameters [1]. Presumably, this is why little effort has been devoted to building QSPR models for kinetic, as opposed to thermodynamic, aqueous solubility. Here, we investigate the reproducibility and modelability of kinetic solubility assays. We first analyzed the relationship between kinetic and thermodynamic solubility data, and then examined the consistency of data from different kinetic assays. In this contribution, we report differences between kinetic and thermodynamic solubility data that are consistent with those reported by others [1, 2], and, contrary to general expectations, good agreement between data from different kinetic solubility campaigns. The latter is confirmed by the high performance of QSPR models trained on merged kinetic solubility datasets. The poor performance of a QSPR model trained on thermodynamic solubility when applied to a kinetic solubility dataset reinforces the conclusion that kinetic and thermodynamic solubilities do not correlate: one cannot be used as a substitute for the other. This encourages building dedicated predictive models for kinetic solubility. The kinetic solubility QSPR model developed in this study is freely accessible through the Predictor web service of the Laboratory of Chemoinformatics (https://chematlas.chimie.unistra.fr/cgi-bin/predictor2.cgi). (© 2024 Wiley-VCH GmbH.)
- Published
- 2024
10. A community effort in SARS-CoV-2 drug discovery.
- Author
Schimunek J, Seidl P, Elez K, Hempel T, Le T, Noé F, Olsson S, Raich L, Winter R, Gokcan H, Gusev F, Gutkin EM, Isayev O, Kurnikova MG, Narangoda CH, Zubatyuk R, Bosko IP, Furs KV, Karpenko AD, Kornoushenko YV, Shuldau M, Yushkevich A, Benabderrahmane MB, Bousquet-Melou P, Bureau R, Charton B, Cirou BC, Gil G, Allen WJ, Sirimulla S, Watowich S, Antonopoulos N, Epitropakis N, Krasoulis A, Itsikalis V, Theodorakis S, Kozlovskii I, Maliutin A, Medvedev A, Popov P, Zaretckii M, Eghbal-Zadeh H, Halmich C, Hochreiter S, Mayr A, Ruch P, Widrich M, Berenger F, Kumar A, Yamanishi Y, Zhang KYJ, Bengio E, Bengio Y, Jain MJ, Korablyov M, Liu CH, Marcou G, Glaab E, Barnsley K, Iyengar SM, Ondrechen MJ, Haupt VJ, Kaiser F, Schroeder M, Pugliese L, Albani S, Athanasiou C, Beccari A, Carloni P, D'Arrigo G, Gianquinto E, Goßen J, Hanke A, Joseph BP, Kokh DB, Kovachka S, Manelfi C, Mukherjee G, Muñiz-Chicharro A, Musiani F, Nunes-Alves A, Paiardi G, Rossetti G, Sadiq SK, Spyrakis F, Talarico C, Tsengenes A, Wade RC, Copeland C, Gaiser J, Olson DR, Roy A, Venkatraman V, Wheeler TJ, Arthanari H, Blaschitz K, Cespugli M, Durmaz V, Fackeldey K, Fischer PD, Gorgulla C, Gruber C, Gruber K, Hetmann M, Kinney JE, Padmanabha Das KM, Pandita S, Singh A, Steinkellner G, Tesseyre G, Wagner G, Wang ZF, Yust RJ, Druzhilovskiy DS, Filimonov DA, Pogodin PV, Poroikov V, Rudik AV, Stolbov LA, Veselovsky AV, De Rosa M, De Simone G, Gulotta MR, Lombino J, Mekni N, Perricone U, Casini A, Embree A, Gordon DB, Lei D, Pratt K, Voigt CA, Chen KY, Jacob Y, Krischuns T, Lafaye P, Zettor A, Rodríguez ML, White KM, Fearon D, Von Delft F, Walsh MA, Horvath D, Brooks CL 3rd, Falsafi B, Ford B, García-Sastre A, Yup Lee S, Naffakh N, Varnek A, Klambauer G, and Hermans TM
- Subjects
- Humans, Pandemics, Biological Assay, Drug Discovery, SARS-CoV-2, COVID-19
- Abstract
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against COVID-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding, cleavage, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments. (© 2023 The Authors. Molecular Informatics published by Wiley-VCH GmbH.)
- Published
- 2024
11. Meta-GTM: Visualization and Analysis of the Chemical Library Space.
- Author
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, and Varnek A
- Subjects
- Gene Library, Small Molecule Libraries
- Abstract
In chemical library analysis, it may be useful to describe libraries as individual items rather than collections of compounds. This is particularly true for ultra-large, non-cherry-pickable compound mixtures, such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for managing a portfolio of libraries, just as chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS becomes "crowded" and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of CLS on meta-(μ)GTMs ("meta" as a reminder that these are maps of the CLS, which is itself based on responsibility vectors issued by regular CS GTMs). In total, 2.5K DELs and ChEMBL (reference) were projected onto the μGTM, producing landscapes of library-specific properties. These describe both interlibrary similarity and intrinsic library characteristics in the same view, thereby facilitating the selection of the best project-specific libraries.
- Published
- 2023
12. GENERA: A Combined Genetic/Deep-Learning Algorithm for Multiobjective Target-Oriented De Novo Design.
- Author
Lamanna G, Delre P, Marcou G, Saviano M, Varnek A, Horvath D, and Mangiatordi GF
- Subjects
- Humans, Algorithms, Drug Design, Deep Learning, COVID-19
- Abstract
This study introduces a new de novo design algorithm called GENERA that combines the capabilities of DeLA-Drug, a deep-learning algorithm for automated drug-like analogue design, with a genetic algorithm for generating molecules with desired target-oriented properties. Specifically, GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, which is implicated in many pathological conditions, including COVID-19. The ability of GENERA to design promising candidates for a specific target de novo was assessed using two docking programs, PLANTS and GLIDE. A fitness function based on Pareto dominance over the computed PLANTS and GLIDE scores was applied to demonstrate the algorithm's ability to perform multiobjective optimization effectively. GENERA can quickly generate focused libraries that achieve better scores than a starting set of known ACE2 binders. This study is the first to use a DL-based algorithm designed for analogue generation as a mutation operator within a GA framework, representing an innovative approach to target-oriented de novo design.
- Published
- 2023
13. Chemical Library Space: Definition and DNA-Encoded Library Comparison Study Case.
- Author
Pikalyova R, Zabolotna Y, Horvath D, Marcou G, and Varnek A
- Subjects
- Gene Library, Drug Discovery methods, Small Molecule Libraries chemistry, DNA chemistry
- Abstract
The development of DNA-encoded library (DEL) technology introduced new challenges for the analysis of chemical libraries. It is often useful to consider a chemical library as a stand-alone chemoinformatic object, represented both as a collection of independent molecules and as an individual entity, in particular when libraries are inseparable mixtures, like DELs. Herein, we introduce the concept of chemical library space (CLS), in which the resident items are individual chemical libraries. We define and compare four vectorial library representations obtained using generative topographic mapping. These allow for effective comparison of libraries, with the ability to tune and chemically interpret the similarity relationships. In particular, property-tuned CLS encodings make it possible to compare libraries simultaneously with respect to both property and chemotype distributions. We apply the various CLS encodings to the problem of selecting DELs that optimally "match" a reference collection (here ChEMBL28), showing how the choice of CLS descriptors may help fine-tune the "matching" (overlap) criteria. Hence, the proposed CLS may represent a new, efficient way to perform polyvalent analysis of thousands of chemical libraries. Selection of an easily accessible compound collection for drug discovery, as a substitute for a difficult-to-produce reference library, can be tuned for either primary or target-focused screening, also taking compound property distributions into account. Alternatively, selection of libraries covering novel regions of chemical space with respect to a reference compound subspace may serve for library portfolio enrichment.
- Published
- 2023
14. French dispatch: GTM-based analysis of the Chimiothèque Nationale Chemical Space.
- Author
Oleneva P, Zabolotna Y, Horvath D, Marcou G, Bonachera F, and Varnek A
- Subjects
- Databases, Chemical
- Abstract
To analyze the Chimiothèque Nationale (CN), the French National Compound Library, in the context of screening and biologically relevant compounds, the library was compared with the ZINC in-stock collection and ChEMBL. This includes the study of chemical space coverage, physicochemical properties and Bemis-Murcko (BM) scaffold populations. More than 5K CN-unique scaffolds (relative to the ZINC and ChEMBL collections) were identified. Generative Topographic Maps (GTMs) accommodating those libraries were generated and used to compare the compound populations. Hierarchical GTM («zooming») was applied to generate an ensemble of maps at various resolution levels, from a global overview to precise mapping of individual structures. The respective maps were added to the ChemSpace Atlas website. The analysis of synthetic accessibility in the context of combinatorial chemistry showed that only 29.7% of CN compounds can be fully synthesized using commercially available building blocks. (© 2023 The Authors. Electroanalysis published by Wiley-VCH Verlag GmbH.)
- Published
- 2023
15. Inverse QSAR: Reversing Descriptor-Driven Prediction Pipeline Using Attention-Based Conditional Variational Autoencoder.
- Author
Bort W, Mazitov D, Horvath D, Bonachera F, Lin A, Marcou G, Baskin I, Madzhidov T, and Varnek A
- Subjects
- Molecular Docking Simulation, Quantitative Structure-Activity Relationship
- Abstract
To better formalize it, the notorious inverse-QSAR problem (finding structures with given QSAR-predicted properties) is considered in this paper as a two-step process: (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures that best match the "seed" vectors. The main development effort focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The results show that this workflow was capable of generating compounds predicted to display the desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable drug-likeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, showing that some of the de novo structures, beyond being predicted active by 2D-QSAR models, are clearly able to match binding 3D pharmacophores and bind the protein pocket.
- Published
- 2022
16. Implementation of a soft grading system for chemistry in a Moodle plugin.
- Author
Plyer L, Marcou G, Perves C, Schurhammer R, and Varnek A
- Abstract
We report a novel approach for grading chemical structure drawings for remote teaching, integrated into the Moodle platform. Typically, existing online platforms use a binary grading system, which often fails to give a nuanced evaluation of the answers given by the students. Therefore, such platforms are unevenly adapted to different disciplines. This is particularly true in the case of chemical structures, where most questions simply cannot be evaluated on a true/false basis. Specifically, a strict comparison of candidate and expected chemical structures is not sufficient when some tolerance is deemed acceptable. To overcome this limitation, we have developed a grading workflow based on the pairwise similarity score of two chemical structures. This workflow is implemented as a Moodle plugin, using the Chemdoodle engine for drawing structures and communicating with a REST server to compute the similarity score using molecular descriptors. The plugin (https://github.com/Laboratoire-de-Chemoinformatique/moodle-qtype_molsimilarity) is easily adaptable to any academic user; both embedding and similarity measures can be configured. (© 2022. The Author(s).)
- Published
- 2022
17. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery.
- Author
Zabolotna Y, Bonachera F, Horvath D, Lin A, Marcou G, Klimchuk O, and Varnek A
- Subjects
- DNA chemistry, Gene Library, Zinc, Drug Discovery, Small Molecule Libraries chemistry, Small Molecule Libraries pharmacology
- Abstract
Nowadays, drug discovery is inevitably intertwined with the use of large compound collections. Understanding their chemotype composition and physicochemical property profiles is of the highest importance for successful hit identification. Efficient polyfunctional tools allowing multifaceted analysis of constantly growing chemical libraries must be Big Data-compatible. Here, we present the freely accessible ChemSpace Atlas (https://chematlas.chimie.unistra.fr), which includes almost 40K hierarchically organized Generative Topographic Maps (GTMs) accommodating up to 500M compounds covering fragment-like, lead-like, drug-like, PPI-like, and NP-like chemical subspaces. They allow users to navigate and analyze ZINC, ChEMBL, and COCONUT from multiple perspectives and at different scales: from a bird's-eye view of the entire library to structural pattern detection in small clusters. Around 20 physicochemical properties and almost 750 biological activities can be visualized (associated with map zones), supporting activity profiling and analogue search. Moreover, ChemSpace Atlas will be extended toward new chemical subspaces (e.g., DNA-encoded libraries and synthons) and functionalities (ADMETox profiling and property-guided de novo compound generation).
- Published
- 2022
18. In Vitro Evaluation of In Silico Screening Approaches in Search for Selective ACE2 Binding Chemical Probes.
- Author
Rayevsky AV, Poturai AS, Kravets IO, Pashenko AE, Borisova TA, Tolstanova GM, Volochnyuk DM, Borysko PO, Vadzyuk OB, Alieksieieva DO, Zabolotna Y, Klimchuk O, Horvath D, Marcou G, Ryabukhin SV, and Varnek A
- Subjects
- Humans, Molecular Docking Simulation, Peptidyl-Dipeptidase A metabolism, RNA, Viral, SARS-CoV-2, Angiotensin-Converting Enzyme 2, COVID-19
- Abstract
New models for ACE2 receptor binding, based on QSAR and docking algorithms, were developed using XRD structural data and ChEMBL 26 database hits as training sets. The selectivity of the potential ACE2-binding ligands towards neprilysin (NEP) and ACE was evaluated. The Enamine screening collection (3.2 million compounds) was virtually screened with these models to find possible ACE2 chemical probes, useful for the study of SARS-CoV-2-induced neurological disorders. An enzyme inhibition assay for ACE2 was optimized, and the combined, diversified set of predicted selective ACE2-binding molecules from QSAR modeling, docking, and ultrafast docking was screened in vitro. The in vitro hits included two novel chemotypes suitable for further optimization.
- Published
- 2022
19. Toward in Silico Modeling of Dynamic Combinatorial Libraries.
- Author
Casciuc I, Osypenko A, Kozibroda B, Horvath D, Marcou G, Bonachera F, Varnek A, and Lehn JM
- Abstract
Dynamic combinatorial libraries (DCLs) display adaptive behavior, enabled by the reversible generation of their molecular constituents from building blocks, in response to external effectors, e.g., protein receptors. So far, chemoinformatics has not been used for the design of DCLs, which pose a radically different set of challenges compared to classical library design. Here, we propose a chemoinformatic model for theoretically assessing the composition of DCLs in the presence and absence of an effector. An imine-based DCL interacting with the effector human carbonic anhydrase II (CA II) served as a case study. Support vector regression models for imine formation constants and imine-CA II binding were derived, respectively, from a set of 276 imines synthesized and experimentally studied in this work and from 4350 CA II inhibitors from ChEMBL. These models predict constants for all DCL constituents, which feed software that assesses equilibrium concentrations. They are publicly available on a dedicated website. The models rationally selected two amines and two aldehydes predicted to yield stable imines with high affinity for CA II and provided a virtual illustration of how effector affinity regulates DCL members. Competing interests: The authors declare no competing financial interest. (© 2022 The Authors. Published by American Chemical Society.)
- Published
- 2022
20. Exploration of the Chemical Space of DNA-encoded Libraries.
- Author
-
Pikalyova R, Zabolotna Y, Volochnyuk DM, Horvath D, Marcou G, and Varnek A
- Subjects
- Cheminformatics, Chemistry, Pharmaceutical, DNA chemistry, Drug Discovery methods, Small Molecule Libraries chemistry
- Abstract
DNA-Encoded Library (DEL) technology has emerged as an alternative method for bioactive molecule discovery in medicinal chemistry. It enables the simple synthesis and screening of compound libraries of enormous size. Even though it is steadily gaining popularity, there are almost no reports of chemoinformatics analysis of DEL chemical space. Therefore, in this project, we aimed to generate and analyze the ultra-large chemical space of DELs. Around 2500 DELs were designed using commercially available building blocks, resulting in 2.5 billion DEL compounds that were compared to biologically relevant compounds from ChEMBL using Generative Topographic Mapping. This allowed us to choose several optimal DELs covering the chemical space of ChEMBL to the highest extent and thus containing the maximum possible percentage of biologically relevant chemotypes. Different combinations of DELs were also analyzed to identify a set of mutually complementary libraries that attain even higher coverage of ChEMBL than is possible with one single DEL., (© 2022 Wiley-VCH GmbH.)
- Published
- 2022
21. Molecular Similarity Perception Based on Machine-Learning Models.
- Author
-
Gandini E, Marcou G, Bonachera F, Varnek A, Pieraccini S, and Sironi M
- Subjects
- Humans, Perception, Structure-Activity Relationship, Machine Learning, Receptors, Drug
- Abstract
Molecular similarity is an impressively broad topic with many implications in several areas of chemistry. Its roots lie in the paradigm that 'similar molecules have similar properties'. For this reason, methods for determining molecular similarity find wide application in pharmaceutical companies, e.g., in the context of structure-activity relationships. The similarity evaluation is also used in the field of chemical legislation, specifically in the procedure to judge if a new molecule can obtain the status of orphan drug with the consequent financial benefits. For this procedure, the European Medicines Agency uses experts' judgments. It is clear that the perception of the similarity depends on the observer, so the development of models to reproduce the human perception is useful. In this paper, we built models using both 2D fingerprints and 3D descriptors, i.e., molecular shape and pharmacophore descriptors. The proposed models were also evaluated by constructing a dataset of pairs of molecules which was submitted to a group of experts for the similarity judgment. The proposed machine-learning models can be useful to reduce or assist human efforts in future evaluations. For this reason, the new molecules dataset and an online tool for molecular similarity estimation have been made freely available.
- Published
- 2022
22. SynthI: A New Open-Source Tool for Synthon-Based Library Design.
- Author
-
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Gavrylenko K, Horvath D, Klimchuk O, Oksiuta O, Marcou G, and Varnek A
- Subjects
- Indicators and Reagents
- Abstract
Most of the existing computational tools for de novo library design are focused on the generation, rational selection, and combination of promising structural motifs to form members of the new library. However, the absence of a direct link between the chemical space of the retrosynthetically generated fragments and the pool of available reagents makes such approaches appear rather theoretical and disconnected from reality. In this context, we present Synthons Interpreter (SynthI), a new open-source toolkit for de novo library design that merges those two chemical spaces into a single synthon space. Here, synthons are defined as actual fragments with valid valences and special labels specifying the position and the nature of reactive centers. They can be derived either from the "breakup" of reference compounds according to 38 retrosynthetic rules or from real reagents, after leaving-group removal or transformation. Such an approach not only enables the design of synthetically accessible libraries and analog generation but also facilitates the analysis of reagents (building blocks) in the medicinal chemistry context. SynthI code is publicly available at https://github.com/Laboratoire-de-Chemoinformatique/SynthI.
- Published
- 2022
23. A Close-up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry.
- Author
-
Zabolotna Y, Volochnyuk DM, Ryabukhin SV, Horvath D, Gavrilenko KS, Marcou G, Moroz YS, Oksiuta O, and Varnek A
- Subjects
- Indicators and Reagents, Chemistry, Pharmaceutical, Drug Discovery methods
- Abstract
The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by the corresponding synthons, i.e., the fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity that is unbiased by the leaving and protecting groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapping and library-specific regions.
- Published
- 2022
24. Computational screening methodology identifies effective solvents for CO2 capture.
- Author
-
Orlov AA, Valtz A, Coquelet C, Rozanska X, Wimmer E, Marcou G, Horvath D, Poulain B, Varnek A, and de Meyer F
- Abstract
Carbon capture and storage technologies are projected to increasingly contribute to cleaner energy transitions by significantly reducing CO2 emissions from fossil fuel-driven power and industrial plants. The industry-standard technology for CO2 capture is chemical absorption with aqueous alkanolamines, which are often mixed with an activator, piperazine, to increase the overall CO2 absorption rate. The inefficiency of the process, due to the parasitic energy required for thermal regeneration of the solvent, drives the search for new tertiary amines with better kinetics. Improving the efficiency of experimental screening using computational tools is challenging due to the complex nature of chemical absorption. We have developed a novel computational approach that combines kinetic experiments, molecular simulations and machine learning for the in silico screening of hundreds of prospective candidates, and identified a class of tertiary amines that absorbs CO2 faster than a typical commercial solvent when mixed with piperazine, which was confirmed experimentally., (© 2022. The Author(s).)
- Published
- 2022
25. Rapid Discrimination of Neuromyelitis Optica Spectrum Disorder and Multiple Sclerosis Using Machine Learning on Infrared Spectra of Sera.
- Author
-
El Khoury Y, Gebelin M, de Sèze J, Patte-Mensah C, Marcou G, Varnek A, Mensah-Nyagan AG, Hellwig P, and Collongues N
- Subjects
- Aquaporin 4, Autoantibodies, Humans, Machine Learning, Myelin-Oligodendrocyte Glycoprotein, Multiple Sclerosis diagnosis, Neuromyelitis Optica
- Abstract
Neuromyelitis optica spectrum disorder (NMOSD) and multiple sclerosis (MS) are both autoimmune inflammatory and demyelinating diseases of the central nervous system. NMOSD is a highly disabling disease, and rapid introduction of the appropriate treatment at the acute phase is crucial to prevent sequelae. Specific criteria were established in 2015 and provide keys to distinguish NMOSD from MS. One of the most reliable criteria for NMOSD diagnosis is the detection, in the patient's serum, of an antibody that attacks the water channel aquaporin-4 (AQP-4). Another target in NMOSD is myelin oligodendrocyte glycoprotein (MOG), delineating a new spectrum of diseases called MOG-associated diseases. Lastly, patients with NMOSD can be negative for both AQP-4 and MOG antibodies. At disease onset, NMOSD symptoms are very similar to MS symptoms from a clinical and radiological perspective. Thus, at the first episode, given the urgency of starting anti-inflammatory treatment, there is an unmet need to differentiate NMOSD subtypes from MS. Here, we used Fourier transform infrared spectroscopy in combination with a machine learning algorithm with the aim of distinguishing the infrared signatures of sera of a first episode of NMOSD from those of a first episode of relapsing-remitting MS, as well as from those of healthy subjects and patients with chronic inflammatory demyelinating polyneuropathy. Our results showed that NMOSD patients were distinguished from MS patients and healthy subjects with a sensitivity of 100% and a specificity of 100%. We also discuss the distinction between the different NMOSD serostatuses. The coupling of infrared spectroscopy of sera to machine learning is a promising, cost-effective, rapid and reliable differential diagnosis tool capable of helping to gain valuable time in patients' treatment.
- Published
- 2022
26. Comprehensive analysis of commercial fragment libraries.
- Author
-
Revillo Imbernon J, Jacquemard C, Bret G, Marcou G, and Kellenberger E
- Abstract
Screening of fragment libraries is a valuable approach to the drug discovery process. The quality of the library is one of the keys to success, and more particularly the design or choice of a library has to meet the specificities of the research program. In this study, we made an inventory of the commercial fragment libraries and we established a methodology which allows any library to be positioned in relation to the complete offer currently on the market, by addressing the following questions: does this chemical library look like another chemical library? What is the coverage of the current chemical space by this chemical library? What are the characteristic structural features of the fragments of this chemical library? We based our analysis on 2D and 3D chemical descriptors, framework class generation and the generative topographic map. We identified 59 270 scaffolds, which can be searched in a dedicated web site (https://gtmfrag.drugdesign.unistra.fr) and developed a model which accounts for fragment diversity while being easy to interpret (download at 10.5281/zenodo.5534434)., Competing Interests: There are no conflicts to declare., (This journal is © The Royal Society of Chemistry.)
- Published
- 2021
27. Chemoinformatics-Driven Design of New Physical Solvents for Selective CO2 Absorption.
- Author
-
Orlov AA, Demenko DY, Bignaud C, Valtz A, Marcou G, Horvath D, Coquelet C, Varnek A, and de Meyer F
- Subjects
- Gases, Solubility, Solvents, Carbon Dioxide, Cheminformatics
- Abstract
The removal of CO2 from gases is an important industrial process in the transition to a low-carbon economy. The use of selective physical (co-)solvents is especially promising when the amount of CO2 is large, as it lowers the energy required for solvent regeneration. However, only a few physical solvents have found industrial application, and the design of new ones can pave the way to more efficient gas treatment techniques. Experimental screening of gas solubility is a labor-intensive process, and solubility modeling is a viable strategy to reduce the number of solvents subject to experimental measurements. In this paper, a chemoinformatics-based modeling workflow was applied to build a predictive model for the solubility of CO2 and four other industrially important gases (CO, CH4, H2, and N2). A dataset containing solubilities of gases in 280 solvents was collected from literature sources and supplemented with new data for six solvents measured in the present study. A modeling workflow based on several state-of-the-art machine learning algorithms was applied to establish quantitative structure-solubility relationships. The best models were used to perform virtual screening of industrially produced chemicals, which enabled the identification of compounds with high predicted CO2 solubility and selectivity toward other gases. The prediction for one of the compounds, 4-methylmorpholine, was confirmed experimentally.
- Published
- 2021
28. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Published
- 2021
29. NP Navigator: A New Look at the Natural Product Chemical Space.
- Author
-
Zabolotna Y, Ertl P, Horvath D, Bonachera F, Marcou G, and Varnek A
- Subjects
- Combinatorial Chemistry Techniques, Macromolecular Substances analysis, Zinc chemistry, Biological Products
- Abstract
Natural products (NPs), having been evolutionarily selected over millions of years to bind to biological macromolecules, have remained an important source of inspiration for medicinal chemists even after the advent of efficient drug discovery technologies such as combinatorial chemistry and high-throughput screening. Thus, there is a strong demand for efficient and user-friendly computational tools that make it possible to analyze large libraries of NPs. In this context, we introduce NP Navigator, a freely available, intuitive online tool for visualization of and navigation through the chemical space of NPs and NP-like molecules. It is based on a hierarchical ensemble of generative topographic maps, featuring NPs from the COlleCtion of Open NatUral producTs (COCONUT), bioactive compounds from ChEMBL and commercially available molecules from ZINC. NP Navigator allows users to efficiently analyze different aspects of NPs: chemotype distribution, physicochemical properties, biological activity and commercial availability. The latter concerns not only purchasable NPs but also their close analogs that can be considered synthetic mimetics of NPs or pseudo-NPs., (© 2021 Wiley-VCH GmbH.)
- Published
- 2021
30. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus A, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash A, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo D, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Published
- 2021
31. DMSO Solubility Assessment for Fragment-Based Screening.
- Author
-
Baybekov S, Marcou G, Ramos P, Saurel O, Galzi JL, and Varnek A
- Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
- Published
- 2021
32. CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Subjects
- Animals, Computer Simulation, Rats, Toxicity Tests, Acute, United States, United States Environmental Protection Agency, Government Agencies
- Abstract
Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals., Objectives: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg)., Methods: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage the strengths of individual approaches., Results: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results., Discussion: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
- Published
- 2021
33. Discovery of novel chemical reactions by deep generative recurrent neural network.
- Author
-
Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, and Varnek A
- Abstract
The "creativity" of Artificial Intelligence (AI) in generating de novo molecular structures opened a novel paradigm in compound design, its weaknesses (stability and feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be just as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-developed "SMILES/CGR" strings encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by an expert, cleaned of irrelevant functional groups and eventually attempted experimentally, thereby broadening the synthetic scope of popular synthetic pathways.
- Published
- 2021
34. Endocrine disruption: the noise in available data adversely impacts the models' performance.
- Author
-
Lunghini F, Marcou G, Azam P, Bonachera F, Enrici MH, Van Miert E, and Varnek A
- Subjects
- Humans, Models, Theoretical, Endocrine Disruptors chemistry, Quantitative Structure-Activity Relationship, Receptors, Androgen chemistry, Receptors, Estrogen chemistry
- Abstract
This paper is devoted to the analysis of available experimental data and the preparation of predictive models for the binding affinity of molecules with respect to two nuclear receptors involved in endocrine disruption (ED): the oestrogen (ER) and the androgen (AR) receptors. The ED-relevant data were retrieved from multiple sources, including the CERAPP, CoMPARA, and Tox21 projects as well as the ChEMBL and PubChem databases. Data analysis performed with the help of generative topographic mapping revealed low agreement between experimental values from different sources. The collected data were used to train both classification models for ER and AR binding activities and regression models for relative binding affinity (RBA) and median inhibition concentration (IC50). These models displayed relatively poor performance in classification (sensitivities ER = 0.34, AR = 0.49) and in regression (determination coefficient r² for the RBA and IC50 models in external validation varied from 0.44 to 0.76). Our analysis demonstrates that the models' low performance resulted from misinterpreted experimental endpoints or wrongly reported values, thus confirming the observations reported in the CERAPP and CoMPARA studies. The developed models and the collected data sets, comprising 6215 (ER) and 3789 (AR) unique compounds, are freely available.
- Published
- 2021
35. Chemography: Searching for Hidden Treasures.
- Author
-
Zabolotna Y, Lin A, Horvath D, Marcou G, Volochnyuk DM, and Varnek A
- Subjects
- Chemistry, Pharmaceutical
- Abstract
The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Nowadays, no human can hope to acquire a complete overview of more than a billion existing or feasible compounds, within which the potential "blockbuster drugs" are well hidden and yet only a few mouse clicks away. To reach these "hidden treasures", we adapted the generative topographic mapping method to enable efficient navigation through the chemical space, from a global overview to structural pattern detection, covering, for the first time, the complete ZINC library of purchasable compounds, relative to 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed. Structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Conversely, 125 000 ZINC-specific compound classes, absent from structure-activity databases, are novel paths to explore in medicinal chemistry. The complete list of these chemotypes can be downloaded using the link https://forms.gle/B6bUJj82t9EfmttV6.
- Published
- 2021
36. Trustworthiness, the Key to Grid-Based Map-Driven Predictive Model Enhancement and Applicability Domain Control.
- Author
-
Horvath D, Marcou G, and Varnek A
- Subjects
- Algorithms
- Abstract
In chemography, grid-based maps sample molecular descriptor space by injecting a set of nodes and then linking them to a regular 2D grid representing the map. They include self-organizing maps (SOMs) and generative topographic maps (GTMs). Grid-based maps are predictive because any compound projected onto them can "inherit" the properties of its residence node(s), whose properties are themselves "inherited" from node-neighboring training set compounds. This Article proposes a formalism to define the trustworthiness of these nodes as "providers" of structure-activity information captured from training compounds. An empirical four-parameter node trustworthiness (NT) function of density (sparsely populated nodes are less trustworthy) and coherence (nodes with training set residents of divergent properties are less trustworthy) is proposed. Based upon it, a trustworthiness score T is used to delimit the applicability domain (AD) by means of a trustworthiness threshold TT. For each parameter setup, the success of ensuing inside-AD predictions is monitored. Setup-specific success levels (averaged over large pools of prediction challenges) are seen to be highly covariant, irrespective of the targets of prediction challenges, of the (classification or regression) type of problems, of the specific parametrization, and even of the nature (GTM or SOM) of underlying maps. Thus, success levels determined on the basis of regression problems (445 target-specific affinity QSAR sets) on GTMs and levels returned by completely unrelated classification problems (319 target-specific active-/inactive-labeled sets) on SOMs were seen to correlate to a degree of 70%. Therefore, a common, general-purpose setup of the proposed parametric AD definition was shown to apply generally to grid-based map-driven property prediction problems.
- Published
- 2020
37. A Chemographic Audit of anti-Coronavirus Structure-activity Information from Public Databases (ChEMBL).
- Author
-
Horvath D, Orlov A, Osolodkin DI, Ishmukhametov AA, Marcou G, and Varnek A
- Subjects
- Animals, Antiviral Agents chemistry, Coronavirus drug effects, Humans, SARS-CoV-2 drug effects, COVID-19 Drug Treatment, Antiviral Agents pharmacology, Computer Simulation, Coronavirus Infections drug therapy, Drug Discovery, Quantitative Structure-Activity Relationship, Viral Proteins chemistry
- Abstract
Discovery of drugs against newly emerged pathogenic agents like the SARS-CoV-2 coronavirus (CoV) must be based on previous research against related species. Scientists need to become acquainted with, and gain a global overview of, the molecules tested so far. Chemography (here, in particular, Generative Topographic Mapping) places structures on a human-readable 2D map (obtained by dimensionality reduction of the chemical space of molecular descriptors) and is thus well suited for such an audit. The goal is to map the medicinal chemistry efforts targeted so far against CoVs. This includes comparing libraries tested against various virus species/genera, predicting their polypharmacological profiles and highlighting frequently encountered chemotypes. Maps are challenged to provide predictive activity landscapes against viral proteins. Definition of "anti-CoV" map zones led to the selection of 380 potential anti-CoV agents residing therein, out of a vast pool of 800 M organic compounds., (© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2020
38. Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling.
- Author
-
Lin A, Baskin II, Marcou G, Horvath D, Beck B, and Varnek A
- Subjects
- Benchmarking, Databases, Chemical, Entropy, Algorithms, Big Data
- Abstract
Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data sets. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for manifold construction must cover the given chemical space well. Intuitively, the FS size must rise with the size and diversity of the target library. At the same time, GTM training can be very slow or even become technically impossible at FS sizes of the order of 10⁵ compounds, which is a very small number compared to today's commercially accessible compounds and, especially, to the theoretically feasible molecules. To solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms a FS for the "final" manifold. To assess the efficiency of the new algorithm, 80 GTMs were built on FSs of different sizes, ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of a few thousand compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: a FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase in the number of FS compounds has no beneficial impact on the predictive performance of the above-mentioned 712 activity classification models. Parallel GTM may, however, be required to generate maps based on very large FSs, which might improve the chemical space cartography of big commercial and virtual libraries approaching billions of compounds., (© 2020 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.)
- Published
- 2020
39. Consensus QSAR models estimating acute toxicity to aquatic organisms from different trophic levels: algae, Daphnia and fish.
- Author
-
Lunghini F, Marcou G, Azam P, Enrici MH, Van Miert E, and Varnek A
- Subjects
- Animals, Support Vector Machine, Daphnia drug effects, Fishes, Microalgae drug effects, Quantitative Structure-Activity Relationship, Toxicity Tests, Acute, Water Pollutants, Chemical toxicity
- Abstract
We report new consensus models estimating acute toxicity for algae, Daphnia and fish endpoints. We assembled a large collection of 3680 unique public compounds, each annotated with at least one experimental value for the given endpoint. Support Vector Machine models were internally and externally validated following the OECD principles. Reasonable predictive performance was achieved (RMSE_ext = 0.56-0.78), in line with that of state-of-the-art models. The known structural alerts are compared with the atomic contributions to these models obtained using the ISIDA/ColorAtom utility. A benchmark against existing tools was carried out on a set of compounds considered more representative of, and relevant to, the chemical space of the current chemical industry. Our model achieved among the best accuracy and data coverage. Nevertheless, performance on industrial data was noticeably lower than on public data, indicating that existing models fail to meet industrial needs. Thus, the final models were updated with the inclusion of new industrial compounds, extending their applicability domain and relevance for application in an industrial context. Generated models and collected public data are made freely available.
- Published
- 2020
40. Autoignition temperature: comprehensive data analysis and predictive models.
- Author
-
Baskin II, Lozano S, Durot M, Marcou G, Horvath D, and Varnek A
- Subjects
- Chemical Phenomena, Data Analysis, Models, Chemical, Fires, Quantitative Structure-Activity Relationship, Temperature
- Abstract
Here we report a new predictive model for autoignition temperature (AIT), an important physical parameter widely used to assess potential safety hazards of combustible materials. Available structure-AIT data extracted from different sources were critically analysed. Support vector regression (SVR) models were built on different data subsets in order to identify a reliable compound set on which a realistic model could be built. This led to the selection of a dataset containing 875 compounds annotated with AIT values. The SVR model built on this set performs reasonably well in cross-validation, with determination coefficient r² = 0.77 and mean absolute error MAE = 37.8°C. External validation on 20 industrial compounds absent from the training set confirmed its good predictive power (MAE = 28.7°C).
- Published
- 2020
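The two cross-validation statistics quoted for the AIT model are standard regression metrics. As a minimal, hypothetical illustration (not the authors' code), they can be computed in Python as below. Here r² is taken as 1 - SS_res/SS_tot, which is one common definition of the determination coefficient, and the temperatures are invented for the example.

```python
from math import fsum


def mae(observed, predicted):
    """Mean absolute error, in the units of the data (here °C)."""
    return fsum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = fsum(observed) / len(observed)
    ss_res = fsum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = fsum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented autoignition temperatures in °C (hypothetical, not data from the study).
observed = [400.0, 250.0, 300.0, 450.0]
predicted = [380.0, 270.0, 310.0, 430.0]

print(f"MAE = {mae(observed, predicted):.1f} °C")  # MAE = 17.5 °C
print(f"r2 = {r_squared(observed, predicted):.3f}")  # r2 = 0.948
```

MAE stays in the property's own units, which is why the abstract reports it in °C, while r² is dimensionless.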
41. Publicly available QSPR models for environmental media persistence.
- Author
-
Lunghini F, Marcou G, Azam P, Enrici MH, Van Miert E, and Varnek A
- Subjects
- Bayes Theorem, Computer Simulation, Environmental Pollutants chemistry, Half-Life, Models, Chemical, Support Vector Machine, Environmental Pollutants pharmacology, Quantitative Structure-Activity Relationship
- Abstract
The evaluation of the persistence of chemicals in environmental media (water, soil, sediment) is included in European regulations, in the context of the Persistence, Bioaccumulation and Toxicity (PBT) assessment. In silico predictions are valuable alternatives for compound screening and prioritization. However, existing prediction tools have limitations: narrow applicability domains due to their relatively small training sets, and a lack of medium-specific models. A dataset of 1579 unique compounds has been collected by merging several persistence data sources, each compound being annotated with at least one experimental dissipation half-life value for the given environmental medium. This dataset was used to train binary classification models discriminating persistent/non-persistent (P/nP) compounds based on REACH half-life thresholds for the sediment, water and soil compartments. Models were built using ISIDA (In SIlico design and Data Analysis) fragment descriptors and support vector regression, random forest and naïve Bayesian machine-learning methods. All models performed satisfactorily: the sediment model performed best (BA_ext = 0.91), followed by water (BA_ext = 0.77) and soil (BA_ext = 0.76). The latter suffers from low detection of persistent ('P') compounds (Sn_ext = 0.50), reflecting discrepancies in reported half-life measurements among the different data sources. Generated models and collected data are made publicly available.
- Published
- 2020
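The balanced accuracy (BA) and sensitivity (Sn) figures quoted for the persistence models are standard binary-classification metrics. A minimal Python sketch follows, assuming BA is the mean of sensitivity and specificity and that 'P' is the positive class; the labels are invented and only show how a low Sn can coexist with a moderate BA.

```python
def sensitivity_specificity(y_true, y_pred, positive="P"):
    """Return (Sn, Sp) for a binary task; `positive` marks the persistent class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(y_true, y_pred, positive="P"):
    """Balanced accuracy: the mean of sensitivity and specificity."""
    sn, sp = sensitivity_specificity(y_true, y_pred, positive)
    return (sn + sp) / 2


# Invented labels (hypothetical): half of the persistent ('P') compounds are missed.
y_true = ["P", "P", "P", "P", "nP", "nP", "nP", "nP"]
y_pred = ["P", "P", "nP", "nP", "nP", "nP", "nP", "P"]

print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.75)
print(balanced_accuracy(y_true, y_pred))        # 0.625
```

Because BA averages the per-class recall values, it is not inflated when one class dominates the dataset, which is why it is a common choice for imbalanced P/nP data.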
42. Diversifying chemical libraries with generative topographic mapping.
- Author
-
Lin A, Beck B, Horvath D, Marcou G, and Varnek A
- Subjects
- Algorithms, Computer-Aided Design statistics & numerical data, Databases, Chemical statistics & numerical data, Databases, Pharmaceutical statistics & numerical data, Drug Design, Drug Development statistics & numerical data, Drug Discovery statistics & numerical data, Humans, Molecular Structure, Software, User-Computer Interface, Drug Discovery methods, Small Molecule Libraries
- Abstract
Generative topographic mapping was used to investigate the possibility of diversifying the in-house compound collection of Boehringer Ingelheim (BI). For this purpose, a 2D map covering the relevant chemical space was trained, and the BI compound library was compared to the Aldrich-Market Select (AMS) database of more than 8M purchasable compounds. In order to discover new (sub)structures, the "AutoZoom" tool was developed and applied to analyze chemotypes of molecules residing in heavily populated zones of a map and to extract the corresponding maximum common substructures. A set of 401K new structures from the AMS database was retrieved and checked for drug-likeness and biological activity.
- Published
- 2020
43. "Big Data" Fast Chemoinformatics Model to Predict Generalized Born Radius and Solvent Accessibility as a Function of Geometry.
- Author
-
Horvath D, Marcou G, and Varnek A
- Subjects
- Proteins, Solvents, Thermodynamics, Cheminformatics, Radius
- Abstract
The Generalized Born (GB) solvent model offers the best accuracy/computing effort ratio, yet requires drastic simplifications to estimate the Effective Born Radii (EBR) while bypassing a too expensive volume integration step. EBRs are a measure of the degree of burial of an atom and are not very sensitive to small changes of geometry: in molecular dynamics, the costly EBR update procedure is not mandatory at every step. This work, however, aims at implementing a GB model into the Sampler for Multiple Protein-Ligand Entities (S4MPLE) evolutionary algorithm, where EBR updates are mandatory at each step because each step may trigger arbitrarily large geometric changes. Therefore, a quantitative structure-property relationship has been developed to express the EBRs as a linear function of both the topological neighborhood and the geometric occupancy of the space around atoms. A training set of 810 molecular systems, ranging from fragment-like to drug-like compounds, proteins, host-guest systems, and ligand-protein complexes, has been compiled. For each species, S4MPLE generated several hundred random conformers. For each atom in each geometry of each species, its "standard" EBR was calculated by numeric integration and associated with topological and geometric descriptors of the atom neighborhood. This training set (EBR, atom descriptors), involving >5 M entries, was subjected to a bootstrapping multilinear regression process with descriptor selection. In parallel, the strategy was repurposed to also learn atomic solvent-accessible areas (SA) based on the same descriptors. The resulting linear equations were challenged to predict EBR and SA values for a similarly compiled external set of >2000 new molecular systems. Solvation energies calculated with estimated EBR and SA match "standard" energies within the typical error of a force-field-based approach (a few kilocalories per mole). Given the extreme diversity of molecular systems covered by the model, this simple EBR/SA estimator covers a vast applicability domain.
- Published
- 2020
44. Application of the mol2vec Technology to Large-size Data Visualization and Analysis.
- Author
-
Shibayama S, Marcou G, Horvath D, Baskin II, Funatsu K, and Varnek A
- Subjects
- Principal Component Analysis, Algorithms, Data Analysis, Data Visualization
- Abstract
Generative Topographic Mapping (GTM) is a dimensionality reduction method widely used for both data visualization and structure-activity modeling. The high dimensionality of the initial data space may require significant computational resources and slow down GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used to encode molecular structures. Principal Component Analysis (PCA), a standard preprocessing tool, suffers from information loss upon dimensionality reduction. As an alternative, we propose to use the substructure vector embedding provided by the mol2vec technique. In addition to reducing data dimensionality, this technology also accounts for the proximity of substructures in molecular graphs. In this study, the dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints was reduced using either PCA or mol2vec. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks., (© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2020
45. Modelling of ready biodegradability based on combined public and industrial data sources.
- Author
-
Lunghini F, Marcou G, Gantzer P, Azam P, Horvath D, Van Miert E, and Varnek A
- Subjects
- Algorithms, Benchmarking, Biodegradation, Environmental, Computer Simulation, Quantitative Structure-Activity Relationship, Reproducibility of Results, Databases, Chemical standards, Environmental Pollutants chemistry, Models, Chemical
- Abstract
The European Registration, Evaluation, Authorization and Restriction of Chemical Substances Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB) and considers in silico prediction a valid alternative to experimental testing. However, currently available models may not be relevant for predicting compounds of industrial interest, owing to limited accuracy and restricted applicability domains. In this work, we present a new and extended RB dataset (2830 compounds) resulting from the merging of several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models showed good performance in terms of predictive power (Balanced Accuracy (BA) = 0.74-0.79) and data coverage (83-91%). The Generative Topographic Mapping approach identified several chemotypes and structural motifs unique to the industrial dataset, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, public and industrial data were merged into a global dataset containing 3146 compounds. This is the largest dataset reported in the literature so far, covering some chemotypes absent from the public data. Thus, the predictive model developed on the global dataset has a larger applicability domain than existing ones.
- Published
- 2020
46. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity.
- Author
-
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, and Judson RS
- Subjects
- Androgens, Databases, Factual, High-Throughput Screening Assays, Humans, Receptors, Androgen, United States, United States Environmental Protection Agency, Computer Simulation, Endocrine Disruptors
- Abstract
Background: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling., Objectives: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortia to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) effort, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP)., Methods: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays., Results: The resulting models were evaluated using curated literature data extracted from different sources.
To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided an average predictive accuracy of approximately 80% for the evaluation set., Discussion: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented in the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ∼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
- Published
- 2020
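The abstract does not say how the individual CoMPARA predictions were combined into consensus. Purely as an illustration, assuming a simple unweighted majority vote over binary calls (one of several possible consensus schemes, not necessarily the one used in the project), a consensus classifier can be sketched in Python as follows; the model outputs are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine aligned per-model label lists into one consensus label per chemical.

    Assumption: simple (unweighted) majority vote. With an odd number of models
    and binary labels no ties can occur; otherwise Counter.most_common breaks
    ties in favour of the label encountered first.
    """
    consensus = []
    for labels in zip(*predictions_per_model):  # labels for one chemical across models
        label, _count = Counter(labels).most_common(1)[0]
        consensus.append(label)
    return consensus


# Hypothetical binary AR-binding calls from three models for four chemicals.
model_a = ["active", "inactive", "active", "inactive"]
model_b = ["active", "active", "inactive", "inactive"]
model_c = ["inactive", "inactive", "active", "inactive"]

print(majority_vote([model_a, model_b, model_c]))
# ['active', 'inactive', 'active', 'inactive']
```

A vote only needs each model's final call, so models built with different algorithms and descriptors can be pooled without rescaling their outputs.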
47. Generative topographic mapping in drug design.
- Author
-
Horvath D, Marcou G, and Varnek A
- Subjects
- Humans, Drug Design, Drug Discovery, Models, Molecular, Molecular Conformation, Peptide Mapping
- Abstract
This is a review of Generative Topographic Mapping (GTM), a non-linear dimensionality reduction technique producing generative 2D maps of high-dimensional vector spaces, and of its specific applications in Drug Design (chemical space cartography, compound library design and analysis, virtual screening, pharmacological profiling, de novo drug design, conformational space & docking interaction cartography, etc.). Written by chemoinformaticians for potential users among medicinal chemists and biologists, the article purposely avoids all underlying mathematics. First, the GTM concept is intuitively explained, based on strong analogies with the rather popular Self-Organizing Maps (SOMs), which are well-established library analysis tools. GTM is basically a fuzzy-logic-based generalization of SOMs. In the second part of the review, some published GTM applications in drug design are briefly revisited., (Copyright © 2020 Elsevier Ltd. All rights reserved.)
- Published
- 2019
48. Consensus models to predict oral rat acute toxicity and validation on a dataset coming from the industrial context.
- Author
-
Lunghini F, Marcou G, Azam P, Horvath D, Patoux R, Van Miert E, and Varnek A
- Subjects
- Administration, Oral, Animal Testing Alternatives standards, Animals, Computer Simulation, Consensus, Databases, Chemical, Machine Learning, Quantitative Structure-Activity Relationship, Rats, Reproducibility of Results, Toxicity Tests, Acute standards, Animal Testing Alternatives methods, Models, Theoretical, Toxicity Tests, Acute methods
- Abstract
We report predictive models of acute oral systemic toxicity, representing a follow-up of our previous work in the framework of the NICEATM project. It includes the update of the original models through the addition of new data and an external validation of the models using a dataset relevant to the chemical industry context. A regression model for LD50 and a multi-class classification model for toxicity classes according to the Globally Harmonized System categories were prepared. ISIDA descriptors were used to encode molecular structures. Machine learning algorithms included support vector machine (SVM), random forest (RF) and naïve Bayesian. Selected individual models were combined in consensus. The different datasets were compared using the generative topographic mapping approach. It appeared that the NICEATM datasets were lacking some chemotypes relevant to the chemical industry. The new models, trained on enlarged data sets, have applicability domains (AD) sufficiently large to accommodate industrial compounds. The fraction of compounds inside the models' AD increased from 58% (NICEATM model) to 94% (new model). Enlarging the training sets improved the models' prediction performance: RMSE values decreased from 0.56 to 0.47 and balanced accuracies increased from 0.69 to 0.71 for the NICEATM and new models, respectively.
- Published
- 2019
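The abstract does not spell out the rule used to combine the selected individual models in consensus. A minimal sketch, assuming the regression consensus is the arithmetic mean of the individual models' predictions and that performance is scored by RMSE, could look like this in Python; the model names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean


def consensus_mean(predictions_per_model):
    """Average aligned per-model predictions compound by compound (assumed rule)."""
    return [fmean(values) for values in zip(*predictions_per_model)]


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


# Invented regression outputs (arbitrary log-scale units) for three compounds.
observed = [2.0, 3.0, 1.5]
model_svm = [2.4, 2.6, 1.5]  # hypothetical SVM predictions
model_rf = [1.8, 3.4, 2.1]   # hypothetical RF predictions

consensus = consensus_mean([model_svm, model_rf])
for name, pred in [("SVM", model_svm), ("RF", model_rf), ("consensus", consensus)]:
    print(f"{name}: RMSE = {rmse(observed, pred):.3f}")
```

In this toy case the averaged predictions score a lower RMSE than either model alone because the two models' errors partly cancel; that is the usual motivation for consensus modelling, although the improvement is not guaranteed.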
49. In silico Design, Virtual Screening and Synthesis of Novel Electrolytic Solvents.
- Author
-
Marcou G, Flamme B, Beck G, Chagnes A, Mokshyna O, Horvath D, and Varnek A
- Subjects
- Electric Conductivity, Electric Power Supplies, Electrochemical Techniques, Electrolytes analysis, Esters chemical synthesis, Esters chemistry, Lithium chemistry, Models, Molecular, Molecular Structure, Quantitative Structure-Activity Relationship, Software, Solvents analysis, Sulfones chemical synthesis, Sulfones chemistry, Support Vector Machine, Computer Simulation, Drug Evaluation, Preclinical, Electrolytes chemical synthesis, Electrolytes chemistry, Solvents chemical synthesis, Solvents chemistry
- Abstract
We report the building, validation and release of QSPR (Quantitative Structure-Property Relationship) models aimed at guiding the design of new solvents for the next generation of Li-ion batteries. The dataset compiled from the literature included oxidation potentials (E_ox), specific ionic conductivities (κ), melting points (T_m) and boiling points (T_b) for 103 electrolytes. Each of the resulting consensus models assembled 9-19 individual Support Vector Machine models built on different sets of ISIDA fragment descriptors.(1) They were implemented in the ISIDA/Predictor software. The developed models were used to screen a virtual library of 9965 esters and sulfones. The most promising compounds, prioritized according to theoretically estimated properties, were synthesized and experimentally tested., (© 2019 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2019
50. Serum-based differentiation between multiple sclerosis and amyotrophic lateral sclerosis by Random Forest classification of FTIR spectra.
- Author
-
El Khoury Y, Collongues N, De Sèze J, Gulsari V, Patte-Mensah C, Marcou G, Varnek A, Mensah-Nyagan AG, and Hellwig P
- Subjects
- Adult, Aged, Aged, 80 and over, Algorithms, Amyotrophic Lateral Sclerosis drug therapy, Biotin therapeutic use, Decision Trees, Diagnosis, Differential, Female, Humans, Male, Middle Aged, Multiple Sclerosis drug therapy, Pilot Projects, Spectroscopy, Fourier Transform Infrared methods, Amyotrophic Lateral Sclerosis diagnosis, Biomarkers blood, Multiple Sclerosis diagnosis
- Abstract
The differential diagnosis of multiple sclerosis and amyotrophic lateral sclerosis is challenging and relies on clinical assessment of the symptoms, along with magnetic resonance imaging and cerebrospinal fluid sampling in search of biomarkers for either disease. Despite the progress made in imaging techniques and biomarker identification, misdiagnosis still occurs. Here we used 2.5 μL serum samples to obtain the infrared spectroscopic signatures of sera from multiple sclerosis and amyotrophic lateral sclerosis patients and compared them to those of healthy controls. The spectra were then classified with a two-fold Random Forest cross-validation algorithm. This approach shows that infrared spectroscopy discriminates well between the two diseases and healthy controls, offering high specificity for multiple sclerosis (100%) and amyotrophic lateral sclerosis (98%). In addition, data after six and twelve months of biotin treatment of the multiple sclerosis patients are discussed.
- Published
- 2019