81 results on '"Baskin II"'
Search Results
2. Modelling the multi-target selectivity: o-phosphorylated oximes as serine hydrolase inhibitors
- Author
-
Makhaeva GF, Baskin II, Radchenko EV, Palyulin VA, and Zefirov NS
- Subjects
- Chemistry, QD1-999
- Published
- 2009
- Full Text
- View/download PDF
3. Modelling the multi-target selectivity: o-phosphorylated oximes as serine hydrolase inhibitors
- Author
-
Palyulin, VA, Radchenko, EV, Baskin, II, Makhaeva, GF, and Zefirov, NS
- Published
- 2009
- Full Text
- View/download PDF
4. Additive inductive learning in QSAR/QSPR studies and molecular modeling
- Author
-
Baskin, II, Zhokhova, NI, Palyulin, VA, and Zefirov, NS
- Published
- 2009
- Full Text
- View/download PDF
5. QSAR Modeling: Where Have You Been? Where Are You Going To?
- Author
-
Cherkasov, A, Muratov, E, Fourches, D, Varnek, A, Baskin, I, Cronin, M, Dearden, J, Gramatica, P, Martin, Y, Todeschini, R, Consonni, V, Kuz’min, V, Cramer, R, Benigni, R, Yang, C, Rathman, J, Terfloth, L, Gasteiger, J, Richard, A, and Tropsha, A
- Abstract
Quantitative structure−activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
- Published
- 2014
6. Inverse problem in structure - property relationships studies for the case of informational topological indices
- Author
-
Skvortsova, MI, Slovokhotova, OL, Baskin, II, Vladimir Palyulin, and Zefirov, NS
7. QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIP STUDY OF MUTAGENIC ACTIVITY OF CHEMICAL-COMPOUNDS - SUBSTITUTED BIPHENYLS
- Author
-
Baskin, II, Lyubimova, IK, Abilev, SK, Vladimir Palyulin, and Zefirov, NS
8. Investigation of the structure-activity relationship of the antimutagens
- Author
-
Mustafaev, ON, Abilev, SK, Lubimova, IK, Tarasov, VA, Alekperov, UK, Halberstam, NM, Baskin, II, and Vladimir Palyulin
9. THE INVERSE PROBLEM IN QSAR/QSPR-STUDIES FOR THE CASE OF TOPOLOGICAL INDEXES, CHARACTERIZING MOLECULAR SHAPE (KIER INDEXES)
- Author
-
Skvortsova, MI, Baskin, II, Slovokhotova, OL, Vladimir Palyulin, and Zefirov, NS
10. Application of molecular mechanics to the study of regio- and stereoselectivity of cation-dependent [2+2]-photocycloaddition in crown ether styryl dyes
- Author
-
Baskin, II, Alexandra Freidzon, Bagatur'yants, AA, Gromov, SP, and Alfimov, MV
11. QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach.
- Author
-
Zankov DV, Matveieva M, Nikonenko AV, Nugmanov RI, Baskin II, Varnek A, Polishchuk P, and Madzhidov TI
- Subjects
- Databases, Factual, Drug Discovery, Molecular Conformation, Algorithms, Quantitative Structure-Activity Relationship
- Abstract
Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors issued for a single lowest energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
- Published
- 2021
- Full Text
- View/download PDF
12. Practical constraints with machine learning in drug discovery.
- Author
-
Baskin II
- Subjects
- Humans, Drug Discovery, Machine Learning
- Published
- 2021
- Full Text
- View/download PDF
13. Cross-validation strategies in QSPR modelling of chemical reactions.
- Author
-
Rakhimbekova A, Akhmetshin TN, Minibaeva GI, Nugmanov RI, Gimadiev TR, Madzhidov TI, Baskin II, and Varnek A
- Subjects
- Software, Validation Studies as Topic, Models, Chemical, Quantitative Structure-Activity Relationship
- Abstract
In this article, we consider cross-validation of the quantitative structure-property relationship models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two strategies of model cross-validation, 'transformation-out' CV, and 'solvent-out' CV. Unlike the conventional k-fold cross-validation approach that does not consider the nature of objects, the proposed procedures provide an unbiased estimation of the predictive performance of the models for novel types of structural transformations in chemical reactions and reactions going under new conditions. Both the suggested strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and Diels-Alder cycloaddition. All suggested cross-validation methodologies and tutorial are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
- Published
- 2021
- Full Text
- View/download PDF
14. Discovery of novel chemical reactions by deep generative recurrent neural network.
- Author
-
Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, and Varnek A
- Abstract
The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.
- Published
- 2021
- Full Text
- View/download PDF
15. Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling.
- Author
-
Lin A, Baskin II, Marcou G, Horvath D, Beck B, and Varnek A
- Subjects
- Benchmarking, Databases, Chemical, Entropy, Algorithms, Big Data
- Abstract
Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must well cover a given chemical space. Intuitively, the FS size must rise with the size and diversity of the target library. At the same time, the GTM training can be very slow or even become technically impossible at FS sizes of the order of 10⁵ compounds - which is a very small number compared to today's commercially accessible compounds, and, especially, to the theoretically feasible molecules. In order to solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms a FS for the "final" manifold. In order to assess the efficiency of the new algorithm, 80 GTMs were built on the FSs of different sizes ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of few thousands of compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: a FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase of FS compound numbers has no beneficial impact on the predictive propensity of the above-mentioned 712 activity classification models. Parallel GTM may, however, be required to generate maps based on very large FS, that might improve chemical space cartography of big commercial and virtual libraries, approaching billions of compounds. (© 2020 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.)
- Published
- 2020
- Full Text
- View/download PDF
16. Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions.
- Author
-
Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, and Varnek A
- Subjects
- Chemical Phenomena, Kinetics, Models, Molecular, Cheminformatics trends, Protein Domains, Quantitative Structure-Activity Relationship, Thermodynamics
- Abstract
Nowadays, the problem of the model's applicability domain (AD) definition is an active research topic in chemoinformatics. Although many AD definitions for the models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) were described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. The abilities to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers were tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.
- Published
- 2020
- Full Text
- View/download PDF
17. Autoignition temperature: comprehensive data analysis and predictive models.
- Author
-
Baskin II, Lozano S, Durot M, Marcou G, Horvath D, and Varnek A
- Subjects
- Chemical Phenomena, Data Analysis, Models, Chemical, Fires, Quantitative Structure-Activity Relationship, Temperature
- Abstract
Here we report a new predictive model for autoignition temperature (AIT), an important physical parameter widely used to assess potential safety hazards of combustible materials. Available structure-AIT data extracted from different sources were critically analysed. Support vector regression (SVR) models on different data subsets were built in order to identify a reliable compound set on which a realistic model could be built. This led to a selection of the dataset containing 875 compounds annotated with AIT values. The thereupon-based SVR model performs reasonably well in cross-validation with the determination coefficient r² = 0.77 and mean absolute error MAE = 37.8°C. External validation on 20 industrial compounds missing in the training set confirmed its good predictive power (MAE = 28.7°C).
- Published
- 2020
- Full Text
- View/download PDF
18. The power of deep learning to ligand-based novel drug discovery.
- Author
-
Baskin II
- Subjects
- Algorithms, Drug Design, Humans, Ligands, Deep Learning, Drug Discovery methods, Neural Networks, Computer
- Abstract
Introduction: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of different architectures of neural networks, the methods of their training, and the procedures of generating new molecules require expert knowledge to choose the most suitable approach. Areas Covered: Three different approaches to deep learning use in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation. Several architectures of neural networks for building either discriminative or generative models are considered in this paper, including deep multilayer neural networks, different kinds of convolutional neural networks, recurrent neural networks, and several types of autoencoders. Several kinds of learning frameworks are also considered, including adversarial learning and reinforcement learning. Different types of representations for generating molecules, including SMILES, graphs, and several alternative string representations are also considered. Expert Opinion: Two kinds of problem should be solved in order to make the models built using deep neural networks, especially generative models, a valuable option in ligand-based drug discovery: the issue of interpretability and explainability of deep-learning models and the issue of synthetic accessibility of novel compounds designed by deep-learning algorithms.
- Published
- 2020
- Full Text
- View/download PDF
19. Correction: QSAR without borders.
- Author
-
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, and Tropsha A
- Abstract
Correction for 'QSAR without borders' by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.
- Published
- 2020
- Full Text
- View/download PDF
20. QSAR without borders.
- Author
-
Muratov EN, Bajorath J, Sheridan RP, Tetko IV, Filimonov D, Poroikov V, Oprea TI, Baskin II, Varnek A, Roitberg A, Isayev O, Curtarolo S, Fourches D, Cohen Y, Aspuru-Guzik A, Winkler DA, Agrafiotis D, Cherkasov A, and Tropsha A
- Subjects
- Algorithms, Animals, Artificial Intelligence, Databases, Factual, Drug Design, History, 20th Century, History, 21st Century, Humans, Models, Molecular, Quantitative Structure-Activity Relationship, Quantum Theory, Reproducibility of Results, Chemistry, Pharmaceutical methods, Drug-Related Side Effects and Adverse Reactions metabolism, Pharmaceutical Preparations chemistry
- Abstract
Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.
- Published
- 2020
- Full Text
- View/download PDF
21. Application of the mol2vec Technology to Large-size Data Visualization and Analysis.
- Author
-
Shibayama S, Marcou G, Horvath D, Baskin II, Funatsu K, and Varnek A
- Subjects
- Principal Component Analysis, Algorithms, Data Analysis, Data Visualization
- Abstract
Generative Topographic Mapping (GTM) is a dimensionality reduction method, which is widely used for both data visualization and structure-activity modeling. Large dimensionality of the initial data space may require significant computational resources and slow down the GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used for encoding molecular structures. The Principal Component Analysis (PCA), a standard preprocessing tool, suffers from the information loss upon the dimensionality reduction. As an alternative, we propose to use substructure vector embedding provided by the mol2vec technique. In addition to the data dimensionality reduction, this technology also accounts for proximity of substructures in molecular graphs. In this study, the dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints was reduced using either the PCA or the mol2vec method. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks. (© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2020
- Full Text
- View/download PDF
22. Conjugated Quantitative Structure-Property Relationship Models: Application to Simultaneous Prediction of Tautomeric Equilibrium Constants and Acidity of Molecules.
- Author
-
Zankov DV, Madzhidov TI, Rakhimbekova A, Gimadiev TR, Nugmanov RI, Kazymova MA, Baskin II, and Varnek A
- Subjects
- Acids chemistry, Algorithms, Drug Discovery, Models, Chemical, Molecular Structure, Neural Networks, Computer, Quantitative Structure-Activity Relationship, Solvents chemistry, Stereoisomerism, Organic Chemicals chemistry, Pharmaceutical Preparations chemistry
- Abstract
Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.
- Published
- 2019
- Full Text
- View/download PDF
23. Continuous molecular fields and the concept of molecular co-fields in structure-activity studies.
- Author
-
Baskin II and Zhokhova NI
- Subjects
- Hydrogen Bonding, Hydrophobic and Hydrophilic Interactions, Models, Molecular, Structure-Activity Relationship, Molecular Structure
- Abstract
The analysis of information on the spatial structure of molecules and the physical fields of their interactions with biological targets is extremely important for solving various problems in drug discovery. This mini-review article surveys the main features of the continuous molecular fields approach and its use for analyzing structure-activity relationships in 3D space, building 3D quantitative structure-activity models and conducting similarity based virtual screening. Particular attention is paid to the consideration of the concept of molecular co-fields and their use for the interpretation of 3D structure-activity models. The principles of molecular design based on the overlapping and the similarity of molecular fields with corresponding co-fields are formulated.
- Published
- 2019
- Full Text
- View/download PDF
24. Is one-shot learning a viable option in drug discovery?
- Author
-
Baskin II
- Subjects
- Humans, Neural Networks, Computer, Drug Development methods, Drug Discovery methods, Machine Learning
- Published
- 2019
- Full Text
- View/download PDF
25. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping.
- Author
-
Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, and Varnek A
- Subjects
- Catalytic Domain, Drug Evaluation, Preclinical, Ligands, Molecular Docking Simulation, Receptor, Adenosine A2A chemistry, Receptor, Adenosine A2A metabolism, Small Molecule Libraries metabolism, Small Molecule Libraries pharmacology, Deep Learning, Drug Design
- Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
- Published
- 2019
- Full Text
- View/download PDF
26. Visualization and Analysis of Complex Reaction Data: The Case of Tautomeric Equilibria.
- Author
-
Glavatskikh M, Madzhidov T, Baskin II, Horvath D, Nugmanov R, Gimadiev T, Marcou G, and Varnek A
- Subjects
- Isomerism, Solvents chemistry, Algorithms, Molecular Dynamics Simulation, Organic Chemicals chemistry
- Abstract
Generative Topographic Mapping (GTM) approach was successfully used to visualize, analyze and model the equilibrium constants (KT) of tautomeric transformations as a function of both structure and experimental conditions. The modeling set contained 695 entries corresponding to 350 unique transformations of 10 tautomeric types, for which KT values were measured in different solvents and at different temperatures. Two types of GTM-based classification models were trained: first, a "structural" approach focused on separating tautomeric classes, irrespective of reaction conditions, then a "general" approach accounting for both structure and conditions. In both cases, the cross-validated Balanced Accuracy was close to 1 and the clusters, assembling equilibria of particular classes, were well separated in 2-dimensional GTM latent space. Data points corresponding to similar transformations measured under different experimental conditions are well separated on the maps. Additionally, GTM-driven regression models were found to have their predictive performance dependent on different scenarios of the selection of local fragment descriptors involving special marked atoms (proton donors or acceptors). The application of local descriptors significantly improves the model performance in 5-fold cross-validation: RMSE = 0.63 and 0.82 logKT units with and without local descriptors, respectively. This trend was also observed for SVR calculations, performed for comparison purposes. (© 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2018
- Full Text
- View/download PDF
27. Assessment of tautomer distribution using the condensed reaction graph approach.
- Author
-
Gimadiev TR, Madzhidov TI, Nugmanov RI, Baskin II, Antipin IS, and Varnek A
- Subjects
- Isomerism, Molecular Structure, Quantitative Structure-Activity Relationship, Thermodynamics, Water chemistry, Computer Simulation, Solvents chemistry, Temperature
- Abstract
We report the first direct QSPR modeling of equilibrium constants of tautomeric transformations (logKT) in different solvents and at different temperatures, which does not require intermediate assessment of acidity (basicity) constants for all tautomeric forms. The key step of the modeling consisted in the merging of two tautomers in one sole molecular graph ("condensed reaction graph"), which enables computing molecular descriptors characterizing the entire equilibrium. The support vector regression method was used to build the models. The training set consisted of 785 transformations belonging to 11 types of tautomeric reactions with equilibrium constants measured in different solvents and at different temperatures. The models obtained perform well both in cross-validation (Q² = 0.81, RMSE = 0.7 logKT units) and on two external test sets. Benchmarking studies demonstrate that our models outperform results obtained with DFT B3LYP/6-311++G(d,p) and ChemAxon Tautomerizer applicable only in water at room temperature.
- Published
- 2018
- Full Text
- View/download PDF
28. Machine Learning Methods in Computational Toxicology.
- Author
-
Baskin II
- Subjects
- Algorithms, Deep Learning, Linear Models, Neural Networks, Computer, Quantitative Structure-Activity Relationship, Support Vector Machine, Computer Simulation, Machine Learning, Toxicology methods
- Abstract
Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The application of unsupervised methods, such as Kohonen's self-organizing maps and related approaches, which allow for combining predictions with data analysis and visualization, is also considered. The necessity of applying a wide range of machine learning methods in computational toxicology is underlined.
- Published
- 2018
- Full Text
- View/download PDF
29. Energy-based Neural Networks as a Tool for Harmony-based Virtual Screening.
- Author
-
Zhokhova NI and Baskin II
- Subjects
- Algorithms, Neural Networks, Computer
- Abstract
In Energy-Based Neural Networks (EBNNs), relationships between variables are captured by means of a scalar function conventionally called "energy". In this article, we introduce a procedure of "harmony search", which looks for compounds providing the lowest energies for the EBNNs trained on active compounds. It can be considered as a special kind of similarity search that takes into account regularities in the structures of active compounds. In this paper, we show that harmony search can be used for performing virtual screening. The performance of the harmony search based on two types of EBNNs, the Hopfield Networks (HNs) and the Restricted Boltzmann Machines (RBMs), was compared with the performance of the similarity search based on Tanimoto coefficient with "data fusion". The AUC measure for ROC curves and 1%-enrichment rates for 20 targets were used in the benchmarking. Five different scores were computed: the energy for HNs, the free energy and the reconstruction error for RBMs, the mean and the maximum values of Tanimoto coefficients. The performance of the harmony search was shown to be comparable or even superior (significantly for several targets) to the performance of the similarity search. Important advantages of using the harmony search for virtual screening are very high computational efficiency of prediction, the ability to reveal and take into account regularities in active structures, flexibility and interpretability of models, etc. (© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2017
- Full Text
- View/download PDF
30. Predictive cartography of metal binders using generative topographic mapping.
- Author
-
Baskin II, Solov'ev VP, Bagatur'yants AA, and Varnek A
- Subjects
- Algorithms, Computer Simulation, Ligands, Likelihood Functions, Molecular Structure, Structure-Activity Relationship, Thermodynamics, Chelating Agents chemistry, Coordination Complexes chemistry, Metals chemistry
- Abstract
Generative topographic mapping (GTM) approach is used to visualize the chemical space of organic molecules (L) with respect to binding a wide range of 41 different metal cations (M) and also to build predictive models for stability constants (logK) of 1:1 (M:L) complexes using "density maps," "activity landscapes," and "selectivity landscapes" techniques. A two-dimensional map describing the entire set of 2962 metal binders reveals the selectivity and promiscuity zones with respect to individual metals or groups of metals with similar chemical properties (lanthanides, transition metals, etc). The GTM-based global (for entire set) and local (for selected subsets) models demonstrate a good predictive performance in the cross-validation procedure. It is also shown that the data likelihood could be used as a definition of the applicability domain of GTM-based models. Thus, the GTM approach represents an efficient tool for the predictive cartography of metal binders, which can both visualize their chemical space and predict the affinity profile of metals for new ligands.
- Published
- 2017
- Full Text
- View/download PDF
31. A renaissance of neural networks in drug discovery.
- Author
-
Baskin II, Winkler D, and Tetko IV
- Subjects
- Computer-Aided Design, Drug Delivery Systems, Humans, Models, Biological, Pharmaceutical Preparations administration & dosage, Pharmaceutical Preparations chemistry, Structure-Activity Relationship, Drug Design, Drug Discovery methods, Neural Networks, Computer
- Abstract
Introduction: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. Areas Covered: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed, including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. Expert Opinion: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
- Published
- 2016
- Full Text
- View/download PDF
32. Stargate GTM: Bridging Descriptor and Activity Spaces.
- Author
-
Gaspar HA, Baskin II, Marcou G, Horvath D, and Varnek A
- Subjects
- Artificial Intelligence, Humans, Probability, Quantitative Structure-Activity Relationship, Algorithms, Computer-Aided Design, Drug Design
- Abstract
Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM) in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile, and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.
- Published
- 2015
- Full Text
- View/download PDF
33. GTM-Based QSAR Models and Their Applicability Domains.
- Author
-
Gaspar HA, Baskin II, Marcou G, Horvath D, and Varnek A
- Subjects
- Databases, Chemical, Humans, Calcium chemistry, Gadolinium chemistry, Lutetium chemistry, Machine Learning, Models, Chemical, Thrombin chemistry
- Abstract
In this paper we demonstrate that Generative Topographic Mapping (GTM), a machine learning method traditionally used for data visualisation, can be efficiently applied to QSAR modelling using probability distribution functions (PDF) computed in the latent 2-dimensional space. Several different scenarios of the activity assessment were considered: (i) the "activity landscape" approach based on direct use of PDF, (ii) QSAR models involving GTM-generated descriptors derived from PDF, and (iii) the k-Nearest Neighbours approach in 2D latent space. Benchmarking calculations were performed on five different datasets: stability constants of metal cations Ca²⁺, Gd³⁺ and Lu³⁺ complexes with organic ligands in water, aqueous solubility and activity of thrombin inhibitors. It has been shown that the performance of GTM-based regression models is similar to that obtained with some popular machine-learning methods (random forest, k-NN, M5P regression tree and PLS) and ISIDA fragment descriptors. By comparing GTM activity landscapes built both on predicted and experimental activities, we may visually assess the model's performance and identify the areas in the chemical space corresponding to reliable predictions. The applicability domain used in this work is based on data likelihood. Its application has significantly improved the model performances for 4 out of 5 datasets. (© 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2015
- Full Text
- View/download PDF
34. Continuous indicator fields: a novel universal type of molecular fields.
- Author
-
Sitnikov GV, Zhokhova NI, Ustynyuk YA, Varnek A, and Baskin II
- Subjects
- Adsorption, Americium chemistry, Antithrombins chemistry, Antithrombins metabolism, Antithrombins pharmacology, Cations chemistry, Coloring Agents chemistry, Europium chemistry, Heterocyclic Compounds chemistry, Phenylalanine chemistry, Quantitative Structure-Activity Relationship, Molecular Conformation, Structure-Activity Relationship
- Abstract
A novel type of molecular fields, Continuous Indicator Fields (CIFs), is suggested to provide 3D structural description of molecules. The values of CIFs are calculated as the degree to which a point with given 3D coordinates belongs to an atom of a certain type. They can be used similarly to standard physicochemical fields for building 3D structure-activity models. One can build CIF-based 3D structure-activity models in the framework of the continuous molecular fields approach described earlier (J Comput-Aided Mol Des 27 (5):427-442, 2013) for the case of physicochemical molecular fields. CIFs are thought to complement and further extend traditional physicochemical fields. The models built with CIFs can be interpreted in terms of preferable and undesirable positions of certain types of atoms in space. This helps to understand which changes in chemical structure should be made in order to design a compound possessing desirable properties. We have demonstrated that CIFs can be considered as 3D analogues of 2D topological molecular fragments. The performance of this approach is demonstrated in structure-activity studies of thrombin inhibitors, multidentate N-heterocyclic ligands for Am³⁺/Eu³⁺ separation, and coloring dyes.
- Published
- 2015
- Full Text
- View/download PDF
35. Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge.
- Author
-
Gaspar HA, Baskin II, Marcou G, Horvath D, and Varnek A
- Subjects
- Entropy, Small Molecule Libraries, Solubility, User-Computer Interface, Algorithms, Databases, Chemical
- Abstract
This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. The similarity of data sets in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
- Published
- 2015
- Full Text
- View/download PDF
36. QSAR modeling: where have you been? Where are you going to?
- Author
-
Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz'min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, and Tropsha A
- Subjects
- Antimicrobial Cationic Peptides chemistry, Artificial Intelligence, Complex Mixtures chemistry, Databases, Factual, History, 20th Century, History, 21st Century, Nanostructures chemistry, Pharmacokinetics, Quantum Theory, Toxicology methods, Drug Design, Models, Molecular, Quantitative Structure-Activity Relationship
- Abstract
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
- Published
- 2014
- Full Text
- View/download PDF
37. The continuous molecular fields approach to building 3D-QSAR models.
- Author
-
Baskin II and Zhokhova NI
- Subjects
- Algorithms, Artificial Intelligence, Drug Design, Humans, Hydrogen Bonding, Hydrophobic and Hydrophilic Interactions, Databases, Protein, Models, Molecular, Quantitative Structure-Activity Relationship, Structure-Activity Relationship
- Abstract
The continuous molecular fields (CMF) approach is based on the application of continuous functions for the description of molecular fields instead of finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure-activity models, visualizing chemical datasets and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for 8 datasets through the use of five types of molecular fields (the electrostatic, steric, hydrophobic, hydrogen-bond acceptor and donor ones), the linear convolution molecular kernel with the contribution of each atom approximated with a single isotropic Gaussian function, and the kernel ridge regression data analysis technique. It is shown that the CMF approach even in this simplest form provides either comparable or enhanced predictive performance in comparison with state-of-the-art 3D-QSAR methods.
- Published
- 2013
- Full Text
- View/download PDF
38. Transductive Support Vector Machines: Promising Approach to Model Small and Unbalanced Datasets.
- Author
-
Kondratovich E, Baskin II, and Varnek A
- Abstract
Semi-supervised methods dealing with a combination of labeled and unlabeled data are becoming more and more popular in the machine-learning area, but are still not used in chemoinformatics. Here, we demonstrate that Transductive Support Vector Machines (TSVM), a semi-supervised large-margin classification method, can be particularly useful for building models on small and unbalanced datasets, which often represent a difficult problem in QSAR. Both TSVM and ordinary SVM have been applied to build classification models on 10 DUD datasets. The "transductive effect" (the difference in predictive performance between transductive and ordinary support vector machines) was investigated as a function of: (a) active/inactive ratio, (b) descriptor weighting, and (c) the training and test sets size and composition. (Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2013
- Full Text
- View/download PDF
39. Combined QSAR studies of inhibitor properties of O-phosphorylated oximes toward serine esterases involved in neurotoxicity, drug metabolism and Alzheimer's disease.
- Author
-
Makhaeva GF, Radchenko EV, Baskin II, Palyulin VA, Richardson RJ, and Zefirov NS
- Subjects
- Enzyme Inhibitors metabolism, Humans, Models, Molecular, Oximes metabolism, Phosphorylation, Quantitative Structure-Activity Relationship, Alzheimer Disease enzymology, Alzheimer Disease physiopathology, Enzyme Inhibitors chemistry, Enzyme Inhibitors pharmacology, Esterases antagonists & inhibitors, Oximes chemistry, Oximes pharmacology
- Abstract
Oxime reactivation of serine esterases (EOHs) inhibited by organophosphorus (OP) compounds can produce O-phosphorylated oximes (POXs). Such oxime derivatives are of interest, because some of them can have greater anti-EOH potencies than the OP inhibitors from which they were derived. Accordingly, inhibitor properties of 58 POXs against four EOHs, along with pair-wise selectivities between them, have been analysed using different QSAR approaches. EOHs (with their abbreviations and consequences of inhibition in parentheses) comprised acetylcholinesterase (AChE: acute neurotoxicity; cognition enhancement), butyrylcholinesterase (BChE: inhibition of drug metabolism or stoichiometric scavenging of EOH inhibitors; cognition enhancement), carboxylesterase (CaE: inhibition of drug metabolism or stoichiometric scavenging of EOH inhibitors), and neuropathy target esterase (NTE: delayed neurotoxicity). QSAR techniques encompassed linear regression and backpropagation neural networks in conjunction with fragmental descriptors containing labelled atoms, Molecular Field Topology Analysis (MFTA), Comparative Molecular Similarity Index Analysis (CoMSIA), and molecular modelling. All methods provided mostly consistent and complementary information, and they revealed structural features controlling the 'esterase profiles', i.e. patterns of anti-EOH activities and selectivities of the compounds of interest. In addition, MFTA models were used to design a library of compounds having a cognition-enhancement esterase profile suitable for potential application to the treatment of Alzheimer's disease.
- Published
- 2012
- Full Text
- View/download PDF
40. Generative Topographic Mapping (GTM): Universal Tool for Data Visualization, Structure-Activity Modeling and Dataset Comparison.
- Author
-
Kireeva N, Baskin II, Gaspar HA, Horvath D, Marcou G, and Varnek A
- Abstract
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure-activity modeling and database comparison is evaluated using subsets of the Database of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon Mapping or Self-Organizing Maps, the great advantage of GTMs is that they provide data probability distribution functions (PDF), both in the high-dimensional space defined by molecular descriptors and in 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries. (Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2012
- Full Text
- View/download PDF
41. One-class classification as a novel method of ligand-based virtual screening: the case of glycogen synthase kinase 3β inhibitors.
- Author
-
Karpov PV, Osolodkin DI, Baskin II, Palyulin VA, and Zefirov NS
- Subjects
- Computer-Aided Design, Enzyme Inhibitors pharmacology, Glycogen Synthase Kinase 3 metabolism, Glycogen Synthase Kinase 3 beta, Humans, Ligands, Models, Molecular, Neural Networks, Computer, Drug Design, Enzyme Inhibitors chemistry, Glycogen Synthase Kinase 3 antagonists & inhibitors
- Abstract
A virtual screening system based on one-class classification with molecular fingerprints as descriptors is developed and tested on a series of 1226 inhibitors and 209 noninhibitors of glycogen synthase kinase 3β (GSK-3β). The suggested system outperforms the ones based on pharmacophore hypothesis and molecular docking in a retrospective study. However, in a prospective study it should not be used as a sole classifier. The system is exceptionally useful for the identification of new scaffolds among the virtual screening results obtained with other methods. (Copyright © 2011 Elsevier Ltd. All rights reserved.)
- Published
- 2011
- Full Text
- View/download PDF
42. Synthesis and SAR requirements of adamantane-colchicine conjugates with both microtubule depolymerizing and tubulin clustering activities.
- Author
-
Zefirova ON, Nurieva EV, Shishov DV, Baskin II, Fuchs F, Lemcke H, Schröder F, Weiss DG, Zefirov NS, and Kuznetsov SA
- Subjects
- Adamantane chemistry, Antineoplastic Agents chemistry, Cell Proliferation drug effects, Colchicine chemistry, Dose-Response Relationship, Drug, Drug Screening Assays, Antitumor, Humans, Microtubules metabolism, Models, Molecular, Molecular Structure, Paclitaxel chemistry, Paclitaxel pharmacology, Stereoisomerism, Structure-Activity Relationship, Tumor Cells, Cultured, Adamantane pharmacology, Antineoplastic Agents chemical synthesis, Antineoplastic Agents pharmacology, Colchicine pharmacology, Microtubules drug effects, Tubulin metabolism
- Abstract
A series of analogues of conjugate 1, combining an adamantane-based paclitaxel (taxol) mimetic with colchicine, was synthesized and tested for cytotoxicity in a cell-based assay with the human lung carcinoma cell line A549. The most active compounds (10: EC₅₀ 2 ± 1.0 nM; 23: EC₅₀ 6 ± 1.4 nM; 26: EC₅₀ 5 ± 1.8 nM; 28: EC₅₀ 11 ± 1.7 nM; 30: EC₅₀ 4.8 ± 0.5 nM) were found to interfere with the microtubule dynamics in an interesting manner. Treatment of the cells with these compounds promoted disassembly of microtubules followed by the formation of stable tubulin clusters. Structure-activity relationships for the analogues of 23 revealed the sensitivity of both cytotoxicity and tubulin clustering ability to the linker length. The presence of adamantane (or another bulky hydrophobic and non-aromatic moiety) in 23 was found to play an important role in the formation of tubulin clusters. Structural requirements for optimal activity have been partially explained by molecular modeling. (Copyright © 2011 Elsevier Ltd. All rights reserved.)
- Published
- 2011
- Full Text
- View/download PDF
43. Chemoinformatics as a Theoretical Chemistry Discipline.
- Author
-
Varnek A and Baskin II
- Abstract
Here, chemoinformatics is considered as a theoretical chemistry discipline complementary to quantum chemistry and force-field molecular modeling. These three fields are compared with respect to molecular representation, inference mechanisms, basic concepts and application areas. A chemical space, a fundamental concept of chemoinformatics, is considered with respect to complex relations between chemical objects (graphs or descriptor vectors). Statistical Learning Theory, one of the main mathematical approaches in structure-property modeling, is briefly reviewed. Links between chemoinformatics and its "sister" fields (machine learning, chemometrics and bioinformatics) are discussed. (Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2011
- Full Text
- View/download PDF
44. The One-Class Classification Approach to Data Description and to Models Applicability Domain.
- Author
-
Baskin II, Kireeva N, and Varnek A
- Abstract
In this paper, we associate an applicability domain (AD) of QSAR/QSPR models with the area in the input (descriptor) space in which the density of training data points exceeds a certain threshold. It could be proved that the predictive performance of the models (built on the training set) is higher for the test compounds inside the high-density area than for those outside this area. Instead of searching for a decision surface separating high- and low-density areas in the input space, the one-class classification 1-SVM approach looks for a hyperplane in the associated feature space. Unlike other AD definitions reported in the literature, this approach: (i) is purely "data-based", i.e. it assigns the same AD to all models built on the same training set, (ii) provides results that depend only on the initial descriptors pool generated for the training set, (iii) can be used for a huge number of descriptors, as well as in the framework of structured kernel-based approaches, e.g., chemical graph kernels. The developed approach has been applied to improve the performance of QSPR models for stability constants of the complexes of organic ligands with alkaline-earth metals in water. (Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2010
- Full Text
- View/download PDF
45. Kinetics and mechanism of inhibition of serine esterases by fluorinated aminophosphonates.
- Author
-
Makhaeva GF, Aksinenko AY, Sokolov VB, Baskin II, Palyulin VA, Zefirov NS, Hein ND, Kampf JW, Wijeyesakere SJ, and Richardson RJ
- Subjects
- Animals, Crystallography, X-Ray, Esterases chemistry, Esterases metabolism, Humans, Kinetics, Mass Spectrometry, Models, Molecular, Molecular Conformation, Peptide Mapping, Enzyme Inhibitors chemistry, Enzyme Inhibitors pharmacology, Esterases antagonists & inhibitors, Halogenation, Organophosphonates chemistry, Organophosphonates pharmacology
- Abstract
This paper reviews previously published data and presents new results to address the hypothesis that fluorinated aminophosphonates (FAPs), (RO)₂P(O)C(CF₃)₂NHS(O)₂C₆H₅, R = alkyl, inhibit serine esterases by scission of the P-C bond. Kinetics studies demonstrated that FAPs are progressive irreversible inhibitors of acetylcholinesterase (AChE, EC 3.1.1.7), butyrylcholinesterase (BChE, EC 3.1.1.8), carboxylesterase (CaE, EC 3.1.1.1), and neuropathy target esterase (NTE, EC 3.1.1.5), consistent with P-C bond breakage. Chemical reactivity experiments showed that diMe-FAP and diEt-FAP react with water to yield the corresponding dialkylphosphates and (CF₃)₂CHNHS(O)₂C₆H₅, indicating lability of the P-C bond. X-ray crystallography of diEt-FAP revealed an elongated (and therefore weaker) P-C bond (1.8797(13) Å) compared to P-C bonds in dialkylphosphonates lacking α-CF₃ groups (1.805-1.822 Å). Semi-empirical and non-empirical molecular modeling of diEt-FAP and (EtO)₂P(O)C(CH₃)₂NHS(O)₂C₆H₅ (diEt-AP), which lacks CF₃ groups, indicated lengthening and destabilization of the P-C bond in diEt-FAP compared to diEt-AP. Active site peptide adducts formed by reacting diEt-FAP with BChE and diBu-FAP with NTE catalytic domain (NEST) were identified using peptide mass mapping with mass spectrometry (MS). Mass shifts (mean ± SE, average mass) for peaks corresponding to active site peptides with diethylphosphoryl and monoethylphosphoryl adducts on BChE were 136.1 ± 0.1 and 108.0 ± 0.1 Da, respectively. Corresponding mass shifts for dibutylphosphoryl and monobutylphosphoryl adducts on NEST were 191.8 ± 0.2 and 135.5 ± 0.1 Da, respectively. Each of these values was statistically identical to the theoretical mass shift for each dialkylphosphoryl and monoalkylphosphoryl species. The MS results demonstrate that inhibition of BChE and NEST by FAPs yields dialkylphosphoryl and monoalkylphosphoryl adducts, consistent with phosphorylation via P-C bond cleavage and aging by net dealkylation. Taken together, predictions from enzyme kinetics, chemical reactivity, X-ray crystallography, and molecular modeling were confirmed by MS and support the hypothesis that FAPs inhibit serine esterases via scission of the P-C bond. (Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.)
- Published
- 2010
- Full Text
- View/download PDF
46. Molecular modeling of modified peptides, potent inhibitors of the xWNT8 and hWNT8 proteins.
- Author
-
Voronkov AE, Baskin II, Palyulin VA, and Zefirov NS
- Subjects
- Algorithms, Amino Acid Sequence, Animals, Binding Sites, Computer-Aided Design, Drug Design, Frizzled Receptors chemistry, Frizzled Receptors metabolism, Humans, Ligands, Mice, Molecular Sequence Data, Peptides metabolism, Peptides pharmacology, Protein Binding, Protein Conformation, Receptors, G-Protein-Coupled chemistry, Receptors, G-Protein-Coupled metabolism, Software, Wnt Proteins antagonists & inhibitors, Wnt Proteins metabolism, Xenopus, Xenopus Proteins antagonists & inhibitors, Xenopus Proteins metabolism, Models, Molecular, Peptides chemistry, Wnt Proteins chemistry, Xenopus Proteins chemistry
- Abstract
Signaling pathways of Wnt proteins and Fzd receptors play an important role in the growth and development of stem cells and in many types of cancer. The binding of Wnt proteins to Fzd receptors is a complicated process in which 19 Wnt proteins and 10 Fzd receptors are involved. Such a large number of possible Wnt-Fzd pairs leads to many different influences of Fzd-Wnt complexes on the development and differentiation of stem cells. Molecular models of the xWnt8, hWnt8, mFzd8 and hFzd8 proteins and their complexes were constructed and studied in the present work. The amino acids of the protein binding sites that participate in the formation of these complexes, and the protein-protein interactions, were studied. A pharmacophoric model of the binding site on the xWnt8 and hWnt8 proteins was constructed. In this work we suggest peptidomimetic ligands that can be used to inhibit formation of the xWnt8-mFzd8 and hWnt8-hFzd8 complexes. The de novo design method of the Allegrow software was used to predict the most promising functional groups of the peptidomimetic ligands. These ligands can be used as inhibitors of xWnt8-mFzd8 and hWnt8-hFzd8 complex formation and can also serve as a starting point for drug design by other methods.
- Published
- 2008
- Full Text
- View/download PDF
47. Molecular model of the Wnt protein binding site on the surface of dimeric CRD domain of the hFzd8 receptor.
- Author
-
Voronkov AE, Baskin II, Palyulin VA, and Zefirov NS
- Subjects
- Binding Sites, Dimerization, Humans, Protein Structure, Tertiary, Surface Properties, Models, Molecular, Receptors, Cell Surface chemistry, Receptors, Cell Surface metabolism, Wnt Proteins chemistry, Wnt Proteins metabolism
- Published
- 2008
- Full Text
- View/download PDF
48. Neural networks in building QSAR models.
- Author
-
Baskin II, Palyulin VA, and Zefirov NS
- Subjects
- Algorithms, Artificial Intelligence, Biology methods, Chemistry, Physical methods, Cluster Analysis, Computers, Models, Statistical, Models, Theoretical, Neural Networks, Computer, Regression Analysis, Reproducibility of Results, Software, Chemistry Techniques, Analytical methods, Quantitative Structure-Activity Relationship
- Abstract
This chapter critically reviews some of the important methods being used for building quantitative structure-activity relationship (QSAR) models using the artificial neural networks (ANNs). It attends predominantly to the use of multilayer ANNs in the regression analysis of structure-activity data. The highlighted topics cover the approximating ability of ANNs, the interpretability of the resulting models, the issues of generalization and memorization, the problems of overfitting and overtraining, the learning dynamics, regularization, and the use of neural network ensembles. The next part of the chapter focuses attention on the use of descriptors. It reviews different descriptor selection and preprocessing techniques; considers the use of the substituent, substructural, and superstructural descriptors in building common QSAR models; the use of molecular field descriptors in three-dimensional QSAR studies; along with the prospects of "direct" graph-based QSAR analysis. The chapter starts with a short historical survey of the main milestones in this area.
- Published
- 2008
49. Exhaustive QSPR studies of a large diverse set of ionic liquids: how accurately can we predict melting points?
- Author
-
Varnek A, Kireeva N, Tetko IV, Baskin II, and Solov'ev VP
- Abstract
Several popular machine learning methods implemented in ISIDA, NASAWIN, and VCCLAB software, namely Associative Neural Networks (ASNN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), a modified version of partial least-squares analysis (PLSM), backpropagation neural networks (BPNN), and Multiple Linear Regression Analysis (MLR), have been used to perform QSPR modeling of the melting points of a structurally diverse data set of 717 bromides of nitrogen-containing organic cations (FULL), including 126 pyridinium bromides (PYR), 384 imidazolium and benzoimidazolium bromides (IMZ), and 207 quaternary ammonium bromides (QUAT). Several types of descriptors were tested: E-state indices, counts of atoms determined for E-state atom types, molecular descriptors generated by the DRAGON program, and different types of substructural molecular fragments. Predictive ability of the models was analyzed using a 5-fold external cross-validation procedure in which every compound in the parent set was included in one of five test sets. Among the 16 types of developed structure-melting point models, the nonlinear SVM, ASNN, and BPNN techniques demonstrate slightly better performance than the other methods. For the full set, the accuracy of predictions does not significantly change as a function of the type of descriptors. For other sets, the performance of descriptors varies as a function of the method and data set used. The root-mean-squared error (RMSE) of prediction calculated on independent test sets is in the range of 37.5-46.4 °C (FULL), 26.2-34.8 °C (PYR), 38.8-45.9 °C (IMZ), and 34.2-49.3 °C (QUAT). The moderate accuracy of predictions can be related to the quality of the experimental data used for obtaining the models as well as to difficulties in taking into account the structural features of ionic liquids in the solid state (polymorphic effects, eutectics, glass formation).
- Published
- 2007
- Full Text
- View/download PDF
50. Molecular modeling of the complex between the xWNT8 protein and the CRD domain of the mFZD8 receptor.
- Author
-
Voronkov AE, Baskin II, Palyulin VA, and Zefirov NS
- Subjects
- Animals, Binding Sites, Mice, Protein Structure, Secondary, Protein Structure, Tertiary, Xenopus, Models, Molecular, Receptors, G-Protein-Coupled chemistry, Receptors, G-Protein-Coupled metabolism, Wnt Proteins chemistry, Wnt Proteins metabolism, Xenopus Proteins chemistry, Xenopus Proteins metabolism
- Published
- 2007
- Full Text
- View/download PDF