63 results on '"Tetko, IV"'
Search Results
2. Calculation of molecular lipophilicity: state of the art and comparison of methods on more than 96000 compounds
- Author
-
Tetko IV, Ostermann C, Poda GI, and Mannhold M
- Subjects
Chemistry, QD1-999
- Published
- 2009
- Full Text
- View/download PDF
3. Calculation of molecular lipophilicity: state of the art and comparison of methods on more than 96000 compounds
- Author
-
Mannhold, M, Poda, GI, Ostermann, C, and Tetko, IV
- Published
- 2009
- Full Text
- View/download PDF
4. CERAPP: Collaborative estrogen receptor activity prediction project
- Author
-
Mansouri, K, Abdelaziz, A, Rybacka, A, Roncaglioni, A, Tropsha, A, Varnek, A, Zakharov, A, Worth, A, Richard, A, Grulke, C, Trisciuzzi, D, Fourches, D, Horvath, D, Benfenati, E, Muratov, E, Wedebye, E, Grisoni, F, Mangiatordi, G, Incisivo, G, Hong, H, Ng, H, Tetko, I, Balabin, I, Kancherla, J, Shen, J, Burton, J, Nicklaus, M, Cassotti, M, Nikolov, N, Nicolotti, O, Andersson, P, Zang, Q, Politi, R, Beger, R, Todeschini, R, Huang, R, Farag, S, Rosenberg, S, Slavov, S, Hu, X, Judson, R, Richard, AM, Grulke, CM, Wedebye, EB, Mangiatordi, GF, Incisivo, GM, Ng, HW, Tetko, IV, Nikolov, NG, Andersson, PL, Beger, RD, Rosenberg, SA, and Judson, RS
- Abstract
Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. Methods: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. Results: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. Conclusion: This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. This concept will be applied in future projects related to other end
- Published
- 2016
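The CERAPP abstract above says a consensus was built by weighting models on scores derived from their evaluated accuracies, but it does not give the exact weighting scheme. The following is only a minimal sketch of one plausible reading, assuming binary active/inactive calls and weights directly proportional to each model's score. The function name `weighted_consensus` and the 0.5 threshold are illustrative assumptions, not part of the project.

```python
def weighted_consensus(predictions, scores, threshold=0.5):
    """Combine binary calls from several models into one consensus call.

    predictions: one list of 0/1 calls per model, all of equal length.
    scores: one accuracy-like score per model, used as its weight (assumption).
    threshold: weighted fraction of "active" votes needed for an active call.
    """
    if len(predictions) != len(scores):
        raise ValueError("need exactly one score per model")
    total_weight = sum(scores)
    n_compounds = len(predictions[0])
    consensus = []
    for i in range(n_compounds):
        # Weighted share of models that call compound i active.
        active_share = sum(s * p[i] for p, s in zip(predictions, scores)) / total_weight
        consensus.append(1 if active_share >= threshold else 0)
    return consensus


# Three hypothetical models with scores inside the 0.69 to 0.85 range reported above.
calls = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
print(weighted_consensus(calls, [0.85, 0.69, 0.70]))
```

With weights this close together the result behaves much like a majority vote; the weighting matters more when model scores differ widely.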
5. Experimental and Theoretical Studies in the EU FP7 Marie Curie Initial Training Network Project, Environmental ChemOinformatics (ECO)
- Author
-
Tetko, I, Schramm, K, Knepper, T, Peijnenburg, W, Hendriks, A, Nicholls, I, Oberg, T, Todeschini, R, Schlosser, E, Barndmaier, S, Tetko, IV, Schramm, KW, Peijnenburg, WJGM, Hendriks, AJ, and Nicholls, IA
- Published
- 2014
6. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set
- Author
-
Sushko, I, Novotarskyi, S, Körner, R, Pandey, A, Cherkasov, A, Li, J, Gramatica, P, Hansen, K, Schroeter, T, Müller, K, Xi, L, Liu, H, Yao, X, Öberg, T, Hormozdiari, F, Dao, P, Sahinalp, C, Todeschini, R, Polishchuk, P, Artemenko, A, Kuz'Min, V, Martin, T, Young, D, Fourches, D, Tropsha, A, Baskin, I, Horbath, D, Marcou, G, Varnek, A, Prokopenko, V, Tetko, I, Pandey, AK, Müller, KR, Kuz'min, V, Martin, TM, Young, DM, Prokopenko, VV, Tetko, IV, and TODESCHINI, ROBERTO
- Abstract
The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of “distance to model” (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1
- Published
- 2010
7. Critical assessment of QSAR models of environmental toxicity against tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection
- Author
-
Tetko, I, Sushko, I, Pandey, A, Zhu, H, Tropsha, A, Papa, E, Oberg, T, Todeschini, R, Fourches, D, Varnek, A, Tetko, IV, Pandey, AK, and TODESCHINI, ROBERTO
- Abstract
The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that defines the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It could be expressed in many different ways, e.g., using Tanimoto coefficient, leverage, correlation in space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with low and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces but not by QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in the wrong estimation of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org site. © 2008 American Chemical Society.
- Published
- 2008
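The abstract above reports that the distance to model based on the standard deviation of predictions across an ensemble of models discriminated best between small and large prediction errors. The sketch below illustrates that idea under stated assumptions: it uses the population standard deviation (the abstract does not say which estimator was used) and a simple split at the median distance in place of the Gaussian-mixture analysis the authors describe. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_distance_to_model(ensemble_predictions):
    """Per-compound standard deviation of the predictions of all ensemble members.

    ensemble_predictions: one list of predicted values per model, equal lengths.
    """
    return [pstdev(values) for values in zip(*ensemble_predictions)]


def mean_error_low_vs_high_dm(dm, abs_errors):
    """Mean absolute error for the low-distance half versus the high-distance half."""
    order = sorted(range(len(dm)), key=lambda i: dm[i])
    half = len(order) // 2
    low = [abs_errors[i] for i in order[:half]]
    high = [abs_errors[i] for i in order[half:]]
    return mean(low), mean(high)


# Toy data: three models, four compounds (illustrative values only).
ensemble = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.5, 3.0, 5.0],
    [1.0, 3.0, 3.0, 6.0],
]
dm = ensemble_distance_to_model(ensemble)
low_err, high_err = mean_error_low_vs_high_dm(dm, [0.1, 0.5, 0.2, 0.9])
print(dm, low_err, high_err)
```

If the distance is informative, compounds on which the ensemble members disagree (high standard deviation) should show the larger mean error, as they do in this toy data.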
8. Virtual computational chemistry laboratory - design and description
- Author
-
Tetko, I, Gasteiger, J, Todeschini, R, Mauri, A, Livingstone, D, Ertl, P, Palyulin, V, Radchenko, E, Zefirov, N, Makarenko, A, Tanchuk, V, Prokopenko, V, Tetko, IV, Zefirov, NS, Makarenko, AS, Tanchuk, VY, Prokopenko, VV, and TODESCHINI, ROBERTO
- Abstract
Internet technology offers an excellent opportunity for the development of tools by the cooperative effort of various groups and institutions. We have developed a multi-platform software system, Virtual Computational Chemistry Laboratory, http://www.vcclab.org, allowing the computational chemist to perform a comprehensive series of molecular indices/properties calculations and data analysis. The implemented software is based on a three-tier architecture that is one of the standard technologies to provide client-server services on the Internet. The developed software includes several popular programs, including the indices generation program, DRAGON, a 3D structure generator, CORINA, a program to predict lipophilicity and aqueous solubility of chemicals, ALOGPS and others. All these programs are running at the host institutes located in five countries over Europe. In this article we review the main features and statistics of the developed system that can be used as a prototype for academic and industry models.
- Published
- 2005
9. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping
- Author
-
Ctibor Škuta, Igor V. Tetko, Andreas Bender, Pavel Kříž, Daniel Svozil, G. J. P. van Westen, Wim Dehaen, Isidro Cortes-Ciriano, Škuta, C. [0000-0001-5325-4934], Cortés-Ciriano, I. [0000-0002-2036-494X], Dehaen, W. [0000-0002-9597-0629], Kříž, P. [0000-0003-2473-1919], van Westen, G. J. P. [0000-0003-0717-1817], Tetko, I. V. [0000-0002-6855-0012], Bender, A. [0000-0002-6683-7546], Svozil, D. [0000-0003-2577-5163], Apollo - University of Cambridge Repository
- Subjects
Quantitative structure–activity relationship, Computer science, In silico, Bioactivity modeling, Library and Information Sciences, Scaffold hopping, 01 natural sciences, Biological fingerprint, lcsh:Chemistry, 03 medical and health sciences, Similarity (network science), Similarity searching, Research article, Physical and Theoretical Chemistry, 030304 developmental biology, 0303 health sciences, lcsh:T58.5-58.64, lcsh:Information technology, business.industry, QSAR, Fingerprint (computing), Pattern recognition, chEMBL, Computer Graphics and Computer-Aided Design, 0104 chemical sciences, Computer Science Applications, Random forest, 010404 medicinal & biomolecular chemistry, lcsh:QD1-999, Big Data in Chemistry, Affinity fingerprint, Artificial intelligence, business, Research Article
- Abstract
Funder: FP7 People: Marie-Curie Actions; doi: http://dx.doi.org/10.13039/100011264; Grant(s): 238701
An affinity fingerprint is a vector consisting of a compound’s affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping, as it is able to retrieve 1146 of the 1749 existing scaffolds, while the Morgan2 fingerprint retrieves only 864 scaffolds.
- Published
- 2020
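The QAFFP abstract above reports retrieval performance as EF5 without defining it. The sketch below assumes the common definition of the enrichment factor at the top 5% of a ranked list: the hit rate among the top-ranked compounds divided by the hit rate in the whole set. The function name `enrichment_factor` and the rounding-up of the cut-off are assumptions made for illustration; the paper's exact procedure may differ.

```python
import math


def enrichment_factor(scores, labels, fraction=0.05):
    """Enrichment factor at the top `fraction` of a score-ranked list.

    scores: similarity or model scores, higher meaning more likely active.
    labels: 1 for active, 0 for inactive, in the same order as scores.
    """
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Size of the top slice; rounded up so it is never empty (assumption).
    n_top = max(1, math.ceil(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (total_actives / n)


# Toy ranking of eight compounds, evaluated at the top 25% for readability.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, fraction=0.25))
```

A value of 1 means the ranking does no better than random selection; under this definition the maximum possible EF5 is 20, reached when every compound in the top 5% is active.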
10. Analysis of dogs’ sleep patterns using convolutional neural networks
- Author
-
Zamansky, A, Sinitca, AM, Kaplun, DI, Plazner, M, Schork, IG, Young, RJ, de Azevedo, CS, Tetko, IV, Kurkova, V, Karpov, P, and Theis, F
- Subjects
GeneralLiterature_MISCELLANEOUS
- Abstract
Video-based analysis is one of the most important tools of animal behavior and animal welfare scientists. While automatic analysis systems exist for many species, this problem has not yet been adequately addressed for one of the most studied species in animal science: dogs. In this paper we describe a system developed for analyzing the sleeping patterns of kenneled dogs, which may serve as an indicator of their welfare. The system combines convolutional neural networks with classical data processing methods, and works with very low quality video from cameras installed in dog shelters.
- Published
- 2019
11. Overview on the PHRESCO Project: PHotonic REServoir COmputing
- Author
-
Jean-Pierre Locquet, Phresco Partners, Tetko, IV, Kurkova, V, Karpov, P, and Theis, F
- Subjects
Technology, Science & Technology, Computer science, Cognitive computing, Reservoir computing, Maturity (finance), Computer Science, Artificial Intelligence, Field (computer science), Engineering management, Work (electrical), Computer Science, Theory & Methods, Computer Science, Machine learning, Key (cryptography)
- Abstract
PHRESCO is an EU-H2020 funded project that was running for four years and will be ending in September 2019. PHRESCO focused on the development of efficient cognitive computing into a specific silicon-based technology by co-designing a new reservoir computing chip, including innovative electronic and photonic components that will enable major breakthroughs in the field. So far, a first-generation reservoir with 18 nodes and integrated readout has been designed, fabricated and characterized, and a training method has been developed. Additionally, large efforts of the consortium were dedicated to the design of the second-generation chip consisting of larger networks (60 nodes), with an on-chip readout and novel training approaches. This short abstract provides key information on the status of the work achieved and further discusses the potential exploitation routes and the key barriers that still need to be removed to bring the technology to a higher maturity level. A part of the exit strategy of PHRESCO is to identify potential future cooperation with interested stakeholders who are willing to co-develop the PHRESCO technology together with the PHRESCO partners to bring it to an exploitable or marketable system. This abstract lays the foundations for potential exploitation activities with interested stakeholders.
- Published
- 2019
- Full Text
- View/download PDF
12. The state-of-the-art machine learning model for plasma protein binding prediction: Computational modeling with OCHEM and experimental validation.
- Author
-
Han Z, Xia Z, Xia J, Tetko IV, and Wu S
- Subjects
- Computer Simulation, Humans, Models, Biological, Pharmaceutical Preparations metabolism, Pharmaceutical Preparations chemistry, Machine Learning, Protein Binding, Blood Proteins metabolism
- Abstract
Plasma protein binding (PPB) is closely related to pharmacokinetics, pharmacodynamics and drug toxicity. Existing models for predicting PPB often suffer from low prediction accuracy and poor interpretability, especially for high PPB compounds, and are most often not experimentally validated. Here, we carried out a strict data curation protocol, and applied consensus modeling to obtain a model with a coefficient of determination of 0.90 and 0.91 on the training set and the test set, respectively. This model (available on the OCHEM platform https://ochem.eu/article/29) was further retrospectively validated for a set of 63 poly-fluorinated molecules and prospectively validated for a set of 25 highly diverse compounds, and its performance for both these sets was superior to that of the other previously reported models. Furthermore, we identified the physicochemical and structural characteristics of high and low PPB molecules for further structural optimization. Finally, we provide practical and detailed recommendations for structural optimization to decrease the PPB of lead compounds. (Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.)
- Published
- 2025
- Full Text
- View/download PDF
13. Be aware of overfitting by hyperparameter optimization!
- Author
-
Tetko IV, van Deursen R, and Godin G
- Abstract
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each dataset using different data cleaning protocols and hyperparameter optimization. In our study we showed that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures. Similar results could be calculated using pre-set hyperparameters, reducing the computational effort by around 10,000 times. We also extended the previous analysis by adding a representation learning method based on Natural Language Processing of SMILES called Transformer CNN. We show that across all analyzed sets using exactly the same protocol, Transformer CNN provided better results than graph-based methods for 26 out of 28 pairwise comparisons while using only a tiny fraction of the time required by other methods. Last but not least, we stressed the importance of comparing calculation results using exactly the same statistical measures. Scientific Contribution: We showed that models with pre-optimized hyperparameters can suffer from overfitting and that using pre-set hyperparameters yields similar performance but four orders of magnitude faster. Transformer CNN provided significantly higher accuracy compared to other investigated methods. Competing interests: The authors declare no competing interests. (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
14. Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition.
- Author
-
Hartog PBR, Krüger F, Genheden S, and Tetko IV
- Abstract
Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference to compare differences between multiple representations of the same ground-truth. In this study, we investigate the robustness of eight XAI methods using test-time augmentation for a molecular-representation model in the field of computational toxicity prediction. We report significant differences between explanations for different representations of the same ground-truth, and show that randomized models have similar variance. We hypothesize that text-based molecular representations in this and past research reflect tokenization more than learned parameters. Furthermore, we see a greater variance between in-domain predictions than out-of-domain predictions, indicating XAI measures something other than learned parameters. Finally, we investigate the relative importance given to expert-derived structural alerts and find similar importance given regardless of applicability domain, randomization and varying training procedures. We therefore caution future research against validating methods using a similar comparison to human intuition without further investigation. Scientific Contribution: In this research we critically investigate XAI through test-time augmentation, contrasting previous assumptions about using expert validation and showing inconsistencies within models for identical representations. SMILES augmentation has been used to increase model accuracy, but was here adapted from the field of image test-time augmentation to be used as an independent indication of the consistency within SMILES-based molecular representation models. (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
15. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge.
- Author
-
Hunklinger A, Hartog P, Šícho M, Godin G, and Tetko IV
- Subjects
- Solubility, Consensus, Databases, Chemical, Neural Networks, Computer, Algorithms
- Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) available on the website https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy which contributed to the winning solution. In particular we show that a consensus based on 28 models calculated using descriptor-based and representation learning methods allowed us to obtain the best score, which was higher than those based on individual approaches or consensus models developed using each individual approach. A combination of diverse models allowed us to decrease both the bias and the variance of individual models and to achieve the highest score. The model based on Transformer CNN contributed the best individual score, thus highlighting the power of Natural Language Processing (NLP) methods. The inclusion of information about aleatoric uncertainty would be important to help contestants better understand and use the challenge data. Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported. (Copyright © 2024. Published by Elsevier Inc.)
- Published
- 2024
- Full Text
- View/download PDF
16. Therapeutic Potential of Targeting Prokineticin Receptors in Diseases.
- Author
-
Vincenzi M, Kremić A, Jouve A, Lattanzi R, Miele R, Benharouga M, Alfaidy N, Migrenne-Li S, Kanthasamy AG, Porcionatto M, Ferrara N, Tetko IV, Désaubry L, and Nebigil CG
- Subjects
- Humans, Receptors, G-Protein-Coupled metabolism, Peptides, Biomarkers, Neuropeptides metabolism, Neoplasms drug therapy
- Abstract
The prokineticins (PKs) were discovered approximately 20 years ago as small peptides inducing gut contractility. Today, they are established as angiogenic, anorectic, and proinflammatory cytokines, chemokines, hormones, and neuropeptides involved in a variety of physiologic and pathophysiological pathways. Their altered expression or mutations, implicated in several diseases, make them potential biomarkers. Their G-protein coupled receptors, PKR1 and PKR2, have divergent roles and can be therapeutic targets for the treatment of cardiovascular, metabolic, and neural diseases as well as pain and cancer. This article reviews and summarizes our current knowledge of PK family functions from development of heart and brain to regulation of homeostasis in health and diseases. Finally, the review summarizes the established roles of the endogenous peptides, synthetic peptides and the selective ligands of PKR1 and PKR2, and nonpeptide orthosteric and allosteric modulators of the receptors in preclinical disease models. The present review emphasizes the ambiguous aspects and gaps in our knowledge of functions of PKR ligands and elucidates future perspectives for PK research. SIGNIFICANCE STATEMENT: This review provides an in-depth view of the prokineticin family and PK receptors that can be active without their endogenous ligand and exhibit "constitutive" activity in diseases. Their nonpeptide ligands display promising effects in several preclinical disease models. PKs can be diagnostic biomarkers of several diseases. A thorough understanding of the role of the prokineticin family and their receptor types in health and diseases is critical to develop novel therapeutic strategies with safety concerns. (Copyright © 2023 by The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
17. Theoretical and Experimental Studies of Phosphonium Ionic Liquids as Potential Antibacterials of MDR Acinetobacter baumannii.
- Author
-
Metelytsia LO, Hodyna DM, Semenyuta IV, Kovalishyn VV, Rogalsky SP, Derevianko KY, Brovarets VS, and Tetko IV
- Abstract
A previously developed model to predict antibacterial activity of ionic liquids against a resistant A. baumannii strain was used to assess activity of phosphonium ionic liquids. Their antioxidant potential was additionally evaluated with newly developed models, which were based on public data. The accuracy of the models was rigorously evaluated using cross-validation as well as test set prediction. Six alkyl triphenylphosphonium and alkyl tributylphosphonium bromides with the C8, C10, and C12 alkyl chain length were synthesized and tested in vitro. Experimental studies confirmed their activity against A. baumannii as well as showed pronounced antioxidant properties. These results suggest that phosphonium ionic liquids could be promising lead structures against A. baumannii.
- Published
- 2022
- Full Text
- View/download PDF
18. More Is Not Always Better: Local Models Provide Accurate Predictions of Spectral Properties of Porphyrins.
- Author
-
Rusanov AI, Dmitrieva OA, Mamardashvili NZ, and Tetko IV
- Subjects
- Models, Molecular, Molecular Structure, Computer Simulation, Porphyrins chemistry, Spectrophotometry methods
- Abstract
The development of new functional materials based on porphyrins requires fast and accurate prediction of their spectral properties. The available models in the literature for absorption wavelength and extinction coefficient of the Soret band have low accuracy for this class of compounds. We collected spectral data for porphyrins to extend the literature set and compared the performance of global and local models for their modelling using different machine learning methods. Interestingly, extension of the public database yielded models with lower accuracies than the models that we built using porphyrins only. The latter model achieved an acceptable RMSE = 2.61 for prediction of the absorption band of 335 porphyrins synthesized in our laboratory, but had a low accuracy (RMSE = 0.52) for extinction coefficient. Developing models using only compounds from our laboratory significantly decreased errors for these compounds (RMSE = 0.5 and 0.042 for absorption band and extinction coefficient, respectively), but limited their applicability to these homologous series only. When developing models, one should clearly keep in mind their potential use and select a strategy that could yield the most accurate predictions for the target application. The models and data are publicly available.
- Published
- 2022
- Full Text
- View/download PDF
19. Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Published
- 2021
- Full Text
- View/download PDF
20. Anti-MRSA drug discovery by ligand-based virtual screening and biological evaluation.
- Author
-
Lian X, Xia Z, Li X, Karpov P, Jin H, Tetko IV, Xia J, and Wu S
- Subjects
- Anti-Bacterial Agents chemical synthesis, Anti-Bacterial Agents metabolism, Anti-Bacterial Agents toxicity, DNA Gyrase metabolism, Drug Evaluation, Preclinical, Hep G2 Cells, Human Umbilical Vein Endothelial Cells, Humans, Ligands, Microbial Sensitivity Tests, Molecular Docking Simulation, Molecular Dynamics Simulation, Protein Binding, Quinoxalines chemical synthesis, Quinoxalines metabolism, Quinoxalines toxicity, Topoisomerase II Inhibitors chemical synthesis, Topoisomerase II Inhibitors metabolism, Topoisomerase II Inhibitors pharmacology, Topoisomerase II Inhibitors toxicity, Anti-Bacterial Agents pharmacology, Methicillin-Resistant Staphylococcus aureus drug effects, Quinoxalines pharmacology
- Abstract
Methicillin-resistant S. aureus (MRSA) is one of the multidrug-resistant bacteria of greatest concern, due to its role in life-threatening infections. There is an urgent need to develop new antibiotics against MRSA. In this study, we first compiled a data set of 2,3-diaminoquinoxalines by chemical synthesis and antibacterial screening against S. aureus, and then performed cheminformatics modeling and virtual screening. The compound with the Specs ID of AG-205/33156020 was discovered as a new antibacterial agent, and was further identified as a Gyrase B (GyrB) inhibitor. In light of the common features, we hypothesized that compound 6c, as the representative of 2,3-diaminoquinoxalines, also inhibited GyrB and eventually proved it. Via molecular docking and molecular dynamics simulations, we identified binding modes of AG-205/33156020 and 6c to the ATPase domain of GyrB. Importantly, these GyrB inhibitors inhibited the MRSA strains and showed selectivity to HepG2 and HUVEC. Taken together, this research work provides an effective ligand-based computational workflow for scaffold hopping in anti-MRSA drug discovery, and discovers two new GyrB inhibitors that are worthy of further development. (Copyright © 2021 Elsevier Inc. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
21. CATMoS: Collaborative Acute Toxicity Modeling Suite.
- Author
-
Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri A, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, and Kleinstreuer NC
- Subjects
- Animals, Computer Simulation, Rats, Toxicity Tests, Acute, United States, United States Environmental Protection Agency, Government Agencies
- Abstract
Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. Objectives: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). Methods: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. Results: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. Discussion: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495.
- Published
- 2021
- Full Text
- View/download PDF
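The two binary end points named in the CATMoS abstract above (very toxic: LD50 ≤ 50 mg/kg; nontoxic: LD50 > 2,000 mg/kg) amount to simple threshold checks on an LD50 value. The following is a minimal sketch of those two checks only; the function and constant names are illustrative assumptions and are not taken from CATMoS or OPERA.

```python
# Thresholds quoted in the abstract, in mg/kg (acute oral LD50).
VERY_TOXIC_MAX_MG_PER_KG = 50.0
NONTOXIC_MIN_MG_PER_KG = 2000.0


def is_very_toxic(ld50_mg_per_kg: float) -> bool:
    """Return True when LD50 <= 50 mg/kg (the 'very toxic' end point)."""
    return ld50_mg_per_kg <= VERY_TOXIC_MAX_MG_PER_KG


def is_nontoxic(ld50_mg_per_kg: float) -> bool:
    """Return True when LD50 > 2,000 mg/kg (the 'nontoxic' end point)."""
    return ld50_mg_per_kg > NONTOXIC_MIN_MG_PER_KG
```

A chemical with an LD50 between the two thresholds is flagged by neither check, which is why the abstract lists these as two separate end points rather than one three-way label.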
22. Structure-Activity Relationship Modeling and Experimental Validation of the Imidazolium and Pyridinium Based Ionic Liquids as Potential Antibacterials of MDR Acinetobacter Baumannii and Staphylococcus Aureus.
- Author
-
Semenyuta IV, Trush MM, Kovalishyn VV, Rogalsky SP, Hodyna DM, Karpov P, Xia Z, Tetko IV, and Metelytsia LO
- Subjects
- Acinetobacter baumannii pathogenicity, Bacterial Infections microbiology, Drug Resistance, Multiple, Humans, Imidazoles chemical synthesis, Ionic Liquids chemical synthesis, Ionic Liquids chemistry, Pyridines chemical synthesis, Staphylococcus aureus drug effects, Staphylococcus aureus pathogenicity, Structure-Activity Relationship, Acinetobacter baumannii drug effects, Bacterial Infections drug therapy, Imidazoles chemistry, Pyridines chemistry
- Abstract
The Online Chemical Modeling Environment (OCHEM) was used for QSAR analysis of a set of ionic liquids (ILs) tested against multi-drug resistant (MDR) clinical isolate Acinetobacter baumannii and Staphylococcus aureus strains. The regression models achieved a coefficient of determination q² = 0.66-0.79 with cross-validation and independent test sets. The models were used to screen a virtual chemical library of ILs, which was designed with targeted activity against MDR Acinetobacter baumannii and Staphylococcus aureus strains. The seven most promising ILs were selected, synthesized, and tested. Three ILs showed high activity against both these MDR clinical isolates.
- Published
- 2021
- Full Text
- View/download PDF
23. From Big Data to Artificial Intelligence: chemoinformatics meets new challenges.
- Author
-
Tetko IV and Engkvist O
- Abstract
The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data. This editorial highlights the main results presented during the special session of the International Conference on Neural Networks organized by the "Big Data in Chemistry" project and draws perspectives on the future progress of the field.
- Published
- 2020
- Full Text
- View/download PDF
24. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis.
- Author
-
Tetko IV, Karpov P, Van Deursen R, and Godin G
- Abstract
We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using a text-like representation of chemical reactions (SMILES) and the Natural Language Processing (NLP) neural network Transformer architecture. We showed that data augmentation, which is a powerful method used in image processing, eliminated the effect of data memorization by neural networks and improved their performance for prediction of new sequences. This effect was observed when augmentation was used simultaneously for the input and the target data. The top-5 accuracy was 84.8% for the prediction of the largest fragment (thus identifying principal transformation for classical retro-synthesis) for the USPTO-50k test dataset, and was achieved by a combination of SMILES augmentation and a beam search algorithm. The same approach provided significantly better results for the prediction of direct reactions from the single-step USPTO-MIT test set. Our model achieved 90.6% top-1 and 96.1% top-5 accuracy for its challenging mixed set and 97% top-5 accuracy for the USPTO-MIT separated set. It also significantly improved results for USPTO-full set single-step retrosynthesis for both top-1 and top-10 accuracies. The appearance frequency of the most abundantly generated SMILES was well correlated with the prediction outcome and can be used as a measure of the quality of reaction prediction.
- Published
- 2020
- Full Text
- View/download PDF
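The top-1, top-5 and top-10 figures quoted in the abstract above are top-k accuracies: a test case counts as correct when the reference answer appears among the k highest-ranked candidates. The sketch below shows that metric in plain Python. It assumes exact string matching between candidates and reference, which is a simplification; how the paper compares SMILES strings is not stated in the abstract, and the function name is illustrative.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose reference is among the first k candidates.

    ranked_predictions[i] is the candidate list for case i, best first.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one test case is required")
    if k < 1:
        raise ValueError("k must be a positive integer")
    hits = sum(
        1
        for candidates, reference in zip(ranked_predictions, references)
        if reference in candidates[:k]
    )
    return hits / len(references)
```

By construction the value can only stay the same or grow as k increases, which matches the pattern of the reported top-1 and top-5 numbers.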
25. Focused Library Generator: case of Mdmx inhibitors.
- Author
-
Xia Z, Karpov P, Popowicz G, and Tetko IV
- Subjects
- Antineoplastic Agents chemistry, Antineoplastic Agents pharmacology, Binding Sites, Cell Cycle Proteins chemistry, Computer-Aided Design statistics & numerical data, Databases, Chemical statistics & numerical data, Databases, Pharmaceutical, Drug Discovery methods, Drug Discovery statistics & numerical data, Humans, Ligands, Molecular Docking Simulation, Molecular Dynamics Simulation, Neural Networks, Computer, Protein Binding, Proto-Oncogene Proteins chemistry, Quantitative Structure-Activity Relationship, Cell Cycle Proteins antagonists & inhibitors, Drug Design, Proto-Oncogene Proteins antagonists & inhibitors, Small Molecule Libraries
- Abstract
We present a Focused Library Generator that is able to create from scratch new molecules with desired properties. After training the Generator on the ChEMBL database, transfer learning was used to switch the generator to producing new Mdmx inhibitors that are a promising class of anticancer drugs. Lilly medicinal chemistry filters, molecular docking, and a QSAR IC50 model were used to refine the output of the Generator. Pharmacophore screening and molecular dynamics (MD) simulations were then used to further select putative ligands. Finally, we identified five promising hits with equivalent or even better predicted binding free energies and IC50 values than known Mdmx inhibitors. The source code of the project is available on https://github.com/bigchem/online-chem.
- Published
- 2020
- Full Text
- View/download PDF
26. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping.
- Author
-
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, and Svozil D
- Abstract
An affinity fingerprint is a vector consisting of a compound's affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint whose components are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~0.65 and ~0.70 for similarity searching depending on data sets, and ~0.85 for classification) and EF5 (~4.67 and ~5.82 for similarity searching depending on data sets, and ~2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~0.57 and ~0.66, and EF5 of ~4.09 and ~6.41, depending on data sets, classification AUC of ~0.87, and EF5 of ~2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping as it is able to retrieve 1146 out of existing 1749 scaffolds, while the Morgan2 fingerprint reveals only 864 scaffolds.
- Published
- 2020
- Full Text
- View/download PDF
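The EF5 values in the abstract above are enrichment factors at the top 5% of a ranked list. Under the common definition (assumed here, since the abstract does not spell it out), the enrichment factor is the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole set. The sketch below follows that definition; the function name and the choice to round the cutoff up are illustrative assumptions.

```python
import math
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.05,
) -> float:
    """Enrichment factor for the top `fraction` of compounds ranked by score."""
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal in length")
    actives_total = sum(1 for flag in is_active if flag)
    if actives_total == 0:
        raise ValueError("at least one active compound is required")
    # Assumption: the cutoff is rounded up and never smaller than one compound.
    n_top = max(1, math.ceil(n * fraction))
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(1 for _, flag in ranked[:n_top] if flag)
    return (actives_top / n_top) / (actives_total / n)
```

A value of 1.0 corresponds to random ranking, so the reported EF5 values of roughly 2 to 6 indicate that actives are concentrated near the top of the list.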
27. GEN: highly efficient SMILES explorer using autodidactic generative examination networks.
- Author
-
van Deursen R, Ertl P, Tetko IV, and Godin G
- Abstract
Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism, measuring the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95-98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85-90%) while generating SMILES with strong conservation of the property space (95-99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
- Published
- 2020
- Full Text
- View/download PDF
28. Transformer-CNN: Swiss knife for QSAR modeling and interpretation.
- Author
-
Karpov P, Godin G, and Tetko IV
- Abstract
We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, and thus the prognosis is based on an internal consensus. That both the augmentation and transfer learning are based on embeddings allows the method to provide good results for small datasets. We discuss the reasons for such effectiveness and draft future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available on https://github.com/bigchem/transformer-cnn. The repository also has a standalone program for QSAR prognosis which calculates individual atom contributions, thus interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts the on-line implementation of the proposed method.
- Published
- 2020
- Full Text
- View/download PDF
29. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity.
- Author
-
Mansouri K, Kleinstreuer N, Abdelaziz AM, Alberga D, Alves VM, Andersson PL, Andrade CH, Bai F, Balabin I, Ballabio D, Benfenati E, Bhhatarai B, Boyer S, Chen J, Consonni V, Farag S, Fourches D, García-Sosa AT, Gramatica P, Grisoni F, Grulke CM, Hong H, Horvath D, Hu X, Huang R, Jeliazkova N, Li J, Li X, Liu H, Manganelli S, Mangiatordi GF, Maran U, Marcou G, Martin T, Muratov E, Nguyen DT, Nicolotti O, Nikolov NG, Norinder U, Papa E, Petitjean M, Piir G, Pogodin P, Poroikov V, Qiao X, Richard AM, Roncaglioni A, Ruiz P, Rupakheti C, Sakkiah S, Sangion A, Schramm KW, Selvaraj C, Shah I, Sild S, Sun L, Taboureau O, Tang Y, Tetko IV, Todeschini R, Tong W, Trisciuzzi D, Tropsha A, Van Den Driessche G, Varnek A, Wang Z, Wedebye EB, Williams AJ, Xie H, Zakharov AV, Zheng Z, and Judson RS
- Subjects
- Androgens, Databases, Factual, High-Throughput Screening Assays, Humans, Receptors, Androgen, United States, United States Environmental Protection Agency, Computer Simulation, Endocrine Disruptors
- Abstract
Background: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. Objectives: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follow the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). Methods: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. Results: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set. Discussion: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of ~875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
- Published
- 2020
- Full Text
- View/download PDF
30. Chemical space exploration guided by deep neural networks.
- Author
-
Karlov DS, Sosnin S, Tetko IV, and Fedorov MV
- Abstract
A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com). Competing Interests: IVT is CEO of BIGCHEM GmbH, which licenses the OCHEM (http://ochem.eu) software. The other authors declared that they have no actual or potential conflicts of interests. (This journal is © The Royal Society of Chemistry.)
- Published
- 2019
- Full Text
- View/download PDF
31. Modelling the toxicity of a large set of metal and metal oxide nanoparticles using the OCHEM platform.
- Author
-
Kovalishyn V, Abramenko N, Kopernyk I, Charochkina L, Metelytsia L, Tetko IV, Peijnenburg W, and Kustov L
- Subjects
- Computational Biology, Machine Learning, Metal Nanoparticles chemistry, Neural Networks, Computer, Oxides chemistry, Quantitative Structure-Activity Relationship, Reproducibility of Results, Toxicity Tests, Metal Nanoparticles toxicity, Models, Chemical
- Abstract
Inorganic nanomaterials have become one of the new areas of modern knowledge and technology and have already found an increasing number of applications. However, some nanoparticles show toxicity to living organisms, and can potentially have a negative influence on environmental ecosystems. While toxicity can be determined experimentally, such studies are time consuming and costly. Computational toxicology can provide an alternative approach and there is a need to develop methods to reliably assess Quantitative Structure-Property Relationships for nanomaterials (nano-QSPRs). Importantly, development of such models requires careful collection and curation of data. This article overviews freely available nano-QSPR models, which were developed using the Online Chemical Modeling Environment (OCHEM). Multiple data on toxicity of nanoparticles to different living organisms were collected from the literature and uploaded in the OCHEM database. The main characteristics of nanoparticles such as chemical composition of nanoparticles, average particle size, shape, surface charge and information about the biological test species were used as descriptors for developing QSPR models. QSPR methodologies used Random Forests (WEKA-RF), k-Nearest Neighbors and Associative Neural Networks. The predictive ability of the models was tested through cross-validation, giving cross-validated coefficients q² = 0.58-0.80 for regression models and balanced accuracies of 65-88% for classification models. These results matched the predictions for the test sets used to develop the models. The proposed nano-QSPR models and uploaded data are freely available online at http://ochem.eu/article/103451 and can be used for estimation of toxicity of new and emerging nanoparticles at the early stages of nanomaterial development. (Copyright © 2017 Elsevier Ltd. All rights reserved.)
- Published
- 2018
- Full Text
- View/download PDF
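The abstract above reports a cross-validated q² for regression models and balanced accuracy for classification models. The sketch below shows the textbook forms of both statistics: q² as one minus the ratio of the squared prediction error to the total variance of the observed values, and balanced accuracy as the mean of sensitivity and specificity. These definitions are assumptions for illustration; the exact formulas used by OCHEM are not given in the abstract, and the function names are hypothetical.

```python
from typing import Sequence


def q_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """q2 = 1 - PRESS / SS_tot, where `predicted` holds cross-validated predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and equal in length")
    mean_observed = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - press / ss_tot


def balanced_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have equal length")
    pairs = list(zip(labels, predictions))
    positives = sum(1 for label, _ in pairs if label == 1)
    negatives = sum(1 for label, _ in pairs if label == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    true_positives = sum(1 for label, pred in pairs if label == 1 and pred == 1)
    true_negatives = sum(1 for label, pred in pairs if label == 0 and pred == 0)
    return 0.5 * (true_positives / positives + true_negatives / negatives)
```

Predicting the mean of the observed values for every sample gives q² = 0 under this definition, and always predicting one class gives a balanced accuracy of 0.5, which puts the reported ranges of 0.58-0.80 and 65-88% in context.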
32. Does 'Big Data' exist in medicinal chemistry, and if so, how can it be harnessed?
- Author
-
Tetko IV, Engkvist O, and Chen H
- Published
- 2016
- Full Text
- View/download PDF
33. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project.
- Author
-
Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, and Judson RS
- Subjects
- Computer Simulation, Endocrine Disruptors classification, Environmental Policy, Quantitative Structure-Activity Relationship, United States, Endocrine Disruptors toxicity, Receptors, Estrogen metabolism, Toxicity Tests
- Abstract
Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. Methods: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. Results: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. Conclusion: This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. This concept will be applied in future projects related to other end points. Citation: Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. 2016. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 124:1023-1033; http://dx.doi.org/10.1289/ehp.1510267.
- Published
- 2016
- Full Text
- View/download PDF
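The CERAPP abstract above states that a consensus was built by weighting models on scores based on their evaluated accuracies. One simple way to realize that idea is a score-weighted vote over binary predictions, sketched below. This is an illustrative assumption and not the CERAPP procedure itself: the abstract does not give the weighting formula, and the function name and the 0.5 decision threshold are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_consensus(
    predictions: Sequence[int],
    model_scores: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Score-weighted vote over binary model predictions (1 = active, 0 = inactive).

    Returns the weighted fraction of 'active' votes and the consensus call.
    """
    if len(predictions) != len(model_scores) or not predictions:
        raise ValueError("inputs must be non-empty and equal in length")
    total_weight = sum(model_scores)
    if total_weight <= 0:
        raise ValueError("model scores must sum to a positive value")
    weighted_fraction = (
        sum(pred * score for pred, score in zip(predictions, model_scores))
        / total_weight
    )
    return weighted_fraction, weighted_fraction >= threshold
```

For example, with three models scored 0.85, 0.69 and 0.80 (values within the range quoted in the abstract, chosen here only for illustration), two 'active' votes from the first and third models outweigh one 'inactive' vote from the second.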
34. Identification of Small-Molecule Frequent Hitters of Glutathione S-Transferase-Glutathione Interaction.
- Author
-
Brenke JK, Salmina ES, Ringelstetter L, Dornauer S, Kuzikov M, Rothenaigner I, Schorpp K, Giehler F, Gopalakrishnan J, Kieser A, Gul S, Tetko IV, and Hadian K
- Subjects
- Glutathione antagonists & inhibitors, Glutathione Transferase antagonists & inhibitors, Humans, Protein Interaction Maps drug effects, Small Molecule Libraries chemistry, Substrate Specificity, Glutathione chemistry, Glutathione Transferase chemistry, High-Throughput Screening Assays methods, Small Molecule Libraries pharmacology
- Abstract
In high-throughput screening (HTS) campaigns, the binding of glutathione S-transferase (GST) to glutathione (GSH) is used for detection of GST-tagged proteins in protein-protein interactions or enzyme assays. However, many false-positives, so-called frequent hitters (FH), arise that either prevent GST/GSH interaction or interfere with assay signal generation or detection. To identify GST-FH compounds, we analyzed the data of five independent AlphaScreen-based screening campaigns to classify compounds that inhibit the GST/GSH interaction. We identified 53 compounds affecting GST/GSH binding but not influencing His-tag/Ni(2+)-NTA interaction and general AlphaScreen signals. The structures of these 53 experimentally identified GST-FHs were analyzed in chemoinformatic studies to categorize substructural features that promote interference with GST/GSH binding. Here, we confirmed several existing chemoinformatic filters and more importantly extended them as well as added novel filters that specify compounds with anti-GST/GSH activity. Selected compounds were also tested using different antibody-based GST detection technologies and exhibited no interference clearly demonstrating specificity toward their GST/GSH interaction. Thus, these newly described GST-FH will further contribute to the identification of FH compounds containing promiscuous substructures. The developed filters were uploaded to the OCHEM website (http://ochem.eu) and are publicly accessible for analysis of future HTS results. (© 2016 Society for Laboratory Automation and Screening.)
- Published
- 2016
- Full Text
- View/download PDF
35. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS.
- Author
-
Tetko IV, Lowe DM, and Williams AJ
- Abstract
Background: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules while lots more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if it were available in the appropriate form, could potentially be used to develop predictive models. Results: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melting, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. Conclusions: We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
- Published
- 2016
- Full Text
- View/download PDF
36. Extended Functional Groups (EFG): An Efficient Set for Chemical Characterization and Structure-Activity Relationship Studies of Chemical Compounds.
- Author
-
Salmina ES, Haider N, and Tetko IV
- Subjects
- Software, Structure-Activity Relationship, Databases, Chemical
- Abstract
The article describes a classification system termed "extended functional groups" (EFG), which are an extension of a set previously used by the CheckMol software, that covers in addition heterocyclic compound classes and periodic table groups. The functional groups are defined as SMARTS patterns and are available as part of the ToxAlerts tool (http://ochem.eu/alerts) of the On-line CHEmical database and Modeling (OCHEM) environment platform. The article describes the motivation and the main ideas behind this extension and demonstrates that EFG can be efficiently used to develop and interpret structure-activity relationship models.
- Published
- 2015
- Full Text
- View/download PDF
37. Identifying potential endocrine disruptors among industrial chemicals and their metabolites--development and evaluation of in silico tools.
- Author
-
Rybacka A, Rudén C, Tetko IV, and Andersson PL
- Subjects
- Binding, Competitive, Endocrine Disruptors metabolism, Humans, MCF-7 Cells, Models, Biological, Protein Binding, Endocrine Disruptors analysis, Environmental Monitoring methods, Industry, Prealbumin metabolism, Receptors, Androgen metabolism, Receptors, Estrogen metabolism
- Abstract
The aim of this study was to improve the identification of endocrine disrupting chemicals (EDCs) by developing and evaluating in silico tools that predict interactions at the estrogen (E) and androgen (A) receptors, and binding to transthyretin (T). In particular, the study focuses on evaluating the use of the EAT models in combination with a metabolism simulator to study the significance of bioactivation for endocrine disruption. Balanced accuracies of the EAT models ranged from 77-87%, 62-77%, and 65-89% for E-, A-, and T-binding respectively. The developed models were applied on a set of more than 6000 commonly used industrial chemicals of which 9% were predicted E- and/or A-binders and 1% were predicted T-binders. The numbers of E- and T-binders increased 2- and 3-fold, respectively, after metabolic transformation, while the number of A-binders marginally changed. In-depth validation confirmed that several of the predicted bioactivated E- or T-binders demonstrated in vivo estrogenic activity or influenced blood levels of thyroxine in vivo. The metabolite simulator was evaluated using in vivo data from the literature which showed a 50% accuracy for studied chemicals. The study stresses, in summary, the importance of including metabolic activation in prioritization activities of potentially emerging contaminants. (Copyright © 2015 Elsevier Ltd. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
38. Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process.
- Author
-
Sushko Y, Novotarskyi S, Körner R, Vogt J, Abdelaziz A, and Tetko IV
- Abstract
Background: QSAR is an established and powerful method for cheap in silico assessment of physicochemical properties and biological activities of chemical compounds. However, QSAR models are rather complex mathematical constructs that cannot easily be interpreted. Medicinal chemists would benefit from practical guidance regarding which molecules to synthesize. Another possible approach is analysis of pairs of very similar molecules, so-called matched molecular pairs (MMPs). Such an approach allows identification of molecular transformations that affect particular activities (e.g. toxicity). In contrast to QSAR, chemical interpretation of these transformations is straightforward. Furthermore, such transformations can give medicinal chemists useful hints for the hit-to-lead optimization process. Results: The current study suggests a combination of QSAR and MMP approaches by finding MMP transformations based on QSAR predictions for large chemical datasets. The study shows that such an approach, referred to as prediction-driven MMP analysis, is a useful tool for medicinal chemists, allowing identification of large numbers of "interesting" transformations that can be used to drive the molecular optimization process. All the methodological developments have been implemented as software products available online as part of OCHEM (http://ochem.eu/). Conclusions: The prediction-driven MMPs methodology was exemplified by two use cases: modelling of aquatic toxicity and CYP3A4 inhibition. This approach helped us to interpret QSAR models and allowed identification of a number of "significant" molecular transformations that affect the desired properties. This can facilitate drug design as a part of molecular optimization process. Graphical Abstract: Molecular matched pairs and transformation graphs facilitate interpretable molecular optimisation process.
- Published
- 2014
- Full Text
- View/download PDF
39. Identification of Small-Molecule Frequent Hitters from AlphaScreen High-Throughput Screens.
- Author
-
Schorpp K, Rothenaigner I, Salmina E, Reinshagen J, Low T, Brenke JK, Gopalakrishnan J, Tetko IV, Gul S, and Hadian K
- Subjects
- Automation, Biological Assay, Escherichia coli metabolism, Fluorescence Resonance Energy Transfer, Kinetics, Nitrilotriacetic Acid analogs & derivatives, Nitrilotriacetic Acid chemistry, Organometallic Compounds chemistry, Protein Binding, Protein Interaction Mapping, Recombinant Proteins chemistry, Drug Discovery methods, High-Throughput Screening Assays methods, Small Molecule Libraries chemistry
- Abstract
Although small-molecule drug discovery efforts have focused largely on enzyme, receptor, and ion-channel targets, there has been an increase in such activities to search for protein-protein interaction (PPI) disruptors by applying high-throughput screening (HTS)-compatible protein-binding assays. However, a disadvantage of these assays is that many primary hits are frequent hitters regardless of the PPI being investigated. We have used the AlphaScreen technology to screen four different robust PPI assays each against 25,000 compounds. These activities led to the identification of 137 compounds that demonstrated repeated activity in all PPI assays. These compounds were subsequently evaluated in two AlphaScreen counter assays, leading to classification of compounds that either interfered with the AlphaScreen chemistry (60 compounds) or prevented the binding of the protein His-tag moiety to nickel chelate (Ni(2+)-NTA) beads of the AlphaScreen detection system (77 compounds). To further triage the 137 frequent hitters, we subsequently confirmed by a time-resolved fluorescence resonance energy transfer assay that most of these compounds were only frequent hitters in AlphaScreen assays. A chemoinformatics analysis of the apparent hits provided details of the compounds that can be flagged as frequent hitters of the AlphaScreen technology, and these data have broad applicability for users of these detection technologies. (© 2013 Society for Laboratory Automation and Screening.)
- Published
- 2014
- Full Text
- View/download PDF
40. Robustness in experimental design: A study on the reliability of selection approaches.
- Author
-
Brandmaier S and Tetko IV
- Abstract
The quality criteria for experimental design approaches in chemoinformatics are numerous. Not only the error performance of a model resulting from the selected compounds is of importance, but also reliability, consistency, stability and robustness against small variations in the dataset or structurally diverse compounds. We developed a new stepwise, adaptive approach, DescRep, combining an iteratively refined descriptor selection with a sampling based on the putatively most representative compounds. A comparison of the proposed strategy was based on statistical performance of models derived from such a selection to those derived by other popular and frequently used approaches, such as the Kennard-Stone algorithm or the most descriptive compound selection. We used three datasets to carry out a statistical evaluation of the performance, reliability and robustness of the resulting models. Our results indicate that stepwise and adaptive approaches have a better adaptability to changes within a dataset and that this adaptability results in a better error performance and stability of the resulting models.
- Published
- 2013
- Full Text
- View/download PDF
41. Modeling of non-additive mixture properties using the Online CHEmical database and Modeling environment (OCHEM).
- Author
-
Oprisiu I, Novotarskyi S, and Tetko IV
- Abstract
The Online Chemical Modeling Environment (OCHEM, http://ochem.eu) is a web-based platform that provides tools for automation of typical steps necessary to create a predictive QSAR/QSPR model. The platform consists of two major subsystems: a database of experimental measurements and a modeling framework. So far, OCHEM has been limited to the processing of individual compounds. In this work, we extended OCHEM with a new ability to store and model properties of binary non-additive mixtures. The developed system is publicly accessible, meaning that any user on the Web can store new data for binary mixtures and develop models to predict their non-additive properties. The database already contains almost 10,000 data points for the density, bubble point, and azeotropic behavior of binary mixtures. For these data, we developed models for both qualitative (azeotrope/zeotrope) and quantitative endpoints (density and bubble points) using different learning methods and specially developed descriptors for mixtures. The prediction performance of the models was similar to or more accurate than results reported in previous studies. Thus, we have developed and made publicly available a powerful system for modeling mixtures of chemical compounds on the Web.
- Published
- 2013
- Full Text
- View/download PDF
42. The perspectives of computational chemistry modeling.
- Author
-
Tetko IV
- Subjects
- Algorithms, Computer Simulation trends, Databases as Topic trends, Humans, Models, Molecular, Research trends, Computer-Aided Design trends, Internet trends, Quantitative Structure-Activity Relationship, Software trends
- Abstract
The on-line tools for computational chemistry modeling will be increasingly used in the future. This will bring the advantages both for the authors and the readers.
- Published
- 2012
- Full Text
- View/download PDF
43. Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information.
- Author
-
Tetko IV, Rodchenkov IV, Walter MC, Rattei T, and Mewes HW
- Subjects
- Algorithms, Bacterial Proteins genetics, Genome, Bacterial, Bacterial Proteins chemistry
- Abstract
Motivation: Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods. Results: The analyzed genomes were manually annotated with FunCat categories in MIPS providing a gold standard. Features describing a pair of sequences rather than each sequence alone were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length and calculated protein properties. Following training we scored all pairs from the validation sets, selected a pair with the highest predicted score and annotated the target protein with functional categories of the prototype protein. The data integration using machine-learning methods provided significantly higher annotation accuracy compared to the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from the InterPro domains and sequence similarity provided the highest contribution to the method performance. The predicted annotation scores allow differentiation of reliable versus non-reliable annotations. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes. Availability: The FUNcat Annotation Tool (FUNAT) is available on-line as Web Services at http://mips.gsf.de/proj/funat.
- Published
- 2008
- Full Text
- View/download PDF
44. The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization.
- Author
-
Cuomo CA, Güldener U, Xu JR, Trail F, Turgeon BG, Di Pietro A, Walton JD, Ma LJ, Baker SE, Rep M, Adam G, Antoniw J, Baldwin T, Calvo S, Chang YL, Decaprio D, Gale LR, Gnerre S, Goswami RS, Hammond-Kosack K, Harris LJ, Hilburn K, Kennell JC, Kroken S, Magnuson JK, Mannhaupt G, Mauceli E, Mewes HW, Mitterbauer R, Muehlbauer G, Münsterkötter M, Nelson D, O'donnell K, Ouellet T, Qi W, Quesneville H, Roncero MI, Seong KY, Tetko IV, Urban M, Waalwijk C, Ward TJ, Yao J, Birren BW, and Kistler HC
- Subjects
- DNA, Fungal, Evolution, Molecular, Fusarium physiology, Hordeum microbiology, Molecular Sequence Data, Plant Diseases microbiology, Point Mutation, Polymorphism, Single Nucleotide, Sequence Analysis, DNA, Fusarium genetics, Genome, Fungal, Polymorphism, Genetic
- Abstract
We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher rates of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.
- Published
- 2007
- Full Text
- View/download PDF
45. Volume learning algorithm significantly improved PLS model for predicting the estrogenic activity of xenoestrogens.
- Author
-
Kovalishyn VV, Kholodovych V, Tetko IV, and Welsh WJ
- Subjects
- Endocrine Disruptors chemistry, Endocrine Disruptors metabolism, Environmental Pollutants chemistry, Environmental Pollutants metabolism, Estrogens metabolism, Estrogens, Non-Steroidal chemistry, Estrogens, Non-Steroidal metabolism, Models, Molecular, Molecular Structure, Neural Networks, Computer, Protein Binding, Quantitative Structure-Activity Relationship, Receptors, Estrogen chemistry, Receptors, Estrogen metabolism, Xenobiotics chemistry, Xenobiotics metabolism, Algorithms, Estrogens chemistry, Least-Squares Analysis
- Abstract
Volume learning algorithm (VLA) artificial neural network and partial least squares (PLS) methods were compared using the leave-one-out cross-validation procedure for prediction of relative potency of xenoestrogenic compounds to the estrogen receptor. Using the Wilcoxon signed rank test we showed that VLA outperformed PLS by producing models with statistically superior results for a structurally diverse set of compounds comprising eight chemical families. Thus, CoMFA/VLA models are successful in prediction of the endocrine disrupting potential of environmental pollutants and can be effectively applied for testing of prospective chemicals prior to their exposure to the environment.
- Published
- 2007
- Full Text
- View/download PDF
46. Spatiotemporal expression control correlates with intragenic scaffold matrix attachment regions (S/MARs) in Arabidopsis thaliana.
- Author
-
Tetko IV, Haberer G, Rudd S, Meyers B, Mewes HW, and Mayer KF
- Subjects
- Cell Nucleus genetics, Chromatin genetics, DNA, Plant genetics, Genes, Plant, Oligonucleotide Array Sequence Analysis, Transcription, Genetic, Arabidopsis genetics, Gene Expression Regulation, Plant
- Abstract
Scaffold/matrix attachment regions (S/MARs) are essential for structural organization of the chromatin within the nucleus and serve as anchors of chromatin loop domains. A significant fraction of genes in Arabidopsis thaliana contains intragenic S/MAR elements and a significant correlation of S/MAR presence and overall expression strength has been demonstrated. In this study, we undertook a genome scale analysis of expression level and spatiotemporal expression differences in correlation with the presence or absence of genic S/MAR elements. We demonstrate that genes containing intragenic S/MARs are prone to pronounced spatiotemporal expression regulation. This characteristic is found to be even more pronounced for transcription factor genes. Our observations illustrate the importance of S/MARs in transcriptional regulation and the role of chromatin structural characteristics for gene regulation. Our findings open new perspectives for the understanding of tissue- and organ-specific regulation of gene expression.
- Published
- 2006
- Full Text
- View/download PDF
47. A systematic approach to infer biological relevance and biases of gene network structures.
- Author
-
Antonov AV, Tetko IV, and Mewes HW
- Subjects
- Gene Expression Profiling, Internet, Models, Statistical, Oligonucleotide Array Sequence Analysis, Saccharomyces cerevisiae genetics, Two-Hybrid System Techniques, Computational Biology methods, Genes, Genomics methods, Models, Genetic, Software
- Abstract
The development of high-throughput technologies has generated the need for bioinformatics approaches to assess the biological relevance of gene networks. Although several tools have been proposed for analysing the enrichment of functional categories in a set of genes, none of them is suitable for evaluating the biological relevance of the gene network. We propose a procedure and develop a web-based resource (BIOREL) to estimate the functional bias (biological relevance) of any given genetic network by integrating different sources of biological information. The weights of the edges in the network may be either binary or continuous. These essential features make our web tool unique among many similar services. BIOREL provides standardized estimations of the network biases extracted from independent data. By the analyses of real data we demonstrate that the potential application of BIOREL ranges from various benchmarking purposes to systematic analysis of the network biology.
- Published
- 2006
- Full Text
- View/download PDF
48. The Mouse Functional Genome Database (MfunGD): functional annotation of proteins in the light of their cellular context.
- Author
-
Ruepp A, Doudieu ON, van den Oever J, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Skornia C, Wanka S, Rattei T, Pagel P, Riley L, Frishman D, Surmeli D, Tetko IV, Oesterheld M, Stümpflen V, and Mewes HW
- Subjects
- Animals, Internet, Multiprotein Complexes chemistry, Proteomics, Software, User-Computer Interface, Databases, Genetic, Genomics, Mice genetics, Multiprotein Complexes genetics, Multiprotein Complexes physiology
- Abstract
MfunGD (http://mips.gsf.de/genre/proj/mfungd/) provides a resource for annotated mouse proteins and their occurrence in protein networks. Manual annotation concentrates on proteins which are found to interact physically with other proteins. Accordingly, manually curated information from a protein-protein interaction database (MPPI) and a database of mammalian protein complexes is interconnected with MfunGD. Protein function annotation is performed using the Functional Catalogue (FunCat) annotation scheme which is widely used for the analysis of protein networks. The dataset is also supplemented with information about the literature that was used in the annotation process as well as links to the SIMAP Fasta database, the Pedant protein analysis system and cross-references to external resources. Proteins that so far were not manually inspected are annotated automatically by a graphical probabilistic model and/or superparamagnetic clustering. The database is continuously expanding to include the rapidly growing amount of functional information about gene products from mouse. MfunGD is implemented in GenRE, a J2EE-based component-oriented multi-tier architecture following the separation of concern principle.
- Published
- 2006
- Full Text
- View/download PDF
49. Surrogate data--a secure way to share corporate data.
- Author
-
Tetko IV, Abagyan R, and Oprea TI
- Subjects
- Chemical Engineering, Drug Design, Drug Industry, Models, Chemical, Molecular Structure, Neural Networks, Computer, Computer Security, Databases, Factual
- Abstract
The privacy of chemical structure is of paramount importance for the industrial sector, in particular for the pharmaceutical industry. At the same time, companies handle large amounts of physico-chemical and biological data that could be shared in order to improve our molecular understanding of pharmacokinetic and toxicological properties, which could lead to improved predictivity and shorten the development time for drugs, in particular in the early phases of drug discovery. The current study provides some theoretical limits on the information required to produce reverse engineering of molecules from generated descriptors and demonstrates that the information content of molecules can be as low as less than one bit per atom. Thus theoretically just one descriptor can be used to completely disclose the molecular structure. Instead of sharing descriptors, we propose to share surrogate data. The sharing of surrogate data is nothing else but sharing of reliably predicted molecules. The use of surrogate data can provide the same information as the original set. We consider the practical application of this idea to predict lipophilicity of chemical compounds and we demonstrate that surrogate and real (original) data provides similar prediction ability. Thus, our proposed strategy makes it possible not only to share descriptors, but also complete collections of surrogate molecules without the danger of disclosing the underlying molecular structures.
- Published
- 2005
- Full Text
- View/download PDF
50. Eclair--a web service for unravelling species origin of sequences sampled from mixed host interfaces.
- Author
-
Rudd S and Tetko IV
- Subjects
- Artificial Intelligence, Classification, Computational Biology methods, Disease Susceptibility, Gene Library, Internet, Species Specificity, User-Computer Interface, Expressed Sequence Tags chemistry, Host-Parasite Interactions, Sequence Analysis, DNA methods, Software, Symbiosis
- Abstract
The identification of the genes that participate at the biological interface of two species remains critical to our understanding of the mechanisms of disease resistance, disease susceptibility and symbiosis. The sequencing of complementary DNA (cDNA) libraries prepared from the biological interface between two organisms provides an inexpensive way to identify the novel genes that may be expressed as a cause or consequence of compatible or incompatible interactions. Sequence classification and annotation of species origin typically use an orthology-based approach and require access to large portions of either genome, or a close relative. Novel species- or clade-specific sequences may have no counterpart within existing databases and remain ambiguous features. Here we present a web-service, Eclair, which utilizes support vector machines for the classification of the origin of expressed sequence tags stemming from mixed host cDNA libraries. In addition to providing an interface for the classification of sequences, users are presented with the opportunity to train a model to suit their preferred species pair. Eclair is freely available at http://eclair.btk.fi.
- Published
- 2005
- Full Text
- View/download PDF