7 results on '"Tetko, IV"'
Search Results
2. Development of dimethyl sulfoxide solubility models using 163,000 molecules: using a domain applicability metric to select more reliable predictions.
- Author
- Tetko IV, Novotarskyi S, Sushko I, Ivanov V, Petrenko AE, Dieden R, Lebon F, and Mathieu B
- Subjects
- Linear Models, Neural Networks, Computer, Reproducibility of Results, Solubility, Support Vector Machine, Artificial Intelligence, Databases, Pharmaceutical, Dimethyl Sulfoxide chemistry, Informatics methods
- Abstract
The dimethyl sulfoxide (DMSO) solubility data from Enamine and two UCB pharma compound collections were analyzed using 8 different machine learning methods and 12 descriptor sets. The analyzed data sets were highly imbalanced with 1.7-5.8% nonsoluble compounds. The libraries' enrichment by soluble molecules from the set of 10% of the most reliable predictions was used to compare prediction performances of the methods. The highest accuracies were calculated using a C4.5 decision classification tree, random forest, and associative neural networks. The performances of the methods developed were estimated on individual data sets and their combinations. The developed models provided on average a 2-fold decrease of the number of nonsoluble compounds amid all compounds predicted as soluble in DMSO. However, a 4-9-fold enrichment was observed if only 10% of the most reliable predictions were considered. The structural features influencing compounds to be soluble or nonsoluble in DMSO were also determined. The best models developed with the publicly available Enamine data set are freely available online at http://ochem.eu/article/33409 .
- Published
- 2013
3. A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition.
- Author
- Novotarskyi S, Sushko I, Körner R, Pandey AK, and Tetko IV
- Subjects
- Enzyme Inhibitors chemistry, Humans, Molecular Conformation, Artificial Intelligence, Cytochrome P-450 CYP1A2 Inhibitors, Enzyme Inhibitors pharmacology, Quantitative Structure-Activity Relationship
- Abstract
Prediction of the CYP450 inhibition activity of small molecules is an important task because of the high risk of drug-drug interactions. CYP1A2 is an important member of the CYP450 superfamily and accounts for 15% of total CYP450 presence in human liver. This article compares 80 in-silico QSAR models that were created by following the same procedure with different combinations of descriptors and machine learning methods. The training and test sets consist of 3745 and 3741 inhibitors and noninhibitors from the PubChem BioAssay database. A heterogeneous external test set of 160 inhibitors was collected from the literature. The studied descriptor sets involve E-state, Dragon and ISIDA SMF descriptors. Machine learning methods involve Associative Neural Networks (ASNN), K Nearest Neighbors (kNN), Random Tree (RT), C4.5 Tree (J48), and Support Vector Machines (SVM). The influence of descriptor selection on model accuracy was studied. The benefits of the "bagging" modeling approach were shown. The applicability domain approach was successfully applied in this study, ways of increasing model accuracy through the use of applicability domain measures were demonstrated, and fragment-based model interpretation was performed. The most accurate models in this study achieved values of 83% and 68% correctly classified instances on the internal and external test sets, respectively. The applicability domain approach allowed increasing the prediction accuracy to 90% for 78% of the internal and 17% of the external test sets, respectively. The most accurate models are available online at http://ochem.eu/models/Q5747 .
- Published
- 2011
4. Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients.
- Author
- Varnek A, Gaudin C, Marcou G, Baskin I, Pandey AK, and Tetko IV
- Subjects
- Air, Animals, Databases, Factual, Humans, Informatics, Least-Squares Analysis, Linear Models, Neural Networks, Computer, Organic Chemicals chemistry, Organic Chemicals pharmacokinetics, Quantitative Structure-Activity Relationship, Rats, Tissue Distribution, Artificial Intelligence, Models, Biological
- Abstract
Two inductive knowledge transfer approaches - multitask learning (MTL) and Feature Net (FN) - have been used to build predictive neural networks (ASNN) and PLS models for 11 types of tissue-air partition coefficients (TAPC). Unlike conventional single-task learning (STL) modeling, which focuses only on a single target property without any relation to other properties, in the framework of the inductive transfer approach the individual models are viewed as nodes in a network of interrelated models built in parallel (MTL) or sequentially (FN). It has been demonstrated that MTL and FN techniques are extremely useful in structure-property modeling on small and structurally diverse data sets, when conventional STL modeling is unable to produce any predictive model. Predictive individual STL models were obtained for 4 out of 11 TAPC, whereas application of inductive knowledge transfer techniques resulted in models for 9 TAPC. Differences in the prediction performances of the models as a function of the machine-learning method, and of the number of properties simultaneously involved in the learning, have been discussed.
- Published
- 2009
5. Support vector machines for separation of mixed plant-pathogen EST collections based on codon usage.
- Author
- Friedel CC, Jahn KH, Sommer S, Rudd S, Mewes HW, and Tetko IV
- Subjects
- Algorithms, Ascomycota pathogenicity, DNA, Plant genetics, Pattern Recognition, Automated methods, Plants genetics, Plants parasitology, Sequence Alignment methods, Artificial Intelligence, Ascomycota genetics, Chromosome Mapping methods, Codon genetics, Hordeum genetics, Hordeum microbiology, Host-Parasite Interactions genetics, Sequence Analysis, DNA methods
- Abstract
Motivation: Discovery of host and pathogen genes expressed at the plant-pathogen interface often requires the construction of mixed libraries that contain sequences from both genomes. Sequence identification requires high-throughput and reliable classification of genome origin. When using single-pass cDNA sequences difficulties arise from the short sequence length, the lack of sufficient taxonomically relevant sequence data in public databases and ambiguous sequence homology between plant and pathogen genes. Results: A novel method is described, which is independent of the availability of homologous genes and relies on subtle differences in codon usage between plant and fungal genes. We used support vector machines (SVMs) to identify the probable origin of sequences. SVMs were compared to several other machine learning techniques and to a probabilistic algorithm (PF-IND) for expressed sequence tag (EST) classification also based on codon bias differences. Our software (Eclat) has achieved a classification accuracy of 93.1% on a test set of 3217 EST sequences from Hordeum vulgare and Blumeria graminis, which is a significant improvement compared to PF-IND (prediction accuracy of 81.2% on the same test set). EST sequences with at least 50 nt of coding sequence can be classified using Eclat with high confidence. Eclat allows training of classifiers for any host-pathogen combination for which there are sufficient classified training sequences. Availability: Eclat is freely available on the Internet (http://mips.gsf.de/proj/est) or on request as a standalone version. Contact: friedel@informatik.uni-muenchen.de.
- Published
- 2005
6. Gene selection from microarray data for cancer classification--a machine learning approach.
- Author
- Wang Y, Tetko IV, Hall MA, Frank E, Facius A, Mayer KF, and Mewes HW
- Subjects
- Algorithms, Cytoskeletal Proteins, Gene Expression Profiling, Glycoproteins genetics, Humans, Leukemia, Myeloid, Acute genetics, Precursor Cell Lymphoblastic Leukemia-Lymphoma genetics, Zyxin, Artificial Intelligence, Leukemia, Myeloid, Acute classification, Oligonucleotide Array Sequence Analysis methods, Precursor Cell Lymphoblastic Leukemia-Lymphoma classification
- Abstract
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contains a small number of samples which have a large number of gene expression levels as features. To select relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
- Published
- 2005
7. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping
- Author
- Ctibor Škuta [0000-0001-5325-4934], Igor V. Tetko [0000-0002-6855-0012], Andreas Bender [0000-0002-6683-7546], Pavel Kříž [0000-0003-2473-1919], Daniel Svozil [0000-0003-2577-5163], G. J. P. van Westen [0000-0003-0717-1817], Wim Dehaen [0000-0002-9597-0629], Isidro Cortés-Ciriano [0000-0002-2036-494X], and Apollo - University of Cambridge Repository
- Subjects
- Quantitative structure–activity relationship, Computer science, In silico, Bioactivity modeling, Library and Information Sciences, Scaffold hopping, 01 natural sciences, Biological fingerprint, lcsh:Chemistry, 03 medical and health sciences, Similarity (network science), Similarity searching, Research article, Physical and Theoretical Chemistry, 030304 developmental biology, 0303 health sciences, lcsh:T58.5-58.64, lcsh:Information technology, business.industry, QSAR, Fingerprint (computing), Pattern recognition, chEMBL, Computer Graphics and Computer-Aided Design, 0104 chemical sciences, Computer Science Applications, Random forest, 010404 medicinal & biomolecular chemistry, lcsh:QD1-999, Big Data in Chemistry, Affinity fingerprint, Artificial intelligence, business
- Abstract
Funder: FP7 People: Marie-Curie Actions; doi: http://dx.doi.org/10.13039/100011264; Grant(s): 238701. An affinity fingerprint is a vector consisting of a compound’s affinity or potency against a reference panel of protein targets. Here, we present the QAFFP fingerprint, a 440-element in silico QSAR-based affinity fingerprint, the components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented, and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024-bit Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to those of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping, as it is able to retrieve 1146 of the 1749 existing scaffolds, while the Morgan2 fingerprint retrieves only 864 scaffolds.
- Published
- 2020
Discovery Service for Jio Institute Digital Library