85 results for "Aires de Sousa, J"
Search Results
2. Comparing roadsoils pollution patterns extracted by MOLMAP and classical three-way decomposition methods
- Author
-
Gómez-Carracedo, M.P., Ballabio, D., Andrade, J.M., Aires-de-Sousa, J., and Consonni, V.
- Published
- 2010
- Full Text
- View/download PDF
3. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information
- Author
-
Sushko I, Pandey AK, Novotarskyi S, Körner R, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY, Todeschini R, Varnek A, Marcou G, Ertl P, Potemkin V, Grishina M, Gasteiger J, Baskin II, Palyulin VA, Radchenko EV, Welsh WJ, Kholodovych V, Chekmarev D, Cherkasov A, Aires-de-Sousa J, Zhang Q-Y, Bender A, Nigsch F, Patiny L, Williams A, Tkachenko V, and Tetko IV
- Subjects
Information technology; T58.5-58.64; Chemistry; QD1-999
- Published
- 2011
- Full Text
- View/download PDF
4. Multivariate statistical approaches for wine classification based on low molecular weight phenolic compounds
- Author
-
Cabrita, M. J., Aires-de-Sousa, J., Gomes da Silva, M. D. R., Rei, F., and Costa Freitas, A. M.
- Published
- 2012
- Full Text
- View/download PDF
5. Prediction of enantioselectivity using chirality codes and Classification and Regression Trees
- Author
-
Caetano, S., Aires-de-Sousa, J., Daszykowski, M., and Heyden, Y. Vander
- Published
- 2005
- Full Text
- View/download PDF
6. Theoretical DFT studies of pyridazine based push-pull π-conjugated heterocyclic systems for SHG nonlinear optics
- Author
-
Aires de Sousa, J. and Raposo, M. Manuela M.
- Subjects
Density Functional Theory (DFT); Nonlinear optics (NLO); Second Harmonic Generation (SHG); Diazines; Thiophene; Push-pull pi-conjugated systems; First hyperpolarizability (beta); Furan; Pyridazine; Natural Sciences::Chemical Sciences
- Abstract
Among the numerous classes of π-conjugated organic systems, push-pull substituted heterocyclic compounds are of great interest because it has been experimentally and theoretically demonstrated that they increase the second-order molecular nonlinear optical (NLO) response of push-pull chromophores with respect to aryl analogues. In fact, the incorporation of heterocycles into π-conjugated systems is a powerful approach for tuning optoelectronic properties because the heterocycles bring higher polarizability, modulate the conjugation pathway, and behave as auxiliary electron donors/acceptors. Therefore, a series of push-pull pyridazines was designed in order to understand how structural modifications influence their NLO properties. The electron-deficient pyridazines 1a-e are functionalized with an electron-rich thiophene heterocycle at position 6 and with different aromatic and heteroaromatic moieties (phenyl, thienyl, furanyl) bearing electron acceptor groups at position 3. DFT calculations were carried out to obtain information on the conformation, electronic structure, electron distribution, dipole moment, and molecular nonlinearity response of the push-pull pyridazine derivatives. Calculations were performed at the B3LYP level with the 6-311G** basis set and a polarizable continuum model with dioxane as the solvent. Hyperpolarizability factors were estimated using an incident wavelength of 1064 nm.
Funding: Fundação para a Ciência e Tecnologia (FCT) and FEDER-COMPETE, through the research center CQUM (UID/QUI/0686/2016) and Project PEst-OE/QUI/UI0612/2013. The NMR spectrometers are part of the National NMR Network (PTNMR) and are partially supported by Infrastructure Project Nº 022161 (co-financed by FEDER through COMPETE 2020, POCI and PORL and FCT through PIDDAC). This work was also supported by the Associated Laboratory for Sustainable Chemistry-Clean Processes and Technologies-LAQV, which is financed by national funds from FCT/MEC (UID/QUI/50006/2013) and cofinanced by the ERDF under the PT2020 Partnership Agreement (POCI-01-0145-FEDER-007265).
- Published
- 2018
7. Fragmentation pattern of amides by EI and HRESI: study of protonation sites using DFT-3LYP data
- Author
-
Fokoue, H. H., Marques, J. V., Correia, M. V., Yamaguchi, L. F., Qu, X., Aires-de-Sousa, J., Scotti, M. T., Lopes, N. P., and Kato, M. J.
- Published
- 2018
- Full Text
- View/download PDF
8. Expert System for Predicting Reaction Conditions: The Michael Reaction Case
- Author
-
Marcou, G., Aires de Sousa, J., Latino, D. A. R. S., de Luca, A., Horvath, D., Rietsch, V., and Varnek, A.
- Published
- 2015
- Full Text
- View/download PDF
9. Prediction of Antibacterial Activity of Diterpenes against MRSA with Machine Learning Methods
- Author
-
Latino, DARS, Rijo, P, Pinheiro, L, Simões, MF, Freitas, FFM, Aires-de-Sousa, J, Calado, ART, and Fernandes, FMSS
- Published
- 2006
10. POSTER: Prediction of enantioselectivity using Classification and Regression Trees
- Author
-
Caetano, S., Aires-de-Sousa, J., and Vander Heyden, Y.
- Published
- 2004
11. ORAL PRES.: Prediction of enantioselectivity using CART
- Author
-
Caetano, S., Aires-de-Sousa, J., and Vander Heyden, Y.
- Published
- 2004
12. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information
- Author
-
Sushko, I, Novotarskyi, S, Körner, R, Pandey, AK, Rupp, M, Teetz, W, Brandmaier, S, Abdelaziz, A, Prokopenko, VV, Tanchuk, VY, Todeschini, R, Varnek, A, Marcou, G, Ertl, P, Potemkin, V, Grishina, M, Gasteiger, J, Schwab, C, Baskin, I, Palyulin, VA, Radchenko, EV, Welsh, WJ, Kholodovych, V, Chekmarev, D, Cherkasov, A, Aires de Sousa, J, Zhang, Q-Y, Bender, A, Nigsch, F, Patiny, L, and Williams, A
- Abstract
The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu . © 2011 The Author(s).
- Published
- 2011
13. Comparing roadsoils pollution patterns extracted by MOLMAP and classical three-way decomposition methods
- Author
-
Gomez Carracedo, MP, Ballabio, D, Andrade, JM, Aires de Sousa, J, and Consonni, V
- Abstract
A recent approach based on self-organizing maps (SOMs) to extract patterns from three-way data, named MOLMAP, was applied in a four-seasons study on soil pollution and its results compared with three different conventional approaches: Parallel factor analysis (PARAFAC), matrix augmented principal components analysis (MA-PCA) and Procrustes rotation. Each sampling season comprised 92 roadsoil samples and 12 analytical variables (Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn, loss on ignition, pH and humidity). It was found that all techniques yielded highly similar results as the samples became organized in two major groups, each with a differentiated pollution pattern. This confirmed MOLMAP as a reliable option to handle environmental three-way datasets and to extract accurate pollution patterns. (C) 2010 Elsevier B.V. All rights reserved.
- Published
- 2010
14. ChemInform Abstract: A New Enantioselective Synthesis of N-Arylaziridines by Phase-Transfer Catalysis.
- Author
-
AIRES-DE-SOUSA, J., LOBO, A. M., and PRABHAKAR, S.
- Published
- 2010
- Full Text
- View/download PDF
15. Combining Kohonen neural networks and variable selection by classification trees to cluster road soil samples
- Author
-
Gómez-Carracedo, M.P., Andrade, J.M., Carrera, G.V.S.M., Aires-de-Sousa, J., Carlosena, A., and Prada, D.
- Published
- 2010
- Full Text
- View/download PDF
16. Expert System for Predicting Reaction Conditions: The Michael Reaction Case.
- Author
-
Marcou, G., Aires de Sousa, J., Latino, D. A. R. S., de Luca, A., Horvath, D., Rietsch, V., and Varnek, A.
- Published
- 2015
- Full Text
- View/download PDF
17. Verifying Wine Origin: A Neural Network Approach
- Author
-
Aires-de-Sousa, J.
- Published
- 1996
- Full Text
- View/download PDF
18. Structure-Based Classification of Chemical Reactions without Assignment of Reaction Centers
- Author
-
Zhang, Q.-Y. and Aires-de-Sousa, J.
- Abstract
The automatic classification of chemical reactions is of high importance for the analysis of reaction databases, reaction retrieval, reaction prediction, or synthesis planning. In this work, the classification of photochemical reactions was investigated with no explicit assignment of the reacting centers. Classifications were explored with Random Forests or Kohonen neural networks in three different situations, using different levels of information: (a) pairs of reactants were classified according to the type of reaction they produce, (b) products were classified according to the type of reaction from which they can be synthesized, and (c) reactions were classified from the difference between the descriptors of the product and the descriptors of the reactants. In all cases molecular maps of atom-level properties (MOLMAPs) were used as descriptors. They are generated by a self-organizing map and encode physicochemical properties of the bonds available in a molecule. Correct classification could be achieved for approximately 90% of the 78 reactions in an independent test set.
- Published
- 2005
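Situation (c) in the abstract of entry 18 describes a reaction by the difference between product and reactant descriptors. A minimal sketch of that idea follows. It assumes each molecule is already encoded as a fixed-length numeric vector and that descriptors of several molecules on the same side are summed; neither the function name nor the summation rule comes from the record, and the numbers are illustrative only.

```python
import numpy as np

def reaction_descriptor(product_descs, reactant_descs):
    """Return a difference vector: summed products minus summed reactants.

    Both arguments are sequences of equal-length descriptor vectors.
    Summing over molecules on each side is an assumption of this sketch.
    """
    products = np.sum(np.asarray(product_descs, dtype=float), axis=0)
    reactants = np.sum(np.asarray(reactant_descs, dtype=float), axis=0)
    return products - reactants

# Illustrative three-component descriptors for one product and two reactants.
diff = reaction_descriptor([[3, 1, 0]], [[1, 1, 0], [1, 0, 0]])
print(diff.tolist())
```

Because only the difference vector is used, no reaction center has to be assigned beforehand, which is the point the abstract emphasizes.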
19. Prediction of Enantiomeric Excess in a Combinatorial Library of Catalytic Enantioselective Reactions
- Author
-
Aires-de-Sousa, J. and Gasteiger, J.
- Abstract
A quantitative structure−enantioselectivity relationship was established for a combinatorial library of enantioselective reactions performed by addition of diethyl zinc to benzaldehyde. Chiral catalysts and additives were encoded by their chirality codes and presented as input to neural networks. The networks were trained to predict the enantiomeric excess. With independent test sets, predictions of enantiomeric excess could be made with an average error as low as 6% ee. Multilinear regression, perceptrons, and support vector machines were also evaluated as modeling tools. The method is of interest for the computer-aided design of combinatorial libraries involving chiral compounds or enantioselective reactions. This is the first example of a quantitative structure−property relationship based on chirality codes.
- Published
- 2005
20. Prediction of enantiomeric selectivity in chromatography
- Author
-
Aires-de-Sousa, J. and Gasteiger, J.
- Published
- 2002
- Full Text
- View/download PDF
21. New Description of Molecular Chirality and Its Application to the Prediction of the Preferred Enantiomer in Stereoselective Reactions
- Author
-
Aires-de-Sousa, J. and Gasteiger, J.
- Abstract
A new representation of molecular chirality as a fixed-length code is introduced. This code describes chiral carbon atoms using atomic properties and geometrical features independent of conformation and is able to distinguish between enantiomers. It was used as input to counterpropagation (CPG) neural networks in two different applications. In the case of a catalytic enantioselective reaction the CPG network established a correlation between the chirality codes of the catalysts and the major enantiomer obtained by the reaction. In the second application (enantioselective reduction of ketones by DIP-chloride), the series of major and minor enantiomers produced from different substrates were clustered by the CPG neural network into separate regions, one characteristic of the minor products and the other characteristic of the major products.
- Published
- 2001
22. Expert system for predicting reaction conditions: The Michael reaction case
- Author
-
Marcou G., Aires De Sousa J., Latino D., De Luca A., Horvath D., Rietsch V., and Varnek A.
- Abstract
© 2015 American Chemical Society. A generic chemical transformation may often be achieved under various synthetic conditions. However, for any specific reagents, only one or a few among the reported synthetic protocols may be successful. For example, Michael β-addition reactions may proceed under different choices of solvent (e.g., hydrophobic, aprotic polar, protic) and catalyst (e.g., Brønsted acid, Lewis acid, Lewis base, etc.). Chemoinformatics methods could be efficiently used to establish a relationship between the reagent structures and the required reaction conditions, which would allow synthetic chemists to waste less time and resources in trying out various protocols in search for the appropriate one. In order to address this problem, a number of 2-classes classification models have been built on a set of 198 Michael reactions retrieved from literature. Trained models discriminate between processes that are compatible and respectively processes not feasible under a specific reaction condition option (feasible or not with a Lewis acid catalyst, feasible or not in hydrophobic solvent, etc.). Eight distinct models were built to decide the compatibility of a Michael addition process with each considered reaction condition option, while a ninth model was aimed to predict whether the assumed Michael addition is feasible at all. Different machine-learning methods (Support Vector Machine, Naive Bayes, and Random Forest) in combination with different types of descriptors (ISIDA fragments issued from Condensed Graphs of Reactions, MOLMAP, Electronic Effect Descriptors, and Chemistry Development Kit computed descriptors) have been used. Models have good predictive performance in 3-fold cross-validation done three times: balanced accuracy varies from 0.7 to 1. Developed models are available for the users at http://infochim.u-strasbg.fr/webserv/VSEngine.html. Eventually, these were challenged to predict feasibility conditions for ∼50 novel Michael reactions from the eN
23. Expert system for predicting reaction conditions: The Michael reaction case
- Author
-
Marcou G., Aires De Sousa J., Latino D., De Luca A., Horvath D., Rietsch V., and Varnek A.
- Abstract
© 2015 American Chemical Society. A generic chemical transformation may often be achieved under various synthetic conditions. However, for any specific reagents, only one or a few among the reported synthetic protocols may be successful. For example, Michael β-addition reactions may proceed under different choices of solvent (e.g., hydrophobic, aprotic polar, protic) and catalyst (e.g., Brønsted acid, Lewis acid, Lewis base, etc.). Chemoinformatics methods could be efficiently used to establish a relationship between the reagent structures and the required reaction conditions, which would allow synthetic chemists to waste less time and resources in trying out various protocols in search for the appropriate one. In order to address this problem, a number of 2-classes classification models have been built on a set of 198 Michael reactions retrieved from literature. Trained models discriminate between processes that are compatible and respectively processes not feasible under a specific reaction condition option (feasible or not with a Lewis acid catalyst, feasible or not in hydrophobic solvent, etc.). Eight distinct models were built to decide the compatibility of a Michael addition process with each considered reaction condition option, while a ninth model was aimed to predict whether the assumed Michael addition is feasible at all. Different machine-learning methods (Support Vector Machine, Naive Bayes, and Random Forest) in combination with different types of descriptors (ISIDA fragments issued from Condensed Graphs of Reactions, MOLMAP, Electronic Effect Descriptors, and Chemistry Development Kit computed descriptors) have been used. Models have good predictive performance in 3-fold cross-validation done three times: balanced accuracy varies from 0.7 to 1. Developed models are available for the users at http://infochim.u-strasbg.fr/webserv/VSEngine.html. Eventually, these were challenged to predict feasibility conditions for ∼50 novel Michael reactions from the eN
24. ChemInform Abstract: A New Enantioselective Synthesis of N-Arylaziridines by Phase-Transfer Catalysis.
- Author
-
AIRES-DE-SOUSA, J., LOBO, A. M., and PRABHAKAR, S.
- Published
- 1996
- Full Text
- View/download PDF
25. Comparing roadsoils pollution patterns extracted by MOLMAP and classical three-way decomposition methods
- Author
-
M.P. Gómez-Carracedo, J. M. Andrade, João Aires-de-Sousa, Davide Ballabio, and Viviana Consonni
- Subjects
Pollution; Chemistry; Mineralogy; Sampling (statistics); Soil classification; Biochemistry; Analytical Chemistry; Matrix (chemical analysis); Chemometrics; CHIM/01 - CHIMICA ANALITICA; self-organizing maps; parallel factor analysis; soils; Procrustes rotation; heavy metals; Principal component analysis; Environmental Chemistry; Sample preparation; Loss on ignition; Biological system; Spectroscopy
- Abstract
A recent approach based on self-organizing maps (SOMs) to extract patterns from three-way data, named MOLMAP, was applied in a four-seasons study on soil pollution and its results compared with three different conventional approaches: Parallel factor analysis (PARAFAC), matrix augmented principal components analysis (MA-PCA) and Procrustes rotation. Each sampling season comprised 92 roadsoil samples and 12 analytical variables (Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn, loss on ignition, pH and humidity). It was found that all techniques yielded highly similar results as the samples became organized in two major groups, each with a differentiated pollution pattern. This confirmed MOLMAP as a reliable option to handle environmental three-way datasets and to extract accurate pollution patterns. (C) 2010 Elsevier B.V. All rights reserved.
- Published
- 2010
26. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information
- Author
-
Gilles Marcou, Florian Nigsch, Ahmed Abdelaziz, Qingyou Zhang, Vladyslav Kholodovych, William J. Welsh, Matthias Rupp, Antony J. Williams, Vsevolod Yu. Tanchuk, Valery Tkachenko, Volodymyr V. Prokopenko, Sergii Novotarskyi, Alexandre Varnek, Igor I. Baskin, Christof H. Schwab, Peter Ertl, João Aires-de-Sousa, Eugene V. Radchenko, Johann Gasteiger, Robert Körner, Igor V. Tetko, Iurii Sushko, Andreas Bender, Maria Grishina, Vladimir A. Palyulin, Dmitriy Chekmarev, Luc Patiny, Wolfram Teetz, Artem Cherkasov, Stefan Brandmaier, Roberto Todeschini, Anil Kumar Pandey, and Vladimir Potemkin
- Subjects
Information management; Databases, Factual; Computer science; Estimation of accuracy of predictions; Molecular Similarity; Modeling workflow; Quantitative Structure-Activity Relationship; 01 natural sciences; Partition-Coefficients; Descriptors; Set (abstract data type); World Wide Web; 03 medical and health sciences; User-Computer Interface; Resource (project management); CHIM/01 - CHIMICA ANALITICA; Applicability domain; Drug Discovery; Physical and Theoretical Chemistry; 030304 developmental biology; 0303 health sciences; Internet; On-line web platform; Information Dissemination; Open access; E-State Indexes; 0104 chemical sciences; Variety (cybernetics); Computer Science Applications; Data sharing; 010404 medicinal & biomolecular chemistry; In-Silico; Models, Chemical; Cheminformatics; Shape Signatures; Associative Neural Networks; Prediction
- Abstract
The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu . © 2011 The Author(s).
- Full Text
- View/download PDF
27. Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators.
- Author
-
Gao X, Baimacheva N, and Aires-de-Sousa J
- Abstract
A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict ¹⁹F NMR chemical shifts. An R² of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H→F substitution) was summed to the LSVs of 4135 new molecules with no fluorine atom and decoded into SMILES, yielding 99% of valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change.
- Published
- 2024
- Full Text
- View/download PDF
28. GUIDEMOL: A Python graphical user interface for molecular descriptors based on RDKit.
- Author
-
Aires-de-Sousa J
- Subjects
- Adaptor Proteins, Signal Transducing, Software, Cheminformatics
- Abstract
GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures. A command line interface (CLI) is also provided for the calculation of grid representations. The source code is available at https://github.com/jairesdesousa/guidemol. (© 2023 Wiley-VCH GmbH.)
- Published
- 2024
- Full Text
- View/download PDF
29. Machine Learning to Predict Homolytic Dissociation Energies of C-H Bonds: Calibration of DFT-based Models with Experimental Data.
- Author
-
Li W, Luan Y, Zhang Q, and Aires-de-Sousa J
- Subjects
- Thermodynamics, Calibration, Machine Learning
- Abstract
Random Forest (RF) QSPR models were developed with a data set of homolytic bond dissociation energies (BDE) previously calculated by B3LYP/6-311++G(d,p)//DFTB for 2263 sp³ C-H covalent bonds. The best set of attributes consisted of 114 descriptors of the carbon atom (counts of atom types in 5 spheres around the kernel atom and ring descriptors). The optimized model predicted the DFT-calculated BDE of an independent test set of 224 bonds with MAE=2.86 kcal/mol. A new data set of 409 bonds from the iBonD database (http://ibond.nankai.edu.cn) was predicted by the RF with a modest MAE (5.36 kcal/mol) but a relatively high R² (0.75) against experimental energies. A prediction scheme was explored that corrects the RF prediction with the average deviation observed for the k nearest neighbours (KNN) in an additional memory of experimental data. The corrected predictions achieved MAE=2.22 kcal/mol for an independent test set of 145 bonds and the corresponding experimental bond energies. (© 2022 The Authors. Molecular Informatics published by Wiley-VCH GmbH.)
- Published
- 2023
- Full Text
- View/download PDF
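The correction scheme in the abstract of entry 29 (shift a model prediction by the average deviation seen for the k nearest neighbours in a memory of experimental data) can be sketched as follows. This is only an illustration: Euclidean distance, k=3, the function name, and all numbers are assumptions, not details taken from the record.

```python
import numpy as np

def knn_corrected_prediction(x_query, base_pred, memory_X, memory_dev, k=3):
    """Correct a base-model prediction with the mean deviation of k neighbours.

    memory_X   : descriptor vectors of compounds with experimental values
    memory_dev : experimental minus predicted value for each memory entry
    Euclidean distance in descriptor space is an assumption of this sketch.
    """
    memory_X = np.asarray(memory_X, dtype=float)
    memory_dev = np.asarray(memory_dev, dtype=float)
    distances = np.linalg.norm(memory_X - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]  # indices of the k closest entries
    return base_pred + memory_dev[nearest].mean()

# Hypothetical memory of four bonds and their observed deviations (kcal/mol).
memory_X = [[0, 0], [1, 0], [0, 1], [10, 10]]
memory_dev = [1.0, 2.0, 3.0, -50.0]
corrected = knn_corrected_prediction([0.1, 0.1], 95.0, memory_X, memory_dev, k=3)
print(corrected)
```

In this toy case the distant fourth entry is ignored, so the correction is the mean of the three nearby deviations.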
30. Prediction of the Phase Composition Profile of Three-Compound Mixtures in Liquid-Liquid Equilibrium: A Chemoinformatics Approach.
- Author
-
Carrera GVSM, Cruz ML, Klimenko K, Esperança JMSS, and Aires-de-Sousa J
- Subjects
- Temperature, Cheminformatics, Ionic Liquids chemistry
- Abstract
Machine-learning models were developed to predict the composition profile of a three-compound mixture in liquid-liquid equilibrium (LLE), given the global composition at a certain temperature and pressure. A chemoinformatics approach was explored, based on the MOLMAP technology to encode molecules and mixtures. The chemical systems involved an ionic liquid (IL) and two organic molecules. Two complementary models have been optimized for the IL-rich and IL-poor phases. The two global optimized models are highly accurate, and were validated with independent test sets, where combinations of molecule1+molecule2+IL are different from those in the training set. These results highlight the MOLMAP encoding scheme, based on atomic properties to train models that learn relationships between features of complex multi-component chemical systems and their profile of phase compositions. (© 2022 Wiley-VCH GmbH.)
- Published
- 2022
- Full Text
- View/download PDF
31. Machine learning prediction of UV-Vis spectra features of organic compounds related to photoreactive potential.
- Author
-
Mamede R, Pereira F, and Aires-de-Sousa J
- Abstract
Machine learning (ML) algorithms were explored for the classification of the UV-Vis absorption spectrum of organic molecules based on molecular descriptors and fingerprints generated from 2D chemical structures. Training and test data (~75 k molecules and associated UV-Vis data) were assembled from a database with lists of experimental absorption maxima. They were labeled as positive (related to photoreactive potential) if an absorption maximum is reported in the range between 290 and 700 nm (UV/Vis) with a molar extinction coefficient (MEC) above 1000 L mol⁻¹ cm⁻¹, and as negative if no such peak is in the list. Random forests were selected among several algorithms. The models were validated with two external test sets comprising 998 organic molecules, obtaining a global accuracy up to 0.89, sensitivity of 0.90 and specificity of 0.88. The ML output (UV-Vis spectrum class) was explored as a predictor of the 3T3 NRU phototoxicity in vitro assay for a set of 43 molecules. Comparable results were observed with the classification directly based on experimental UV-Vis data in the same format. (© 2021. The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
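The labeling rule stated in the abstract of entry 31 can be written as a short predicate. This is a minimal sketch: the function name, the (wavelength, MEC) tuple layout, and the inclusive treatment of the 290 and 700 nm bounds are assumptions, since the record does not specify them.

```python
def uv_vis_class(peaks, lo_nm=290.0, hi_nm=700.0, min_mec=1000.0):
    """Return True (positive class) if any absorption maximum qualifies.

    peaks : iterable of (wavelength_nm, molar_extinction_coefficient) pairs,
            with the coefficient in L mol^-1 cm^-1.
    A peak qualifies when it lies between lo_nm and hi_nm (bounds treated as
    inclusive here, an assumption) and its coefficient is above min_mec.
    """
    return any(lo_nm <= wl <= hi_nm and mec > min_mec for wl, mec in peaks)

# Hypothetical peak lists for two molecules.
print(uv_vis_class([(254.0, 12000.0), (320.0, 1500.0)]))
print(uv_vis_class([(254.0, 12000.0)]))
```

A molecule with a strong peak only below 290 nm is labeled negative under this rule, as is one whose peaks in range are all weak.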
32. Machine Learning Classification of One-Chiral-Center Organic Molecules According to Optical Rotation.
- Author
-
Mamede R, de-Almeida BS, Chen M, Zhang Q, and Aires-de-Sousa J
- Subjects
- Algorithms, Molecular Structure, Optical Rotation, Stereoisomerism, Machine Learning, Neural Networks, Computer
- Abstract
In this study, machine learning algorithms were investigated for the classification of organic molecules with one carbon chiral center according to the sign of optical rotation. Diverse heterogeneous data sets comprising up to 13,080 compounds and their corresponding optical rotation were retrieved from Reaxys and processed independently for three solvents: dichloromethane, chloroform, and methanol. The molecular structures were represented by chiral descriptors based on the physicochemical and topological properties of ligands attached to the chiral center. The sign of optical rotation was predicted by random forests (RF) and artificial neural networks for independent test sets with an accuracy of up to 75% for dichloromethane, 82% for chloroform, and 82% for methanol. RF probabilities and the availability of structures in the training set with the same spheres of atom types around the chiral center defined applicability domains in which the accuracy is higher.
- Published
- 2021
- Full Text
- View/download PDF
33. Alkylated monoterpene indole alkaloid derivatives as potent P-glycoprotein inhibitors in resistant cancer cells.
- Author
-
Cardoso DSP, Kincses A, Nové M, Spengler G, Mulhovo S, Aires-de-Sousa J, Dos Santos DJVA, and Ferreira MU
- Subjects
- Alkylation, Animals, Antineoplastic Agents chemical synthesis, Antineoplastic Agents chemistry, Cell Proliferation drug effects, Dose-Response Relationship, Drug, Drug Screening Assays, Antitumor, Indole Alkaloids chemical synthesis, Indole Alkaloids chemistry, Mice, Molecular Structure, Quantitative Structure-Activity Relationship, Tumor Cells, Cultured, ATP Binding Cassette Transporter, Subfamily B, Member 1 antagonists & inhibitors, Antineoplastic Agents pharmacology, Indole Alkaloids pharmacology
- Abstract
Aiming at generating a series of monoterpene indole alkaloids with enhanced multidrug resistance (MDR) reversing activity in cancer, two major epimeric alkaloids isolated from Tabernaemontana elegans, tabernaemontanine (1) and dregamine (2), were derivatized by alkylation of the indole nitrogen. Twenty-six new derivatives (3-28) were prepared by reaction with different aliphatic and aromatic halides, and their structures were elucidated mainly by NMR, including 2D NMR experiments. Their MDR reversal ability was evaluated by flow cytometry through a functional assay, using as models resistant human colon adenocarcinoma cells and human ABCB1-gene transfected L5178Y mouse lymphoma cells overexpressing P-glycoprotein (P-gp). A considerable increase of activity was found for most of the derivatives; the strongest P-gp inhibitors were those sharing N-phenethyl moieties, which displayed outstanding inhibitory activity associated with weak cytotoxicity. Chemosensitivity assays were also performed in a model of combination chemotherapy in the same cell lines, by studying the in vitro interactions between the compounds and the antineoplastic drug doxorubicin. Most of the compounds showed strong synergistic interactions with doxorubicin, highlighting their potential as MDR reversers. QSAR models were also explored for insights on drug-receptor interaction, and it was found that lipophilicity and bulkiness features were associated with inhibitory activity, although linear correlations were not observed., Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2020 Elsevier Masson SAS. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
34. QSPR Modeling of Liquid-liquid Equilibria in Two-phase Systems of Water and Ionic Liquid.
- Author
-
Klimenko KO, Inês JM, Esperança JMSS, Rebelo LPN, Aires-de-Sousa J, and Carrera GVSM
- Subjects
- Data Curation, Datasets as Topic, Osmolar Concentration, Pressure, Temperature, Ionic Liquids chemistry, Models, Chemical, Quantitative Structure-Activity Relationship, Water chemistry
- Abstract
The increasing application of new ionic liquids (IL) creates the need for liquid-liquid equilibria data for both miscible and quasi-immiscible systems. In this study, equilibrium concentrations at different temperatures for ionic liquid+water two-phase systems were modeled using a Quantitative Structure-Property Relationship (QSPR) method. Data on equilibrium concentrations were taken from the ILThermo Ionic Liquids database, curated and used to make models that predict the weight fraction of water in the ionic liquid rich phase and of ionic liquid in the aqueous phase as two separate properties. The major modeling challenge stems from the fact that each single IL is characterized by several data points, since equilibrium concentrations are temperature dependent. Thus, new approaches for the detection of potential data point outliers, test set selection, and quality prediction have been developed. The training set comprised equilibrium concentration data for 67 and 68 ILs in the case of water in IL and IL in water modeling, respectively. SiRMS, MOLMAPS, Rcdk and Chemaxon descriptors were used to build Random Forest models for both properties. Models were subjected to the Y-scrambling test for robustness assessment. The best models have also been validated using an external test set that is not part of the ILThermo database. A two-phase equilibrium diagram for one of the external test set ILs is presented for better visualization of the results and potential derivation of tie lines., (© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2020
- Full Text
- View/download PDF
35. Machine learning to predict the specific optical rotations of chiral fluorinated molecules.
- Author
-
Chen M, Wu T, Xiao K, Zhao T, Zhou Y, Zhang Q, and Aires-de-Sousa J
- Abstract
A chemoinformatics method was applied to the assignment of absolute configurations and to the quantitative prediction of specific optical rotations using a data set of 88 chiral fluorinated molecules (44 pairs of enantiomers). Counterpropagation neural networks were explored for the classification of enantiomers as dextrorotatory or levorotatory. Regression models were trained using multilayer perceptrons (MLP), random forests (RF) or multilinear regressions (MLR), on the basis of physicochemical atomic stereo (PAS) descriptors. New descriptors were also derived considering the common structural features of the data set (cPAS descriptors), which enabled RF models to predict the whole data set with R = 0.964, mean absolute error (MAE) of 9.8° and root mean square error (RMSE) of 12.5° in leave-one-pair-out cross-validation experiments. The predictions for the 30 compounds measured in chloroform were obtained with R = 0.971, MAE = 9.1° and RMSE = 12.5°, which compares favorably with quantum chemistry calculations reported in the literature., (Copyright © 2019 Elsevier B.V. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
36. Synthesis of Pyridazine Derivatives by Suzuki-Miyaura Cross-Coupling Reaction and Evaluation of Their Optical and Electronic Properties through Experimental and Theoretical Studies.
- Author
-
Fernandes SSM, Aires-de-Sousa J, Belsley M, and Raposo MMM
- Subjects
- Pyridazines chemistry, Pyridazines pharmacology, Spectrum Analysis, Models, Theoretical, Oxidative Coupling, Pyridazines chemical synthesis
- Abstract
A series of π-conjugated molecules, based on pyridazine and thiophene heterocycles 3a-e, were synthesized using commercially, or readily available, coupling components, through a palladium catalyzed Suzuki-Miyaura cross-coupling reaction. The electron-deficient pyridazine heterocycle was functionalized by a thiophene electron-rich heterocycle at position six, and different (hetero)aromatic moieties (phenyl, thienyl, furanyl) were functionalized with electron acceptor groups at position three. Density Functional Theory (DFT) calculations were carried out to obtain information on the conformation, electronic structure, electron distribution, dipolar moment, and molecular nonlinear response of the synthesized push-pull pyridazine derivatives. Hyper-Rayleigh scattering in 1,4-dioxane solutions, using a fundamental wavelength of 1064 nm, was used to evaluate their second-order nonlinear optical properties. The thienylpyridazine functionalized with the cyano-phenyl moiety exhibited the largest first hyperpolarizability (β = 175 × 10⁻³⁰ esu, using the T convention) indicating its potential as a second harmonic generation (SHG) chromophore.
- Published
- 2018
- Full Text
- View/download PDF
37. Machine learning for the prediction of molecular dipole moments obtained by density functional theory.
- Author
-
Pereira F and Aires-de-Sousa J
- Abstract
Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) by B3LYP/6-31G(d,p) on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database was used with 10,071 structures, new molecular descriptors were designed and the models were validated with external test sets. Several ML algorithms were screened. Random forest regression models predicted an external test set of 3368 compounds achieving mean absolute error up to 0.44 D. The results represent a significant improvement of the dipole moments calculated using empirical point charges located at the nucleus, even assuming the DFT-optimized geometry (root mean square error, RMSE, of 0.68 D vs. 1.53 D and R² = 0.87 vs. 0.66).
- Published
- 2018
- Full Text
- View/download PDF
38. Computational Methodologies in the Exploration of Marine Natural Product Leads.
- Author
-
Pereira F and Aires-de-Sousa J
- Subjects
- Biological Products pharmacology, Chemistry, Pharmaceutical methods, Drug Design, Models, Chemical, Models, Molecular, Molecular Structure, Quantitative Structure-Activity Relationship, Aquatic Organisms, Biological Products chemistry, Computational Biology methods, Drug Discovery methods, Models, Biological
- Abstract
Computational methodologies are assisting the exploration of marine natural products (MNPs) to make the discovery of new leads more efficient, to repurpose known MNPs, to target new metabolites on the basis of genome analysis, to reveal mechanisms of action, and to optimize leads. In silico efforts in drug discovery of NPs have mainly focused on two tasks: dereplication and prediction of bioactivities. The exploration of new chemical spaces and the application of predicted spectral data must be included in new approaches to select species, extracts, and growth conditions with maximum probabilities of medicinal chemistry novelty. In this review, the most relevant current computational dereplication methodologies are highlighted. Structure-based (SB) and ligand-based (LB) chemoinformatics approaches have become essential tools for the virtual screening of NPs either in small datasets of isolated compounds or in large-scale databases. The most common LB techniques include Quantitative Structure-Activity Relationships (QSAR), estimation of drug likeness, prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties, similarity searching, and pharmacophore identification. Analogously, molecular dynamics, docking and binding cavity analysis have been used in SB approaches. Their significance and achievements are the main focus of this review.
- Published
- 2018
- Full Text
- View/download PDF
39. NavMol 3.0: enabling the representation of metabolic reactions by blind users.
- Author
-
Binev Y, Peixoto D, Pereira F, Rodrigues I, Cavaco S, Lobo AM, and Aires-de-Sousa J
- Subjects
- Humans, Blindness, Metabolic Networks and Pathways, Sensory Aids, Software
- Abstract
Summary: The representation of metabolic reactions strongly relies on visualization, which is a major barrier for blind users. The NavMol software renders the communication and interpretation of molecular structures and reactions accessible by integrating chemoinformatics and assistive technology. NavMol 3.0 provides a molecular editor for metabolic reactions. The user can start with templates of reactions and build from such cores. Atom-to-atom mapping enables changes in the reactants to be reflected in the products (and vice-versa) and the reaction centres to be automatically identified. Blind users can easily interact with the software using the keyboard and text-to-speech technology., Availability and Implementation: NavMol 3.0 is free and open source under the GNU general public license (GPLv3), and can be downloaded at http://sourceforge.net/projects/navmol as a JAR file., Contact: joao@airesdesousa.com., (© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com)
- Published
- 2018
- Full Text
- View/download PDF
40. Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals.
- Author
-
Pereira F, Xiao K, Latino DA, Wu C, Zhang Q, and Aires-de-Sousa J
- Subjects
- Machine Learning, Quantum Theory
- Abstract
Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds achieving mean absolute error (MAE) up to 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Inclusion of the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of estimations (reducing the MAE by >30%).
- Published
- 2017
- Full Text
- View/download PDF
41. Synthesis and Biological Evaluation of Hybrid 1,5- and 2,5-Disubstituted Indoles as Potentially New Antitubercular Agents.
- Author
-
Soares A, Estevao MS, Marques MMB, Kovalishyn V, Latino DARS, Aires-de-Sousa J, Ramos J, Viveiros M, and Martins F
- Subjects
- Antitubercular Agents chemical synthesis, Drug Design, Indoles chemical synthesis, Isoniazid pharmacology, Machine Learning, Mycobacterium tuberculosis drug effects, Neural Networks, Computer, Pyridines chemical synthesis, Quantitative Structure-Activity Relationship, Antitubercular Agents pharmacology, Indoles pharmacology, Pyridines pharmacology
- Abstract
Background: Tuberculosis (TB) is the second leading cause of mortality worldwide, being a highly contagious and insidious illness caused by Mycobacterium tuberculosis, Mtb. Additionally, the emergence of multidrug-resistant and extensively drug-resistant strains of Mtb, together with significant levels of co-infection with HIV and TB (HIV/TB), makes the search for new antitubercular drugs urgent and challenging., Methods: This work was based on the hypothesis that an active compound could be obtained if substituents present in some other active compounds were attached on a core of an important structure, in this case the indole scaffold, thus generating a hybrid compound. A QSAR-oriented design based on classification and regression models along with the estimation of physicochemical and biological properties was also used to assist in the selection of compounds. Chosen compounds were synthesized using various synthetic procedures and evaluated against M. tuberculosis H37Rv strain., Results: Selected compounds possess substituents at positions C5, C2 and N1 of the indole ring. The substituents involve p-halophenyl, pyridyl, benzyloxy and benzylamine groups. Four compounds were synthesised using suitable synthetic procedures to attain the desired substitution at the indole core. From these, three compounds are new and have been fully characterized, and tested in vitro against the H37Rv ATCC27294T Mtb strain, using isoniazid as a control. One of them, compound 2, with the pyridyl group at N1, has an experimental log (1/MIC) very close to 5 and can be considered as being (weakly) active. In fact, it is more active than 64% of all indole molecules in our data sets of experimental results from the literature. The most active indole in these data sets has log (1/MIC)=5.93, with only 6 compounds with log (1/MIC) above 5.5., Conclusion: Despite the lower activity found for the tested compounds, when compared to other reported indole-derivatives, these structures, which rely on a hybrid design concept, may constitute interesting scaffolds to prepare a new family of TB inhibitors with improved activity., (Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.)
- Published
- 2017
- Full Text
- View/download PDF
42. Machine Learning Estimation of Atom Condensed Fukui Functions.
- Author
-
Zhang Q, Zheng F, Zhao T, Qu X, and Aires-de-Sousa J
- Subjects
- Models, Chemical, Quantitative Structure-Activity Relationship, Machine Learning
- Abstract
To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94 % accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50 % of the cases and, on the average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with sensitivity of 55-61 % and specificity of 94-95 %., (© 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2016
- Full Text
- View/download PDF
43. Design, synthesis and biological evaluation of novel isoniazid derivatives with potent antitubercular activity.
- Author
-
Martins F, Santos S, Ventura C, Elvas-Leitão R, Santos L, Vitorino S, Reis M, Miranda V, Correia HF, Aires-de-Sousa J, Kovalishyn V, Latino DA, Ramos J, and Viveiros M
- Subjects
- Animals, Antitubercular Agents chemical synthesis, Antitubercular Agents chemistry, Chlorocebus aethiops, Crystallography, X-Ray, Dose-Response Relationship, Drug, Isoniazid chemistry, Microbial Sensitivity Tests, Models, Molecular, Molecular Structure, Structure-Activity Relationship, Vero Cells, Antitubercular Agents pharmacology, Drug Design, Isoniazid analogs & derivatives, Isoniazid pharmacology, Mycobacterium tuberculosis drug effects
- Abstract
The disturbing emergence of multidrug-resistant strains of Mycobacterium tuberculosis (Mtb) has been driving the scientific community to urgently search for new and efficient antitubercular drugs. Despite the various drugs currently under evaluation, isoniazid is still the key and most effective component in all multi-therapeutic regimens recommended by the WHO. This paper describes the QSAR-oriented design, synthesis and in vitro antitubercular activity of several potent isoniazid derivatives (isonicotinoyl hydrazones and isonicotinoyl hydrazides) against H37Rv and two resistant Mtb strains. QSAR studies entailed RFs and ASNNs classification models, as well as MLR models. Strict validation procedures were used to guarantee the models' robustness and predictive ability. Lipophilicity was shown not to be relevant to explain the activity of these derivatives, whereas shorter N-N distances and lengthy substituents lead to more active compounds. Compounds 1, 2, 4, 5 and 6, showed measured activities against H37Rv higher than INH (i.e., MIC ≤ 0.28 μM), while compound 9 exhibited a six fold decrease in MIC against the katG (S315T) mutated strain, by comparison with INH (i.e., 6.9 vs. 43.8 μM). All compounds were ineffective against H37RvINH (ΔkatG), a strain with a full deletion of the katG gene, thus corroborating the importance of KatG in the activation of INH-based compounds. The most potent compounds were also shown not to be cytotoxic up to a concentration 500 times higher than MIC., (Copyright © 2014 Elsevier Masson SAS. All rights reserved.)
- Published
- 2014
- Full Text
- View/download PDF
44. Automatic NMR-based identification of chemical reaction types in mixtures of co-occurring reactions.
- Author
-
Latino DA and Aires-de-Sousa J
- Subjects
- Azirines chemistry, Cycloaddition Reaction, Cycloparaffins chemistry, Photochemical Processes, Pyridazines chemistry, Reproducibility of Results, Algorithms, Artificial Intelligence, Magnetic Resonance Spectroscopy statistics & numerical data
- Abstract
The combination of chemoinformatics approaches with NMR techniques and the increasing availability of data allow the resolution of problems far beyond the original application of NMR in structure elucidation/verification. The diversity of applications can range from process monitoring, metabolic profiling, authentication of products, to quality control. An application related to the automatic analysis of complex mixtures concerns mixtures of chemical reactions. We encoded mixtures of chemical reactions with the difference between the ¹H NMR spectra of the products and the reactants. All the signals arising from all the reactants of the co-occurring reactions were taken together (a simulated spectrum of the mixture of reactants) and the same was done for products. The difference spectrum is taken as the representation of the mixture of chemical reactions. A data set of 181 chemical reactions was used, each reaction manually assigned to one of 6 types. From this dataset, we simulated mixtures where two reactions of different types would occur simultaneously. Automatic learning methods were trained to classify the reactions occurring in a mixture from the ¹H NMR-based descriptor of the mixture. Unsupervised learning methods (self-organizing maps) produced a reasonable clustering of the mixtures by reaction type, and allowed the correct classification of 80% and 63% of the mixtures in two independent test sets of different similarity to the training set. With random forests (RF), the percentage of correct classifications was increased to 99% and 80% for the same test sets. The RF probability associated with the predictions yielded a robust indication of their reliability. This study demonstrates the possibility of applying machine learning methods to automatically identify types of co-occurring chemical reactions from NMR data. Using no explicit structural information about the reaction participants, reaction elucidation is performed without structure elucidation of the molecules in the mixtures.
- Published
- 2014
- Full Text
- View/download PDF
45. A big data approach to the ultra-fast prediction of DFT-calculated bond energies.
- Author
-
Qu X, Latino DA, and Aires-de-Sousa J
- Abstract
Background: The rapid access to intrinsic physicochemical properties of molecules is highly desired for large scale chemical data mining explorations such as mass spectrum prediction in metabolomics, toxicity risk assessment and drug discovery. Large volumes of data are being produced by quantum chemistry calculations, which provide increasingly accurate estimations of several properties, e.g. by Density Functional Theory (DFT), but are still too computationally expensive for those large scale uses. This work explores the possibility of using large amounts of data generated by DFT methods for thousands of molecular structures, extracting relevant molecular properties and applying machine learning (ML) algorithms to learn from the data. Once trained, these ML models can be applied to new structures to produce ultra-fast predictions. An approach is presented for homolytic bond dissociation energy (BDE)., Results: Machine learning models were trained with a data set of >12,000 BDEs calculated by B3LYP/6-311++G(d,p)//DFTB. Descriptors were designed to encode atom types and connectivity in the 2D topological environment of the bonds. The best model, an Associative Neural Network (ASNN) based on 85 bond descriptors, was able to predict the BDE of 887 bonds in an independent test set (covering a range of 17.67-202.30 kcal/mol) with RMSD of 5.29 kcal/mol, mean absolute deviation of 3.35 kcal/mol, and R² = 0.953. The predictions were compared with semi-empirical PM6 calculations, and were found to be superior for all types of bonds in the data set, except for O-H, N-H, and N-N bonds. The B3LYP/6-311++G(d,p)//DFTB calculations can approach the higher-level calculations B3LYP/6-311++G(3df,2p)//B3LYP/6-31G(d,p) with an RMSD of 3.04 kcal/mol, which is less than the RMSD of ASNN (against both DFT methods). An experimental web service for on-line prediction of BDEs is available at http://joao.airesdesousa.com/bde., Conclusion: Knowledge could be automatically extracted by machine learning techniques from a data set of calculated BDEs, providing ultra-fast access to accurate estimations of DFT-calculated BDEs. This demonstrates how to extract value from large volumes of data currently being produced by quantum chemistry calculations at an increasing speed mostly without human intervention. In this way, high-level theoretical quantum calculations can be used in large-scale applications that otherwise would not afford the intrinsic computational cost.
- Published
- 2013
- Full Text
- View/download PDF
46. Models for identification of erroneous atom-to-atom mapping of reactions performed by automated algorithms.
- Author
-
Muller C, Marcou G, Horvath D, Aires-de-Sousa J, and Varnek A
- Subjects
- Automation, Databases, Protein, False Positive Reactions, Models, Biological, Computational Biology methods, Support Vector Machine
- Abstract
Machine learning (SVM and JRip rule learner) methods have been used in conjunction with the Condensed Graph of Reaction (CGR) approach to identify errors in the atom-to-atom mapping of chemical reactions produced by an automated mapping tool by ChemAxon. The modeling has been performed on the three first enzymatic classes of metabolic reactions from the KEGG database. Each reaction has been converted into a CGR representing a pseudomolecule with conventional (single, double, aromatic, etc.) bonds and dynamic bonds characterizing chemical transformations. The ChemAxon tool was used to automatically detect the matching atom pairs in reagents and products. These automated mappings were analyzed by the human expert and classified as "correct" or "wrong". ISIDA fragment descriptors generated for CGRs for both correct and wrong mappings were used as attributes in machine learning. The learned models have been validated in n-fold cross-validation on the training set followed by a challenge to detect correct and wrong mappings within an external test set of reactions, never used for learning. Results show that both SVM and JRip models detect most of the wrongly mapped reactions. We believe that this approach could be used to identify erroneous atom-to-atom mapping performed by any automated algorithm.
- Published
- 2012
- Full Text
- View/download PDF
47. Automatic Perception of Chemical Similarities Between Metabolic Pathways.
- Author
-
Latino DA and Aires-de-Sousa J
- Abstract
Metabolic pathways are at the crossroad between the chemical world of small molecules and the biological world of enzymes, genes and regulation. Methods for their processing are therefore required for a great variety of applications. The work presented here reports a new method to encode metabolic pathways and reactomes of organisms based on the MOLMAP approach. Pathways are represented from features of the metabolites involved in their reactions enabling to automatically perceive chemical similarities, and making no use of EC numbers. MOLMAP descriptors are based on atomic topological and physicochemical features of the bonds involved in reactions. The results show that self-organizing maps (SOM) can be trained with MOLMAPs of pathways to automatically recognize similarities between pathways of the same type of metabolism. The study also illustrates the possibility of applying the MOLMAP methodology at progressively higher levels of complexity, bridging chemical and biological information, and going all the way from atomic properties to the classification of organisms., (Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2012
- Full Text
- View/download PDF
48. Estimation of Mayr electrophilicity with a quantitative structure-property relationship approach using empirical and DFT descriptors.
- Author
-
Pereira F, Latino DA, and Aires-de-Sousa J
- Abstract
Quantitative structure-property relationships (QSPRs) were investigated for the estimation of the Mayr electrophilicity parameter using a data set of 64 compounds, all currently available uncharged electrophiles in Mayr's Database of Reactivity Parameters. Three collections of empirical descriptors were employed, from Dragon, Adriana.Code, and CDK. Models were built with multilinear regressions, k nearest neighbors, model trees, random forests, support vector machines (SVMs), associative neural networks, and counterpropagation neural networks. Quantum chemical descriptors were calculated with density functional theory (DFT) methods and incorporated in QSPR models. The best results were achieved with SVM using seven empirical and DFT descriptors; an R² of 0.92 was obtained for the test set (21 compounds). The final seven descriptors were the Parr electrophilicity index, ε(LUMO), hardness, and four CDK descriptors (FNSA-3, ATSc5, Kier2, and nAtomLAC). Screening of correlations between individual descriptors and Mayr electrophilicity revealed the highest absolute value of correlation for DFT ε(LUMO) (R = -0.82) and comparable correlations for some empirical descriptors, e.g., Dragon's folding degree index (R = -0.80), Kier flexibility index (R = -0.78), and Kier S2K index (R = -0.78). High correlations were observed in the training set between reactivity descriptors calculated by the PM6 semiempirical and DFT methods (R = 0.96 for ε(LUMO) and 0.94 for the electrophilicity index).
- Published
- 2011
- Full Text
- View/download PDF
49. Classification of chemical reactions and chemoinformatic processing of enzymatic transformations.
- Author
-
Latino DA and Aires-de-Sousa J
- Subjects
- Biochemical Phenomena, Chemical Phenomena, Computational Biology methods, Computers, Databases, Factual, Enzymes genetics, Genome, Metabolic Networks and Pathways, Models, Biological, Reproducibility of Results, Enzymes classification, Enzymes metabolism
- Abstract
The automatic perception of chemical similarities between chemical reactions is required for a variety of applications in chemistry and connected fields, namely with databases of metabolic reactions. Classification of enzymatic reactions is required, e.g., for genome-scale reconstruction (or comparison) of metabolic pathways, computer-aided validation of classification systems, or comparison of enzymatic mechanisms. This chapter presents different current approaches for the representation of chemical reactions enabling automatic reaction classification. Representations based on the encoding of the reaction center are illustrated, which use physicochemical features, Reaction Classification (RC) numbers, or Condensed Reaction Graphs (CRG). Representation of differences between the structures of products and reactants include reaction signatures, fingerprint differences, and the MOLMAP approach. The approaches are illustrated with applications to real datasets.
- Published
- 2011
- Full Text
- View/download PDF
50. Machine learning of chemical reactivity from databases of organic reactions.
- Author
-
Carrera GV, Gupta S, and Aires-de-Sousa J
- Subjects
- Computer Simulation, Databases, Factual, Models, Chemical, Molecular Structure, Artificial Intelligence, Borohydrides chemistry, Butylamines chemistry, Quantitative Structure-Activity Relationship
- Abstract
Databases of chemical reactions contain knowledge about the reactivity of specific reagents. Although information is in general only explicitly available for compounds reported to react, it is possible to derive information about substructures that do not react in the reported reactions. Both types of information (positive and negative) can be used to train machine learning techniques to predict if a compound reacts or not with a specific reagent. The whole process was implemented with two databases of reactions, one involving BuNH2 as the reagent, and the other NaCNBH3. Negative information was derived using MOLMAP molecular descriptors, and classification models were developed with Random Forests also based on MOLMAP descriptors. MOLMAP descriptors were based exclusively on calculated physicochemical features of molecules. Correct predictions were achieved for approximately 90% of independent test sets. While NaCNBH3 is a selective reducing reagent widely used in organic synthesis, BuNH2 is a nucleophile that mimics the reactivity of the lysine side chain (involved in an initiating step of the mechanism leading to skin sensitization).
- Published
- 2009
- Full Text
- View/download PDF