Analysis and Comparison of Vector Space and Metric Space Representations in QSAR Modeling
- Source :
- Molecules, Vol 24, Iss 9, p 1698 (2019)
- Publication Year :
- 2019
- Publisher :
- MDPI AG, 2019.
Abstract
- The performance of quantitative structure–activity relationship (QSAR) models largely depends on the relevance of the molecular representation selected as the input data matrix. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space and metric space) for fitting robust machine learning models in QSAR problems. For the assessment of these methods, seven different molecular representations, comprising RDKit descriptors, five different fingerprint types (MACCS, PubChem, FP2-based, Atom Pair, and ECFP4), and a graph matching approach (non-contiguous atom matching structure similarity, NAMS), in both vector space and metric space, were subjected to state-of-the-art machine learning methods that included different dimensionality reduction methods (feature selection and linear dimensionality reduction). Five distinct QSAR data sets were used for direct assessment and analysis. Results show that, in general, metric-space and vector-space representations are able to produce equivalent models, but there are significant differences between individual approaches. The NAMS-based similarity approach consistently outperformed most fingerprint representations in model quality, closely followed by Atom Pair fingerprints. To further verify these findings, the metric space-based models were fitted to the same data sets with the closest neighbors removed. These latter results further strengthened the above conclusions. The metric space graph-based approach appeared significantly superior to the other representations, albeit at a significant computational cost.
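To make the distinction in the abstract concrete, the sketch below shows one common reading of the two representation types: in a vector space each molecule is a row of features (here, fingerprint bits), while in a metric space each molecule is described only by its distances to the other molecules. This is an illustration under stated assumptions, not the authors' pipeline: the molecule names, the 8-bit "fingerprints", and the choice of 1 minus Tanimoto similarity as the distance are all made up for the example.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy data: each molecule is given by the set of "on" bit
# positions of an 8-bit fingerprint. These are NOT real fingerprints.
N_BITS = 8
fingerprints: Dict[str, FrozenSet[int]] = {
    "mol_a": frozenset({0, 2, 3, 7}),
    "mol_b": frozenset({0, 2, 5, 7}),
    "mol_c": frozenset({1, 4, 6}),
}


def to_bit_vector(on_bits: FrozenSet[int], n_bits: int) -> List[int]:
    """Vector-space view: expand the set of on-bits into a 0/1 feature row."""
    return [1 if i in on_bits else 0 for i in range(n_bits)]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


names = sorted(fingerprints)

# Vector space: one row per molecule, one column per fingerprint bit.
vector_space = [to_bit_vector(fingerprints[n], N_BITS) for n in names]

# Metric space: one row per molecule, one column per *other molecule*,
# holding the pairwise distance (assumed here to be 1 - Tanimoto).
metric_space = [
    [1.0 - tanimoto(fingerprints[r], fingerprints[c]) for c in names]
    for r in names
]

for name, row in zip(names, metric_space):
    print(name, [round(d, 2) for d in row])
```

Either matrix can then be handed to a learner such as a support vector machine or random forest. A graph-matching similarity such as NAMS has no natural bit-vector form, which is why it enters only through the pairwise matrix, and why its cost grows with the number of molecule pairs that must be compared.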
- Subjects :
- Models, Molecular
Quantitative structure–activity relationship
Support Vector Machine
Similarity (geometry)
Pharmaceutical Science
Feature selection
01 natural sciences
Article
support vector machines
Analytical Chemistry
Machine Learning
lcsh:QD241-441
03 medical and health sciences
QSAR modeling
lcsh:Organic chemistry
Drug Discovery
vector space
Computer Simulation
Physical and Theoretical Chemistry
030304 developmental biology
Mathematics
0303 health sciences
PCA
Dimensionality reduction
Organic Chemistry
metric space
Pattern recognition
0104 chemical sciences
010404 medicinal & biomolecular chemistry
non-contiguous atom matching structure similarity—NAMS
Chemistry (miscellaneous)
Metric (mathematics)
Molecular Medicine
Graph (abstract data type)
Artificial intelligence
business
Algorithms
random forest
Details
- Language :
- English
- ISSN :
- 14203049
- Volume :
- 24
- Issue :
- 9
- Database :
- OpenAIRE
- Journal :
- Molecules
- Accession number :
- edsair.doi.dedup.....092d9f85ac3b81b341d1932f91e40cd2