Evaluation of cross-validation strategies in sequence-based binding prediction using deep learning
- Source :
- UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC), Recercat. Dipòsit de la Recerca de Catalunya
- Publication Year :
- 2019
Abstract
- Binding prediction between targets and drug-like compounds through Deep Neural Networks has generated promising results in recent years, outperforming traditional machine learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies applied to data from different molecular databases affect the performance of binding prediction proteochemometrics models. These strategies are: (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on the source database, and (4) splitting based on both the clustering and the source database. These schemas are applied to a Deep Learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our Deep Learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases and that a restrictive cross-validation schema based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when representing molecules by their fingerprints.
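The four splitting strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not include. It assumes that each compound already carries a cluster id (for example from a K-means run) and a source-database label. The function names `random_folds` and `grouped_folds`, the toy labels, and the reading of strategy (4) as grouping by the (cluster, database) pair are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Strategy (1): shuffle sample indices and deal them round-robin into k folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [0] * n_samples
    for position, index in enumerate(order):
        folds[index] = position % k
    return folds


def grouped_folds(groups, k):
    """Strategies (2)-(4): keep every group inside a single fold.

    Groups are placed largest first into the currently smallest fold,
    so no group is ever shared between two folds.
    """
    members = defaultdict(list)
    for index, group in enumerate(groups):
        members[group].append(index)
    fold_sizes = [0] * k
    folds = [0] * len(groups)
    for group in sorted(members, key=lambda g: -len(members[g])):
        target = fold_sizes.index(min(fold_sizes))
        for index in members[group]:
            folds[index] = target
        fold_sizes[target] += len(members[group])
    return folds


# Hypothetical toy labels: one cluster id and one source database per compound.
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
sources = ["A", "A", "A", "B", "B", "B", "A", "B"]

by_cluster = grouped_folds(clusters, k=2)                   # strategy (2)
by_source = grouped_folds(sources, k=2)                     # strategy (3)
by_both = grouped_folds(list(zip(clusters, sources)), k=2)  # strategy (4), assumed reading
```

With grouped folds, compounds from the same cluster or database never appear on both sides of a train/test split, which is the property that makes such schemas more restrictive than random splitting.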
- Subjects :
- Quantitative structure–activity relationship
Informatics
Computer science
Generalization
General Chemical Engineering
Library and Information Sciences
Machine learning
Cross-validation
Deep Learning
Schema (psychology)
Drug Discovery
Deep neural networks
Aprenentatge automàtic
Cluster analysis
Sequence
Reproducibility of Results
General Chemistry
Computer Science Applications
Informàtica::Aplicacions de la informàtica::Aplicacions informàtiques a la física i l'enginyeria [Àrees temàtiques de la UPC]
Artificial intelligence
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- UPCommons. Portal del coneixement obert de la UPC, Universitat Politècnica de Catalunya (UPC), Recercat. Dipòsit de la Recerca de Catalunya
- Accession number :
- edsair.doi.dedup.....9fd0894ddc1e1c8155f85aed1d081d90