Performance of machine-learning scoring functions in structure-based virtual screening
- Source :
- Scientific Reports, Nature Publishing Group, 2017, 7, ⟨10.1038/srep46710⟩
- Publication Year :
- 2017
- Publisher :
- HAL CCSD, 2017.
Abstract
- Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show that RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina is only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate, compared with 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observe comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
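The hit rates quoted in the abstract are the fraction of true actives among the top-ranked x% of screened molecules. A minimal Python sketch of that metric follows, for illustration only. The function name `hit_rate_at_top`, the toy scores and labels, and the convention that a higher score means a better-ranked molecule are all assumptions made here; none of this is taken from the RF-Score-VS code or data.

```python
def hit_rate_at_top(scores, is_active, fraction):
    """Fraction of actives among the top-scoring `fraction` of molecules.

    Assumes a HIGHER score means a better-ranked molecule (hypothetical
    convention; negate the scores for tools where lower is better).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Rank molecules from best (highest score) to worst.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)

    # Keep at least one molecule so very small fractions stay defined.
    n_top = max(1, int(len(ranked) * fraction))

    n_hits = sum(1 for _, active in ranked[:n_top] if active)
    return n_hits / n_top


# Toy data (made up for illustration): 10 molecules, 3 of them active.
toy_scores = [9.1, 8.7, 7.9, 6.5, 6.2, 5.8, 5.1, 4.4, 3.9, 3.0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(hit_rate_at_top(toy_scores, toy_labels, 0.2))
print(hit_rate_at_top(toy_scores, toy_labels, 0.5))
```

Comparing two scoring functions this way means ranking the same docked library once per function and evaluating both rankings at the same fraction.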
- Subjects :
- 0301 basic medicine
Computer science
[SDV]Life Sciences [q-bio]
Overfitting
Machine learning
01 natural sciences
Article
Set (abstract data type)
03 medical and health sciences
Virtual screening
[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Multidisciplinary
Ligand (biochemistry)
0104 chemical sciences
010404 medicinal & biomolecular chemistry
030104 developmental biology
Docking (molecular)
Test set
Benchmark (computing)
Artificial intelligence
Details
- Language :
- English
- ISSN :
- 20452322
- Database :
- OpenAIRE
- Journal :
- Scientific Reports, Nature Publishing Group, 2017, 7, ⟨10.1038/srep46710⟩
- Accession number :
- edsair.doi.dedup.....fcc7f816863c3f2a6ad22875f24a08c5
- Full Text :
- https://doi.org/10.1038/srep46710