1. SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features.
- Author
- Preto AJ and Moreira IS
- Subjects
- Amino Acid Sequence, Amino Acids metabolism, Binding Sites, Databases, Protein, Datasets as Topic, Humans, Protein Binding, Protein Interaction Mapping, Proteins metabolism, Amino Acids chemistry, Computational Biology methods, Machine Learning, Proteins chemistry
- Abstract
Protein Hot-Spots (HS) are experimentally determined amino acids that are key to small ligand binding and tend to be structural landmarks in protein-protein interactions. As such, they have been extensively approached by structure-based Machine Learning (ML) prediction methods. However, the availability of a much larger array of protein sequences in comparison to determined three-dimensional structures indicates that a sequence-based HS predictor has the potential to be more useful for the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS via sequence-only features. This algorithm shows accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent testing set. The algorithm is deployed within a free-to-use webserver at http://moreiralab.com/resources/spotone, only requiring the user to submit a FASTA file with one or more protein sequences.
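For readers less familiar with the metrics quoted in the abstract, the sketch below shows how accuracy, precision, recall and F1-score are typically computed for a binary hot-spot / non-hot-spot classifier. It is an illustration only and is not taken from SPOTONE: the function name `classification_metrics` and the toy labels are assumptions, and AUROC is left out because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics (1 = hot spot, 0 = non-hot spot).

    Hypothetical helper for illustration; not part of SPOTONE.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for this example (not data from the paper).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels there are 4 true positives, 2 true negatives, 1 false positive and 1 false negative, so accuracy is 0.75 and precision, recall and F1 are each 0.8.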
- Published
- 2020