Boosting phosphorylation site prediction with sequence feature‐based machine learning.

Authors :
Maiti, Shyantani
Hassan, Atif
Mitra, Pralay
Source :
Proteins; Feb2020, Vol. 88 Issue 2, p284-291, 8p
Publication Year :
2020

Abstract

Protein phosphorylation is one of the essential posttranslational modifications and plays a vital role in regulating many fundamental cellular processes. We propose a LightGBM‐based computational approach that uses evolutionary, geometric, sequence environment, and amino acid‐specific features to decipher phosphate binding sites from a protein sequence. Compared with existing methods on 2429 protein sequences taken from the standard Phospho.ELM (P.ELM) benchmark data set, which features 11 organisms, our method reports a higher F1 score (harmonic mean of precision and recall) of 0.504 and a higher ROC AUC (area under the receiver operating characteristic curve) of 0.836. The computation time of our proposed approach is much lower than that of the recently developed deep learning‐based framework. Structural analysis of selected protein sequences indicates that our predictions are a superset of the phosphorylation sites annotated in the P.ELM data set. The foundation of our scheme is manual feature engineering and decision tree‐based classification. Hence, it is intuitive, and one can interpret the final tree as a set of rules, leading to a deeper understanding of the relationships between biophysical features and phosphorylation sites. Our innovative problem transformation method permits more control over precision and recall, as demonstrated by the fact that incorporating the output probability of the existing deep learning framework as an additional feature improves our prediction (F1 score = 0.546; ROC AUC = 0.849). The implementation of our method can be accessed at http://cse.iitkgp.ac.in/~pralay/resources/PPSBoost/ and is mirrored at https://cosmos.iitkgp.ac.in/PPSBoost. [ABSTRACT FROM AUTHOR]
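
The two metrics quoted in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names `precision_recall_f1` and `roc_auc`, the decision thresholds, and the toy labels and scores are all assumptions made up for illustration. The sketch computes F1 as the harmonic mean of precision and recall at a chosen threshold, and ROC AUC as the fraction of positive/negative pairs in which the positive site receives the higher score.

```python
def precision_recall_f1(y_true, y_score, threshold=0.5):
    """Return (precision, recall, F1) for binary labels at a score threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): 1 = phosphorylated site, 0 = not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.35, 0.2, 0.1]

for t in (0.5, 0.3):
    p, r, f1 = precision_recall_f1(y_true, y_score, threshold=t)
    print(f"threshold={t}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
print(f"ROC AUC={roc_auc(y_true, y_score):.3f}")
```

Lowering the threshold in this toy example raises recall at the cost of precision, which is the kind of precision/recall trade-off the abstract refers to. ROC AUC does not depend on the threshold, because it is computed from the ranking of the scores alone.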

Details

Language :
English
ISSN :
0887-3585
Volume :
88
Issue :
2
Database :
Complementary Index
Journal :
Proteins
Publication Type :
Academic Journal
Accession number :
141032513
Full Text :
https://doi.org/10.1002/prot.25801