Identifying Pupylation Proteins and Sites by Incorporating Multiple Methods

Authors :
Wang-Ren Qiu
Meng-Yue Guan
Qian-Kun Wang
Li-Liang Lou
Xuan Xiao
Source :
Frontiers in Endocrinology, Vol 13 (2022)
Publication Year :
2022
Publisher :
Frontiers Media S.A., 2022.

Abstract

Pupylation is an important posttranslational modification of proteins and plays a key role in the cell function of microorganisms. Accurate prediction of pupylation proteins and their modification sites is of great significance for the study of basic biological processes and the development of related drugs, since it can greatly reduce experimental costs and improve work efficiency. In this work, we first constructed a model for identifying pupylation proteins. To improve this protein-level model, a KNN scoring matrix model based on functional domain GO annotation and a Word Embedding model were used to extract features, and Random Under-sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE) were applied to balance the dataset. The balanced datasets were then input into Extreme Gradient Boosting (XGBoost). Under 10-fold cross-validation, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) are 95.23%, 0.8100, and 0.9864, respectively. For the pupylation site prediction model, six feature encodings (i.e., TPC, AAI, One-hot, PseAAC, CKSAAP, and Word Embedding) were used to extract protein sequence features, and the chi-square test was employed for feature selection. Rigorous 10-fold cross-validation indicated that the site prediction model achieves high accuracy and outperforms its existing counterparts. Finally, for the convenience of researchers, the PUP-PS-Fuse web server has been established at https://bioinfo.jcu.edu.cn/PUP-PS-Fuse, with http://121.36.221.79/PUP-PS-Fuse/ as a backup.
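
To make one of the six site-level encodings more concrete, the following is a minimal sketch of CKSAAP (composition of k-spaced amino acid pairs) in its commonly used form: for each gap k, count every ordered residue pair separated by k positions and normalize by the number of such pairs in the fragment. The function name `cksaap`, the maximum gap `k_max`, and the example fragment are illustrative assumptions; the abstract does not state the fragment length or gap range used by PUP-PS-Fuse, so this should not be read as the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Return CKSAAP features for gaps k = 0..k_max (400 values per gap).

    `k_max` is a hypothetical default, not a value taken from the paper.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of residue pairs separated by exactly k positions.
        n_pairs = len(fragment) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # ignore padding or non-standard residues
                counts[pair] += 1
        denom = n_pairs if n_pairs > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


# Toy fragment chosen only for illustration; real inputs would be
# sequence windows around candidate sites.
vector = cksaap("AKAKA", k_max=1)
```

With `k_max=1` the vector has 800 entries: the first 400 describe adjacent pairs and the next 400 describe pairs with one residue between them. A vector like this could then be passed to a feature selection step such as the chi-square test mentioned above.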

Details

Language :
English
ISSN :
16642392
Volume :
13
Database :
Directory of Open Access Journals
Journal :
Frontiers in Endocrinology
Publication Type :
Academic Journal
Accession number :
edsdoj.603ae93d22614e9c9d74677c2d62b2ff
Document Type :
article
Full Text :
https://doi.org/10.3389/fendo.2022.849549