PseKNC and Adaboost-Based Method for DNA-Binding Proteins Recognition

Authors :
Lina Yang
Xichun Li
Xiangyu Li
Patrick S. P. Wang
Ting Shu
Source :
International Journal of Pattern Recognition and Artificial Intelligence. 35:2150022
Publication Year :
2021
Publisher :
World Scientific Pub Co Pte Ltd, 2021.

Abstract

DNA-binding proteins are essential to the functioning of DNA. They are also integral to the life processes of various organisms, such as DNA recombination and replication. Recognizing such proteins helps medical researchers pinpoint the causes of disease. Traditional techniques for identifying DNA-binding proteins are expensive and time-consuming. Machine learning methods can identify these proteins quickly and efficiently, but the accuracy of existing methods is not high enough. In this paper, we propose a framework to identify DNA-binding proteins. The framework first extracts features with three methods, PseKNC (ps), MomoKGap (mo), and MomoDiKGap (md), and combines them. We then apply Adaboost weight ranking to select optimal feature subsets from these three types of features. Based on the selected features, three algorithms (k-nearest neighbor (knn), Support Vector Machine (SVM), and Random Forest (RF)) are applied to classify the proteins. Finally, three predictors for identifying DNA-binding proteins are established: [Formula: see text], [Formula: see text], and [Formula: see text]. We use benchmark and independent datasets to train and evaluate the proposed framework. Three tests are performed: the Jackknife test, 10-fold cross-validation, and an independent test. Among the predictors, ps+md achieves the highest accuracy. We name the best-performing model psmdDBPs and apply it to identify DNA-binding proteins.
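
The selection and evaluation steps named in the abstract (Adaboost weight ranking, a k-nearest neighbor classifier, and the Jackknife test) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the Adaboost weights are turned into a ranking, so the sketch assumes decision stumps as weak learners and ranks each feature by the summed weight (alpha) of the stumps that split on it. The function names, the toy feature matrix, and the labels are hypothetical, and the features are plain numbers rather than real PseKNC, MomoKGap, or MomoDiKGap descriptors.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Decision stump: vote `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if x[feature] > threshold else -polarity


def adaboost_feature_weights(X, y, n_rounds=5):
    """Run AdaBoost with stumps; return the summed stump weight per feature.

    Assumption: 'weight ranking' is read here as ranking features by these sums.
    Labels in y must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n          # sample weights, uniform at the start
    scores = [0.0] * d         # accumulated alpha per feature
    for _ in range(n_rounds):
        best = None            # (weighted error, feature, threshold, polarity)
        for f in range(d):
            values = sorted(set(row[f] for row in X))
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(xi, f, t, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, pol)
        if best is None:       # every feature is constant, nothing to split on
            break
        err, f, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        scores[f] += alpha
        # Re-weight samples: misclassified ones gain weight, then normalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return scores


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(zip(train_X, train_y), key=lambda p: sq_dist(p[0], x))[:k]
    return 1 if sum(label for _, label in nearest) > 0 else -1


def jackknife_accuracy(X, y, k=3):
    """Jackknife (leave-one-out) test: hold out each sample once."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)


# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
X = [[0.90, 0.2, 0.5], [0.80, 0.7, 0.1], [0.85, 0.4, 0.9], [0.95, 0.9, 0.3],
     [0.10, 0.3, 0.6], [0.20, 0.8, 0.2], [0.15, 0.5, 0.8], [0.05, 0.1, 0.4]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

scores = adaboost_feature_weights(X, y)
ranked = sorted(range(len(scores)), key=lambda f: -scores[f])
selected = ranked[:1]                                  # keep the top-ranked feature
X_sel = [[row[f] for f in selected] for row in X]
print("ranking:", ranked, "jackknife accuracy:", jackknife_accuracy(X_sel, y))
```

In practice the subset size would be tuned rather than fixed at one feature, and the same loop could be repeated with SVM and Random Forest classifiers and with 10-fold cross-validation in place of the Jackknife test.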

Details

ISSN :
1793-6381 and 0218-0014
Volume :
35
Database :
OpenAIRE
Journal :
International Journal of Pattern Recognition and Artificial Intelligence
Accession number :
edsair.doi...........4f8300fcaaa76a3ad66f645846256e95
Full Text :
https://doi.org/10.1142/s0218001421500221