PseKNC and Adaboost-Based Method for DNA-Binding Proteins Recognition
- Source :
- International Journal of Pattern Recognition and Artificial Intelligence. 35:2150022
- Publication Year :
- 2021
- Publisher :
- World Scientific Pub Co Pte Ltd, 2021.
Abstract
- DNA-binding proteins are essential to DNA function. They are also an integral component of life processes in various organisms, for instance DNA recombination and replication. Recognizing such proteins helps medical researchers pinpoint the causes of disease. Traditional techniques for identifying DNA-binding proteins are expensive and time-consuming. Machine learning methods can identify these proteins quickly and efficiently, but the accuracy of existing methods is not high enough. In this paper, we propose a framework to identify DNA-binding proteins. The proposed framework first combines three feature-extraction methods, PseKNC (ps), MomoKGap (mo), and MomoDiKGap (md), to extract features. We then apply Adaboost weight ranking to select optimal feature subsets from these three types of features. Based on the selected features, three algorithms (k-nearest neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF)) are applied for classification. Finally, three predictors for identifying DNA-binding proteins are established, including [Formula: see text], [Formula: see text], [Formula: see text]. We use benchmark and independent datasets to train and evaluate the proposed framework. Three tests are performed: the Jackknife test, 10-fold cross-validation, and an independent test. Among the predictors, ps+md achieves the highest accuracy. We name the best-performing model psmdDBPs and apply it to identify DNA-binding proteins.
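The feature-selection step in the abstract can be illustrated with a minimal sketch. The abstract does not specify how "Adaboost weight ranking" scores features, so the scoring used here (summing the vote weight alpha of every decision stump that splits on a feature) is an assumption, as are the function names `best_stump`, `adaboost_feature_weights`, and `rank_features` and the toy data. It is not the authors' implementation.

```python
import math

def stump_predict(row, j, thr, pol):
    # A decision stump: predict `pol` when feature j exceeds thr, else -pol.
    return pol if row[j] > thr else -pol

def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_feature_weights(X, y, n_rounds=10):
    """Run discrete AdaBoost with stumps; labels must be in {-1, +1}.

    Returns one score per feature: the summed alpha of the stumps that
    used it (an assumed scoring rule, chosen for illustration).
    """
    n = len(X)
    w = [1.0 / n] * n
    scores = [0.0] * len(X[0])
    for _ in range(n_rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        scores[j] += alpha
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:  # training data already separated perfectly
            break
    return scores

def rank_features(scores):
    # Feature indices ordered from highest to lowest score.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

# Hypothetical toy data: 6 samples, 3 features; only feature 1 separates the classes.
X = [[0.2, 0.10, 0.7], [0.9, 0.20, 0.3], [0.4, 0.80, 0.6],
     [0.7, 0.90, 0.2], [0.1, 0.15, 0.5], [0.6, 0.85, 0.9]]
y = [-1, -1, 1, 1, -1, 1]

scores = adaboost_feature_weights(X, y)
ranking = rank_features(scores)
print(ranking)
```

In a pipeline like the one described, the top-ranked feature indices would be kept and the reduced feature matrix passed to a classifier such as KNN, SVM, or RF. How many features to keep is not stated in the abstract.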
- Subjects :
- 0303 health sciences
business.industry
0206 medical engineering
02 engineering and technology
Computational biology
DNA-binding protein
Replication (computing)
law.invention
03 medical and health sciences
chemistry.chemical_compound
chemistry
Artificial Intelligence
law
Component (UML)
Recombinant DNA
Computer Vision and Pattern Recognition
Artificial intelligence
AdaBoost
business
020602 bioinformatics
Software
DNA
030304 developmental biology
Details
- ISSN :
- 1793-6381 and 0218-0014
- Volume :
- 35
- Database :
- OpenAIRE
- Journal :
- International Journal of Pattern Recognition and Artificial Intelligence
- Accession number :
- edsair.doi...........4f8300fcaaa76a3ad66f645846256e95
- Full Text :
- https://doi.org/10.1142/s0218001421500221