1. Protein-DNA Binding Residue Prediction via Bagging Strategy and Sequence-based Cube-Format Feature
- Author
Gui-Jun Zhang, Ning-Xin Jia, Dong-Jun Yu, Zheng Linlin, Bai Yansong, and Jun Hu
- Subjects
Sequence, Computer science, business.industry, Applied Mathematics, Pattern recognition, Base (topology), Convolutional neural network, Feature (computer vision), Sliding window protocol, Classifier (linguistics), Genetics, Artificial intelligence, Cube, business, Protein secondary structure, Biotechnology
- Abstract
Protein-DNA interactions play an important role in biological processes. Accurately identifying DNA-binding residues is a critical but challenging task for protein function annotation and drug design. Although wet-lab experimental methods are the most accurate way to identify DNA-binding residues, they are time-consuming and labor-intensive. There is an urgent need to develop computational methods that rapidly and accurately predict DNA-binding residues. In this study, we propose a novel sequence-based method, named PredDBR, for predicting DNA-binding residues. In PredDBR, for each protein, its position-specific frequency matrix (PSFM), predicted secondary structure (PSS), and predicted probabilities of ligand-binding residues (PPLBR) are first generated as three feature sources. Second, for each feature source, the sliding window technique is employed to extract the matrix-format feature of each residue. Then, we design two strategies, SR and AVE, to separately transform the PSFM-based matrix-format feature and the two predicted-source (PSS-based and PPLBR-based) matrix-format features of each residue into three cube-format features. Finally, after serially combining the three cube-format features, an ensemble classifier is generated by applying a bagging strategy to multiple base classifiers built on a 2D convolutional neural network framework. Experimental results demonstrate that PredDBR outperforms several state-of-the-art sequence-based DNA-binding residue predictors.
- Published
- 2021
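
The sliding-window and bagging steps named in the abstract can be illustrated with a minimal Python sketch. It is not the PredDBR implementation. The function names, the zero-padding at the sequence termini, the bootstrap resampling with replacement, and the probability averaging are all assumptions made for illustration, since the abstract does not specify these details. The SR and AVE cube transformations are not defined in the abstract and are deliberately left out.

```python
import numpy as np


def sliding_window_features(feature_matrix, window_size):
    """Extract one (window_size, n_columns) matrix per residue.

    feature_matrix has one row per residue (e.g. a PSFM with 20 columns).
    Zero-padding at both termini is an assumption, not taken from the paper.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so each window has a centre residue")
    half = window_size // 2
    # Pad rows only, so the first and last residues still get full windows.
    padded = np.pad(feature_matrix, ((half, half), (0, 0)), mode="constant")
    n_residues = feature_matrix.shape[0]
    return np.stack([padded[i:i + window_size] for i in range(n_residues)])


def bootstrap_indices(n_samples, n_models, seed=0):
    """Draw one bootstrap sample (with replacement) per base classifier.

    This is textbook bagging; the paper may resample differently.
    """
    rng = np.random.default_rng(seed)
    return [rng.integers(0, n_samples, size=n_samples) for _ in range(n_models)]


def bagged_probability(per_model_probs):
    """Average per-residue binding probabilities over the base classifiers."""
    return np.mean(np.asarray(per_model_probs, dtype=float), axis=0)


# Toy input: 6 residues with 2 feature columns (a real PSFM would have 20).
psfm = np.arange(12, dtype=float).reshape(6, 2)
windows = sliding_window_features(psfm, window_size=3)
print(windows.shape)  # (6, 3, 2)
```

Each base classifier would be trained on the windows selected by one entry of `bootstrap_indices`, and `bagged_probability` would combine their outputs at prediction time. The classifier itself (a 2D convolutional network in the paper) is not sketched here.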