EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning

Authors :
Lin Gui
Hongpeng Wang
Jiyun Zhou
Qin Lu
Ruifeng Xu
Source :
IEEE/ACM transactions on computational biology and bioinformatics. 17(1)
Publication Year :
2018

Abstract

Most previous methods for DNA-binding residue prediction do not consider the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second is an ensemble-learning-based classifier introduced to tackle the data imbalance problem in binding residue prediction; it uses a variant of the bagging strategy to obtain balanced samples. Evaluations on PDNA-224 and DBP-123 show that classifiers using feature relationships outperform those without them by at least 0.028 on MCC, 1.18 percent on ST, and 0.012 on AUC, which indicates the usefulness of feature relationships for DNA-binding residue prediction. Adding ensemble learning improves on a single LSTM classifier by at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC. Comparisons with state-of-the-art predictors show that the proposed EL_LSTM outperforms them significantly. Further feature analysis validates the effectiveness of LSTM for the prediction of DNA-binding residues.
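
The abstract does not say which bagging variant is used, so the following is only a minimal sketch of one common way to obtain balanced samples and to score the result with MCC. It assumes each bag keeps every binding residue and undersamples the non-binding residues to the same count, and that base classifiers are combined by averaging their scores. The function names (`balanced_bags`, `ensemble_vote`, `mcc`) and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math
import random


def balanced_bags(labels, n_bags, seed=0):
    """Build n_bags index lists, each holding all positive (binding) indices
    plus an equally sized random draw of negative (non-binding) indices.
    Assumption: undersampling the majority class; the paper's variant may differ."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or len(neg) < len(pos):
        raise ValueError("expected positives to be a non-empty minority class")
    bags = []
    for _ in range(n_bags):
        sampled_neg = rng.sample(neg, len(pos))  # draw without replacement
        bags.append(sorted(pos + sampled_neg))
    return bags


def ensemble_vote(scores_per_model, threshold=0.5):
    """Average per-residue scores over the base classifiers, then threshold.
    Assumption: simple score averaging stands in for the ensemble rule."""
    n_models = len(scores_per_model)
    n_residues = len(scores_per_model[0])
    averaged = [
        sum(model[i] for model in scores_per_model) / n_models
        for i in range(n_residues)
    ]
    return [1 if s >= threshold else 0 for s in averaged]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch each bag would train one base classifier (an LSTM in the paper's setting), and `ensemble_vote` would combine their per-residue scores before evaluation with `mcc`.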

Details

ISSN :
1557-9964
Volume :
17
Issue :
1
Database :
OpenAIRE
Journal :
IEEE/ACM transactions on computational biology and bioinformatics
Accession number :
edsair.doi.dedup.....3bf49db4f5969952da7c629ec28be02e