EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning
- Source :
- IEEE/ACM transactions on computational biology and bioinformatics. 17(1)
- Publication Year :
- 2018
- Abstract :
- Most previous work on DNA-binding residue prediction did not consider the relationships between residues. In this paper, we propose a novel approach for DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second is an ensemble-learning-based classifier introduced to tackle the data imbalance problem in binding residue prediction; it uses a variant of the bagging strategy to obtain balanced samples. Evaluations on PDNA-224 and DBP-123 show that classifiers using feature relationships outperform classifiers without them by at least 0.028 on MCC, 1.18 percent on ST, and 0.012 on AUC, which indicates the usefulness of feature relationships for DNA-binding residue prediction. Adding ensemble learning improves results by at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC compared with a single LSTM classifier. Comparisons with state-of-the-art predictors show that the proposed EL_LSTM outperforms them significantly. Further feature analysis validates the effectiveness of LSTM for the prediction of DNA-binding residues.
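The abstract does not specify which bagging variant is used, so the following is only a minimal sketch under one common assumption: each bag keeps every binding (minority) residue and undersamples an equal number of non-binding (majority) residues, one classifier is trained per bag, and the per-residue scores are averaged. The function names `balanced_bags`, `average_scores`, and `mcc` are hypothetical and not taken from the paper; `mcc` is the standard Matthews correlation coefficient computed from confusion-matrix counts.

```python
import math
import random


def balanced_bags(labels, n_bags, seed=0):
    """Return n_bags index lists, each holding all minority (label 1)
    samples plus an equally sized random draw of majority (label 0) samples.
    Assumption: undersampling is the balancing step; the paper's exact
    variant is not described in the abstract."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or len(neg) < len(pos):
        raise ValueError("expected a non-empty minority class labelled 1")
    bags = []
    for _ in range(n_bags):
        # Undersample the majority class without replacement.
        sampled_neg = rng.sample(neg, len(pos))
        bags.append(sorted(pos + sampled_neg))
    return bags


def average_scores(per_model_scores):
    """Average per-residue scores across ensemble members."""
    n_models = len(per_model_scores)
    return [sum(col) / n_models for col in zip(*per_model_scores)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

In such a setup, each index list from `balanced_bags` would select the training residues for one base classifier (an LSTM in the paper), and `average_scores` would combine their outputs before thresholding.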
- Subjects :
- Computer science
Feature vector
0206 medical engineering
Feature extraction
02 engineering and technology
Machine Learning
Protein sequencing
Genetics
Databases, Protein
Binding Sites
Artificial neural network
Applied Mathematics
Computational Biology
Pattern recognition
DNA
Ensemble learning
Support vector machine
DNA-Binding Proteins
Pairwise comparison
Artificial intelligence
Classifier (UML)
020602 bioinformatics
Algorithms
Biotechnology
Details
- ISSN :
- 15579964
- Volume :
- 17
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- IEEE/ACM transactions on computational biology and bioinformatics
- Accession number :
- edsair.doi.dedup.....3bf49db4f5969952da7c629ec28be02e