A novel method for predicting RNA-interacting residues in proteins using a combination of feature-based and sequence template-based methods
- Source :
- Biotechnology & Biotechnological Equipment, Vol 33, Iss 1, Pp 1138-1149 (2019)
- Publication Year :
- 2019
- Publisher :
- Taylor & Francis Group, 2019.
Abstract
- RNA-binding proteins (RBPs) play a significant role in many cellular processes and in the regulation of gene expression. Accurately identifying the RNA-interacting residues in protein sequences is therefore crucial for determining the structure of RBPs and inferring their function for new drug design. Protein sequence, as basic information, has been widely used in protein research in combination with machine learning techniques. Here, we propose a sequence-based method to predict the RNA-interacting residues in protein sequences. The method combines two predictors: a feature-based predictor and a sequence template-based predictor. The feature-based predictor applies a random forest (RF) classifier to protein sequence information. After the classification probability is obtained, an adjustment procedure is applied that takes into account the correlation between neighbouring RNA-interacting residues. The sequence template-based predictor selects the optimal template for the query sequence by multiple sequence alignment and maps the interacting residues of the template sequence onto the query sequence. Combining the two predictors greatly improves the coverage and prediction performance of our method: the MCC value increases from 0.467 and 0.352 to 0.499 on our validation set. To evaluate the proposed method, an independent testing set is used for comparison with two other hybrid methods. Our method performs better than both, with an overall accuracy of 0.817, an MCC value of 0.511 and an F-score of 0.605, which demonstrates that it can reliably predict the RNA-interacting residues in protein sequences. Moreover, the effectiveness of our newly proposed adjustment procedure in the feature-based predictor is examined and analyzed in detail.
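The abstract names the pipeline stages but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The neighbour-averaging rule in `smooth_probabilities`, the template-priority rule in `combine`, the `window` and `threshold` parameters, and all function names are hypothetical stand-ins for the paper's adjustment and combination steps. The accuracy, MCC and F-score in `evaluate` use the standard confusion-matrix definitions.

```python
from math import sqrt


def smooth_probabilities(probs, window=1):
    """Hypothetical adjustment: average each residue's probability
    with its neighbours inside +/- `window` positions."""
    n = len(probs)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def combine(feature_probs, template_labels, threshold=0.5):
    """Hypothetical combination rule: use the template-derived label
    where one exists (not None), else threshold the probability."""
    return [
        t if t is not None else int(p >= threshold)
        for p, t in zip(feature_probs, template_labels)
    ]


def evaluate(y_true, y_pred):
    """Standard per-residue accuracy, MCC and F-score for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    f_denom = 2 * tp + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "f_score": 2 * tp / f_denom if f_denom else 0.0,
    }


if __name__ == "__main__":
    # Toy per-residue probabilities and template labels (illustrative only).
    probs = smooth_probabilities([0.9, 0.2, 0.8, 0.1, 0.1])
    labels = combine(probs, [None, 1, None, None, 0])
    print(evaluate([1, 1, 1, 0, 0], labels))
```

In this sketch, template labels take priority because a template match transfers observed interacting residues directly. The paper may weight the two predictors differently.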
- Subjects :
- 0106 biological sciences
Regulation of gene expression
0303 health sciences
lcsh:Biotechnology
RNA
Computational biology
Biology
01 natural sciences
Ensemble learning
Random forest
03 medical and health sciences
lcsh:TP248.13-248.65
Feature based
Template based
protein
030304 developmental biology
010606 plant biology & botany
Biotechnology
Sequence (medicine)
rna-interacting residues
Details
- Language :
- English
- ISSN :
- 1314-3530 and 1310-2818
- Volume :
- 33
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- Biotechnology & Biotechnological Equipment
- Accession number :
- edsair.doi.dedup.....2bfc27279079256d6f775d6ce57dc0f1