DCSE:Double-Channel-Siamese-Ensemble model for protein protein interaction prediction.

Authors :
Chen W
Wang S
Song T
Li X
Han P
Gao C
Source :
BMC genomics [BMC Genomics] 2022 Aug 04; Vol. 23 (1), pp. 555. Date of Electronic Publication: 2022 Aug 04.
Publication Year :
2022

Abstract

Background: Protein-protein interaction (PPI) is very important for many biochemical processes. Accurate prediction of PPI can therefore help us better understand the role of proteins in these processes. Although many biological methods exist for predicting PPI, they are time-consuming and lack accuracy, so an efficient and accurate computational model for PPI prediction is needed.

Results: We present a novel sequence-based computational approach called DCSE (Double-Channel-Siamese-Ensemble) to predict potential PPI. In the encoding layer, we treat each amino acid as a word and map it into an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives using a Multilayer Convolutional Neural Network (MCN) and a Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). Finally, the output of the feature extraction layer is fed into the prediction layer, which outputs whether the input protein pair will interact with each other. MCN and MBC are siamese- and ensemble-based networks, which can effectively improve the performance of the model. To demonstrate our model's performance, we compare it with four machine-learning-based and three deep-learning-based models. The results show that our method outperforms the other models on all evaluation criteria. The Accuracy, Precision, F1, Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452 and 0.8609, respectively. Across the other seven models, the highest Accuracy, Precision, F1, Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250 and 0.8572, respectively. We also test our model on an imbalanced dataset and transfer it to another species; the results show that it performs well in both settings.

Conclusion: Compared with seven other models, our model achieves the best performance. The NLP-based encoding method works well for the PPI prediction task. MCN and MBC extract protein sequence features from local and global perspectives, and both feature extraction layers are based on siamese and ensemble network structures. The siamese-based structure keeps the features consistent, and the ensemble-based structure can effectively improve the accuracy of the model.

(© 2022. The Author(s).)
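
The encoding and siamese ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the embedding table is random rather than learned, the vector dimension and maximum length are arbitrary, unknown residues are mapped to the padding index, and `mean_pool` is a hypothetical stand-in for the MCN and MBC feature extractors. All function names are chosen here for illustration only.

```python
import random

# The 20 standard amino acids (one-letter codes). Index 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def make_embedding(dim, seed=0):
    """Build a lookup table with one dim-length vector per vocabulary index.

    Assumption: random values stand in for learned embedding weights.
    """
    rng = random.Random(seed)
    table = [[0.0] * dim]  # row 0: padding vector
    for _ in AMINO_ACIDS:
        table.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return table


def encode(sequence, table, max_len):
    """Treat each amino acid as a word: map a sequence to a max_len x dim matrix."""
    # Assumption: residues outside the 20 standard codes map to padding (0).
    ids = [VOCAB.get(aa, 0) for aa in sequence[:max_len]]
    ids += [0] * (max_len - len(ids))  # zero-pad short sequences
    return [table[i] for i in ids]


def mean_pool(matrix):
    """Hypothetical stand-in for a feature extractor: average over positions."""
    dim = len(matrix[0])
    return [sum(row[d] for row in matrix) / len(matrix) for d in range(dim)]


def siamese_features(seq_a, seq_b, table, max_len):
    """Siamese idea: both proteins pass through the same table and extractor."""
    feat_a = mean_pool(encode(seq_a, table, max_len))
    feat_b = mean_pool(encode(seq_b, table, max_len))
    return feat_a, feat_b


table = make_embedding(dim=8)
feat_a, feat_b = siamese_features("MKTAYIAK", "MKTAYIAK", table, max_len=12)
# Shared parameters mean identical inputs yield identical features.
print(feat_a == feat_b)
```

Because both branches share one set of parameters, a given protein is represented the same way whichever side of the pair it appears on, which is one reading of the abstract's remark that the siamese structure keeps the features consistent.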

Details

Language :
English
ISSN :
1471-2164
Volume :
23
Issue :
1
Database :
MEDLINE
Journal :
BMC genomics
Publication Type :
Academic Journal
Accession number :
35922751
Full Text :
https://doi.org/10.1186/s12864-022-08772-6