Predictive Recognition of DNA-binding Proteins Based on Pre-trained Language Model BERT.

Authors :
Ma, Yue
Pei, Yongzhen
Li, Changguo
Source :
Journal of Bioinformatics & Computational Biology; Dec 2023, Vol. 21 Issue 6, p1-18, 18p
Publication Year :
2023

Abstract

Identifying proteins is crucial for disease diagnosis and treatment. As the number of known proteins grows, large-scale batch prediction becomes essential. However, traditional biological experiments are time-consuming and expensive, which makes it difficult to accomplish this task efficiently. Deep learning algorithms based on big data analysis have shown potential in this respect. In recent years, language representation models, especially BERT, have made significant advances in natural language processing. In this paper, nine BERT models of different sizes are constructed by combining three protein segmentation methods with three encoder counts, in order to predict whether known proteins are DNA-binding proteins. Furthermore, based on the concept of protein motifs, multi-scale convolutional networks are fused into the models to extract local features of DNA-binding proteins. We find that, when each amino acid in the protein is treated as a word, model predictions improve as the number of encoders increases. Our proposed algorithm achieves 81.88% sensitivity and an MCC of 0.39 on the test set, and 62.41% accuracy on the independent test set PDB2272. These results suggest that the proposed method can serve as a tool to assist in the identification of DNA-binding proteins. [ABSTRACT FROM AUTHOR]
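
For readers unfamiliar with the terms used in the abstract, the following is a minimal Python sketch of what "segmenting a protein into words" and the reported metrics might look like. It is not the authors' implementation. The function names (`segment`, `sensitivity`, `mcc`), the `overlapping` option, and any word length other than one residue are illustrative assumptions; the abstract confirms only that one scheme treats each amino acid as a word. The sensitivity and Matthews correlation coefficient (MCC) formulas are the standard definitions from confusion-matrix counts.

```python
from math import sqrt


def segment(sequence, k=1, overlapping=True):
    """Split a protein sequence into k-residue 'words' (hypothetical helper).

    k=1 treats each amino acid as a word. Larger k and the overlapping
    option are assumptions, not schemes taken from the paper.
    """
    step = 1 if overlapping else k
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


def sensitivity(tp, fn):
    """Fraction of true DNA-binding proteins that are predicted as binding."""
    return tp / (tp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    toy_sequence = "MKVLA"  # made-up sequence, for illustration only
    print(segment(toy_sequence, k=1))
    print(segment(toy_sequence, k=3))
    print(sensitivity(tp=8, fn=2), mcc(tp=8, tn=7, fp=3, fn=2))
```

The resulting word lists would then be mapped to vocabulary indices before being fed to a BERT-style encoder; that step depends on the tokenizer and is omitted here.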

Details

Language :
English
ISSN :
0219-7200
Volume :
21
Issue :
6
Database :
Complementary Index
Journal :
Journal of Bioinformatics & Computational Biology
Publication Type :
Academic Journal
Accession number :
175010063
Full Text :
https://doi.org/10.1142/S0219720023500282