Identification of efflux proteins based on contextual representations with deep bidirectional transformer encoders.
- Source :
- Analytical biochemistry [Anal Biochem] 2021 Nov 15; Vol. 633, pp. 114416. Date of Electronic Publication: 2021 Oct 14.
- Publication Year :
- 2021
Abstract
- Efflux proteins are the transport proteins expressed in the plasma membrane, which are involved in the movement of unwanted toxic substances through specific efflux pumps. Several studies based on computational approaches have been proposed to predict transport proteins and thereby to understand the mechanism of the movement of ions across cell membranes. However, few methods have been developed to identify efflux proteins. This paper presents an approach based on the contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) with the Support Vector Machine (SVM) classifier. BERT is the most effective pre-trained language model that performs exceptionally well on several Natural Language Processing (NLP) tasks. Therefore, the contextualized representations from BERT were used to capture multiple interpretations of identical amino acids in the sequence. A dataset of efflux proteins with annotations was first established. The feature vectors were extracted by passing protein data through the hidden layers of the pre-trained model. The proposed method was trained on the complete training datasets to identify efflux proteins and achieved accuracies of 94.15% and 87.13% in the independent tests on the membrane and transport datasets, respectively. This study opens a research avenue for the implementation of contextualized word embeddings in Bioinformatics and Computational Biology. (Copyright © 2021 Elsevier Inc. All rights reserved.)
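- Illustrative sketch (not part of the published record): the abstract describes extracting feature vectors from the hidden layers of a pre-trained model so that identical amino acids receive different representations depending on context. The Python below is a minimal sketch of that idea only. The names `contextual_vectors`, `mean_pool`, and `featurize` are hypothetical, the windowed one-hot average is a toy stand-in for BERT and not the model used in the paper, and mean pooling is an assumed way of obtaining a fixed-length vector; the abstract does not say how per-residue vectors were combined.

```python
# Toy stand-in for a pre-trained contextual encoder (assumption: NOT BERT,
# and NOT the authors' implementation). It only illustrates that the same
# residue can get a different vector depending on its neighbours.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Context-free encoding: one slot per standard amino acid.

    Raises KeyError for non-standard residue codes such as 'X'.
    """
    vec = [0.0] * len(AMINO_ACIDS)
    vec[INDEX[residue]] = 1.0
    return vec


def contextual_vectors(sequence, window=1):
    """Return one vector per residue, averaged over a window around it.

    Because neighbours are mixed in, two occurrences of the same residue
    differ whenever their surrounding residues differ.
    """
    vectors = []
    for i in range(len(sequence)):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        neighbours = [one_hot(r) for r in sequence[lo:hi]]
        vectors.append([sum(col) / len(neighbours) for col in zip(*neighbours)])
    return vectors


def mean_pool(token_vectors):
    """Average per-residue vectors into one fixed-length feature vector."""
    return [sum(col) / len(token_vectors) for col in zip(*token_vectors)]


def featurize(sequence):
    """Protein sequence -> fixed-length feature vector for a classifier."""
    return mean_pool(contextual_vectors(sequence))


if __name__ == "__main__":
    per_residue = contextual_vectors("GAGCAC")
    # 'A' at positions 1 and 4 sits in different contexts (GAG vs CAC).
    print(per_residue[1] != per_residue[4])
    print(len(featurize("MKT")), len(featurize("MKTAYIAKQR")))
```

- In a full pipeline, the fixed-length vectors would be passed, together with efflux / non-efflux labels, to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`; the kernel and hyperparameters are not given in the abstract.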
Details
- Language :
- English
- ISSN :
- 1096-0309
- Volume :
- 633
- Database :
- MEDLINE
- Journal :
- Analytical biochemistry
- Publication Type :
- Academic Journal
- Accession number :
- 34656612
- Full Text :
- https://doi.org/10.1016/j.ab.2021.114416