Identification of efflux proteins based on contextual representations with deep bidirectional transformer encoders.

Authors :
Taju SW
Shah SMA
Ou YY
Source :
Analytical biochemistry [Anal Biochem] 2021 Nov 15; Vol. 633, pp. 114416. Date of Electronic Publication: 2021 Oct 14.
Publication Year :
2021

Abstract

Efflux proteins are transport proteins expressed in the plasma membrane that move unwanted toxic substances through specific efflux pumps. Several computational approaches have been proposed to predict transport proteins and thereby to understand how ions move across cell membranes. However, few methods have been developed to identify efflux proteins. This paper presents an approach that combines contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) with a Support Vector Machine (SVM) classifier. BERT is a pre-trained language model that performs exceptionally well on several Natural Language Processing (NLP) tasks. The contextualized representations from BERT were therefore used to capture the different interpretations that identical amino acids can take within a sequence. A dataset of efflux proteins with annotations was first established. The feature vectors were extracted by passing protein sequences through the hidden layers of the pre-trained model. The proposed method was trained on the complete training datasets to identify efflux proteins and achieved accuracies of 94.15% and 87.13% in the independent tests on the membrane and transport datasets, respectively. This study opens a research avenue for the use of contextualized word embeddings in Bioinformatics and Computational Biology.
(Copyright © 2021 Elsevier Inc. All rights reserved.)
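The idea of "multiple interpretations of identical amino acids" can be illustrated with a toy sketch. This is not BERT and not the authors' pipeline: the two-dimensional embedding table `STATIC`, the single unparameterised self-attention pass, and the mean pooling into one fixed-length feature vector are all assumptions made for illustration. Learned weights, positional encodings, and the SVM step are omitted.

```python
import math

# Hypothetical 2-dimensional static embeddings for three amino acids (toy values).
STATIC = {
    "A": [1.0, 0.0],
    "L": [0.0, 1.0],
    "K": [1.0, 1.0],
}

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """One self-attention pass with queries = keys = values (no learned weights)."""
    dim = len(vectors[0])
    out = []
    for q in vectors:
        # Scaled dot-product score of this residue against every residue.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        weights = softmax(scores)
        # Each output vector is a weighted mix of all residue vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)])
    return out

def contextual_embeddings(sequence):
    """Look up static embeddings, then mix them according to sequence context."""
    return self_attention([STATIC[res] for res in sequence])

def mean_pool(vectors):
    """Average per-residue vectors into one fixed-length feature vector (an assumption)."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

ctx_ala = contextual_embeddings("ALA")
ctx_klk = contextual_embeddings("KLK")

# The same residue "L" starts from the same static vector in both sequences,
# but its contextual vector depends on its neighbours.
leu_in_ala = ctx_ala[1]
leu_in_klk = ctx_klk[1]

# A fixed-length vector like this is the kind of input a classifier could consume.
features = mean_pool(ctx_ala)
```

In this sketch, `leu_in_ala` and `leu_in_klk` differ even though both come from the same entry in `STATIC`, which is the property a static lookup table cannot provide.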

Details

Language :
English
ISSN :
1096-0309
Volume :
633
Database :
MEDLINE
Journal :
Analytical biochemistry
Publication Type :
Academic Journal
Accession number :
34656612
Full Text :
https://doi.org/10.1016/j.ab.2021.114416