Back to Search
Start Over
HIV- Bidirectional Encoder Representations From Transformers: A Set of Pretrained Transformers for Accelerating HIV Deep Learning Tasks
- Source :
- Frontiers in Virology. 2
- Publication Year :
- 2022
- Publisher :
- Frontiers Media SA, 2022.
-
Abstract
- The human immunodeficiency virus type 1 (HIV-1) is a global health threat that is characterized by extensive genetic diversity both within and between patients, rapid mutation to evade immune controls and antiretroviral therapies, and latent cellular and tissue reservoirs that stymie cure efforts. Viral genomic sequencing has proven effective at surveilling these phenotypes. However, rapid, accurate, and explainable prediction techniques lag our sequencing ability. Modern natural language processing libraries, like the Hugging Face transformers library, have both advanced the technical field and brought much-needed standardization of prediction tasks. Herein, the application of this toolset to an array of classification tasks useful to HIV-1 biology was explored: protease inhibitor resistance, coreceptor utilization, and body-site identification. HIV-Bidirectional Encoder Representations from Transformers (BERT), a protein-based transformer model fine-tuned on HIV-1 genomic sequences, was able to achieve accuracies of 88%, 92%, and 89% on the respective tasks, making it competitive with leading models capable of only one of these tasks. This model was also evaluated using a data augmentation strategy when mutations of known function were introduced. The HIV-BERT model produced results that agreed in directionality 10- to 1000-fold better than traditional machine learning models, indicating an improved ability to generalize biological knowledge to unseen sequences. The HIV-BERT model, trained task-specific models, and the datasets used to construct them have been released to the Hugging Face repository to accelerate research in this field.
Details
- ISSN :
- 2673818X
- Volume :
- 2
- Database :
- OpenAIRE
- Journal :
- Frontiers in Virology
- Accession number :
- edsair.doi...........e2200a0af8c1fdd278c669d35ddbed92
- Full Text :
- https://doi.org/10.3389/fviro.2022.880618