MicroBERT: Distilling MoE-Based Knowledge from BERT into a Lighter Model.
- Source :
- Applied Sciences (2076-3417); Jul2024, Vol. 14 Issue 14, p6171, 14p
- Publication Year :
- 2024
Abstract
- Large language models (LLMs) have greatly improved performance on natural language processing tasks. However, their large parameter counts make them computationally expensive and difficult to run on resource-constrained devices. Techniques such as distillation and quantization have been proposed to reduce this cost while maintaining accuracy. Unfortunately, current methods fail to integrate model pruning with downstream tasks and overlook sentence-level semantic modeling, which reduces the efficiency of distillation. To alleviate these limitations, we propose MicroBERT, a novel lightweight model distilled from BERT. The method transfers the knowledge contained in the "teacher" BERT model to a "student" BERT model. A sentence-level feature alignment loss (FAL) distillation mechanism, guided by Mixture-of-Experts (MoE), captures comprehensive contextual semantic knowledge from the "teacher" model to enhance the "student" model's performance while reducing its parameters. To make the outputs of the "teacher" and "student" models comparable, we borrow the idea of a generative adversarial network (GAN) and train a discriminator. Experimental results on four datasets show that every step of our distillation mechanism is effective, and that MicroBERT (101.14%) outperforms TinyBERT (99%) by 2.24% in terms of average distillation reductions in various tasks on the GLUE dataset. [ABSTRACT FROM AUTHOR]
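The abstract names a sentence-level feature alignment loss but does not give its formula. As a rough illustration only, the sketch below shows one common way such an alignment term can be written: mean-pool token vectors into a sentence vector, project the student vector into the teacher's dimension, and take the mean squared error. The pooling choice, the linear projection, the MSE form, and all function names are assumptions made for illustration; the MoE guidance and the GAN discriminator described in the abstract are not modeled here.

```python
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average token vectors into one sentence-level vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[d] for tok in token_vectors) / n for d in range(dim)]


def project(vec: Vector, weight: List[Vector]) -> Vector:
    """Linear map; `weight` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def sentence_alignment_loss(
    teacher_tokens: List[Vector],
    student_tokens: List[Vector],
    weight: List[Vector],
) -> float:
    """MSE between the teacher sentence vector and the projected student one."""
    teacher_vec = mean_pool(teacher_tokens)
    student_vec = project(mean_pool(student_tokens), weight)
    squared = [(t - s) ** 2 for t, s in zip(teacher_vec, student_vec)]
    return sum(squared) / len(squared)


# Toy data: teacher hidden size 3, student hidden size 2, two tokens each.
teacher_tokens = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
student_tokens = [[1.0, 1.0], [3.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hypothetical 2 -> 3 projection

loss = sentence_alignment_loss(teacher_tokens, student_tokens, weight)
```

In a real training setup the projection weights would be learned jointly with the student, and this term would be one component of a larger objective.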
- Subjects :
- LANGUAGE models
- GENERATIVE adversarial networks
- KNOWLEDGE transfer
- DISTILLATION
- GLUE
- Language :
- English
- ISSN :
- 20763417
- Volume :
- 14
- Issue :
- 14
- Database :
- Complementary Index
- Journal :
- Applied Sciences (2076-3417)
- Publication Type :
- Academic Journal
- Accession number :
- 178690739
- Full Text :
- https://doi.org/10.3390/app14146171