1. Multi-teacher distillation BERT model in NLU tasks
- Authors
Jialai SHI and Weibin GUO
- Subjects
deep pre-training model, BERT, multi-teacher distillation, natural language understanding, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Knowledge distillation is a model compression scheme commonly used to address the large scale and slow inference of BERT and other deep pre-trained models. "Multi-teacher distillation" can further improve the performance of the student model, but the traditional "one-to-one" mapping strategy, which forcibly assigns each student layer to a single intermediate layer of the teacher model, discards most of the intermediate features. A "one-to-many" mapping method is proposed to solve the problem that intermediate layers cannot be aligned during knowledge distillation, helping the student model acquire the syntactic, coreference, and other knowledge encoded in the teacher models' intermediate layers. Experiments on several GLUE datasets show that the student model retains 93.9% of the teachers' average inference accuracy while using only 41.5% of their average parameter size.
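To make the "one-to-many" idea concrete, below is a minimal sketch of an intermediate-layer distillation loss in which each student layer is matched against a weighted combination of all teacher layers rather than a single assigned one. The function name, the softmax weighting scheme, and the toy shapes are illustrative assumptions, not the paper's exact formulation (which may also require projections when teacher and student hidden sizes differ).

```python
# Sketch: "one-to-many" intermediate-layer distillation loss.
# Assumes teacher and student share the same hidden size for simplicity.
import torch
import torch.nn.functional as F


def one_to_many_hidden_loss(student_hiddens, teacher_hiddens, mapping_weights):
    """MSE between each student hidden state and a weighted mix of teacher layers.

    student_hiddens: list of S tensors, each (batch, seq_len, hidden)
    teacher_hiddens: list of T tensors, each (batch, seq_len, hidden)
    mapping_weights: (S, T) tensor; row s holds the logits that distribute
                     all T teacher layers over student layer s.
    """
    teacher_stack = torch.stack(teacher_hiddens, dim=0)      # (T, B, L, H)
    loss = 0.0
    for s, h_s in enumerate(student_hiddens):
        w = F.softmax(mapping_weights[s], dim=-1)            # (T,)
        # Every teacher layer contributes, so no intermediate feature is discarded.
        mixed_teacher = (w.view(-1, 1, 1, 1) * teacher_stack).sum(dim=0)
        loss = loss + F.mse_loss(h_s, mixed_teacher)
    return loss / len(student_hiddens)


if __name__ == "__main__":
    # Toy example: 12 teacher layers (e.g. pooled from multiple teachers), 4-layer student.
    B, L, H, S, T = 2, 8, 16, 4, 12
    student = [torch.randn(B, L, H) for _ in range(S)]
    teacher = [torch.randn(B, L, H) for _ in range(T)]
    weights = torch.randn(S, T)                              # learnable in practice
    print(one_to_many_hidden_loss(student, teacher, weights).item())
```

In a full training setup this term would typically be combined with the usual soft-label (logit) distillation loss and the task loss; the weighting matrix can be learned so that each student layer discovers which teacher layers it should align with.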
- Published
2024