
Knowledge distillation algorithm based on spatial attention maps (基于空间注意力图的知识蒸馏算法).

Authors :
王礼乐
刘渊
Source :
Application Research of Computers / Jisuanji Yingyong Yanjiu. Jun 2024, Vol. 41, Issue 6, p1693-1698. 6p.
Publication Year :
2024

Abstract

Knowledge distillation algorithms are highly effective for compressing deep neural networks. Current feature-based knowledge distillation algorithms either focus on improving a single component while ignoring other beneficial ones, or fail to provide effective guidance on the parts the small model should attend to, so the distillation effect is insufficient. To make full use of the large model's beneficial information and process it to improve the small model's knowledge conversion rate, this paper proposed a new distillation algorithm. First, it used a conditional probability distribution to fit the spatial feature distribution of the large model's intermediate layer; it then extracted the spatial attention maps, which tend to become similar after fitting, together with other beneficial information. Finally, it used a small convolutional layer to narrow the gap between the models and transmit the transformed information to the small model, achieving distillation. Experimental results show that the algorithm applies to multiple teacher-student combinations and generalizes across multiple datasets; compared with current state-of-the-art distillation algorithms, accuracy improves by about 1.19% and training time is shortened by 0.16 h. The method has significant engineering value and broad application prospects for optimizing large networks and deploying deep learning on low-resource devices. [ABSTRACT FROM AUTHOR]
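To make the pipeline concrete, below is a minimal PyTorch sketch of attention-map distillation in the spirit of the abstract. The paper's conditional-probability fitting step is not specified in the record, so this sketch substitutes the common construction of Zagoruyko and Komodakis (aggregating squared activations over channels) for the attention map, and a 1x1 convolution stands in for the "small convolutional layer" that narrows the gap between models. All names below are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a (N, C, H, W) feature map to a normalized (N, H*W) attention map."""
    attn = feat.pow(2).mean(dim=1)        # aggregate over channels -> (N, H, W)
    attn = attn.flatten(1)                # flatten spatial dims -> (N, H*W)
    return F.normalize(attn, p=2, dim=1)  # L2-normalize per sample


class DistillAdapter(nn.Module):
    """Small conv layer mapping student features toward the teacher's channel width."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)


def attention_distill_loss(student_feat, teacher_feat, adapter):
    """MSE between the student's and teacher's spatial attention maps."""
    s = adapter(student_feat)
    # Match spatial size if the two stages disagree (an assumption; the
    # record does not say how resolution mismatches are handled).
    if s.shape[-2:] != teacher_feat.shape[-2:]:
        s = F.interpolate(s, size=teacher_feat.shape[-2:],
                          mode="bilinear", align_corners=False)
    return F.mse_loss(spatial_attention_map(s),
                      spatial_attention_map(teacher_feat))
```

In a typical training loop this term would be added to the task loss, e.g. total = ce_loss + beta * attention_distill_loss(s_feat, t_feat, adapter), with the weight beta tuned on a validation set; the paper's actual loss formulation may differ.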

Details

Language :
Chinese
ISSN :
1001-3695
Volume :
41
Issue :
6
Database :
Academic Search Index
Journal :
Application Research of Computers / Jisuanji Yingyong Yanjiu
Publication Type :
Academic Journal
Accession number :
177823938
Full Text :
https://doi.org/10.19734/j.issn.1001-3695.2023.10.0496