Class-aware Information for Logit-based Knowledge Distillation
- Publication Year : 2022
- Publisher : arXiv, 2022.
Abstract
- Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due to the cumbersome computation and storage of extra feature transformations, the training overhead of feature-based methods is much higher than that of logit-based distillation. In this work, we revisit logit-based knowledge distillation and observe that existing logit-based methods treat the prediction logits only at the instance level, overlooking much other useful semantic information. To address this issue, we propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level. CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance. We further introduce a novel loss, the Class Correlation Loss, to force the student to learn the inherent class-level correlations of the teacher. Empirical comparisons demonstrate the superiority of the proposed method over several prevailing logit-based and feature-based methods; CLKD achieves compelling results on various visual classification tasks and outperforms state-of-the-art baselines.
- Comment: 12 pages, 4 figures, 12 tables
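The abstract's core idea, distilling logits at both the instance level and the class level plus a correlation-matching term, can be illustrated with a minimal sketch. This is a hypothetical NumPy reconstruction based only on the abstract: the temperature, the transpose-based class-level view, and the Gram-matrix form of the Class Correlation Loss are assumptions, and the paper's exact formulation may differ.

```python
import numpy as np

def softmax(x, axis=1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def instance_level_kd(student, teacher, T=4.0):
    # Standard logit distillation: per-instance KL(teacher || student)
    # over temperature-softened class probabilities.
    p_t = softmax(teacher / T, axis=1)
    p_s = softmax(student / T, axis=1)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=1)
    return kl.mean() * T * T

def class_level_kd(student, teacher, T=4.0):
    # Assumed class-level view: transpose the (batch x classes) logit
    # matrix so the softmax runs over instances within each class.
    return instance_level_kd(student.T, teacher.T, T)

def class_correlation_loss(student, teacher):
    # Hypothetical correlation term: align the class-class Gram matrices
    # of the two models' predictions via cosine distance.
    def corr(logits):
        p = softmax(logits, axis=1)      # batch x classes
        g = (p.T @ p).ravel()            # flattened class-class correlation
        return g / np.linalg.norm(g)
    return 1.0 - float(corr(student) @ corr(teacher))

def clkd_loss(student, teacher, alpha=1.0, beta=1.0, gamma=1.0):
    # Combined objective; the weighting scheme here is illustrative.
    return (alpha * instance_level_kd(student, teacher)
            + beta * class_level_kd(student, teacher)
            + gamma * class_correlation_loss(student, teacher))
```

When the student's logits match the teacher's exactly, each term vanishes; the class-level term differs from the instance-level one only in which axis the softmax normalizes, which is what lets it expose cross-instance, per-class structure.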
Details
- Database : OpenAIRE
- Accession number : edsair.doi.dedup.....8615ed1f5b997a5f615cb4bdb747bc90
- Full Text : https://doi.org/10.48550/arxiv.2211.14773