Cross-Architecture Knowledge Distillation.
- Source :
- International Journal of Computer Vision. Aug2024, Vol. 132 Issue 8, p2798-2824. 27p.
- Publication Year :
- 2024
- Abstract :
- The Transformer network architecture has gained attention due to its ability to learn global relations and its superior performance. To boost performance, it is natural to distill complementary knowledge from a Transformer network to a convolutional neural network (CNN). However, most existing knowledge distillation methods only consider homologous-architecture distillation, which may not be suitable for cross-architecture scenarios, such as from Transformer to CNN. To address this problem, we analyze the globality and transferability of models, which reflect the ability to capture global knowledge and the ability to transfer knowledge from teacher to student, respectively. Inspired by our observations, a novel cross-architecture knowledge distillation method is proposed, which supports bi-directional distillation, both from Transformer to CNN and from CNN to Transformer. Specifically, rather than directly mimicking the output and intermediate features of the teacher, a partial cross-attention projector (PCA/iPCA) and a group-wise linear projector (GL/iGL) are introduced to align the student features with the teacher's in two projected feature spaces. To better match the teacher's knowledge with the student's, an adaptive distillation router (ADR) is presented to decide which teacher layer's knowledge should be distilled to guide which student layer. A multi-view robust training scheme is further presented to improve the robustness of the distillation framework. Extensive experiments show that the proposed method outperforms 17 state-of-the-art methods on both small-scale and large-scale datasets. [ABSTRACT FROM AUTHOR]
- Subjects :
- *CONVOLUTIONAL neural networks
*DISTILLATION
*TRANSFORMER models
Details
- Language :
- English
- ISSN :
- 09205691
- Volume :
- 132
- Issue :
- 8
- Database :
- Academic Search Index
- Journal :
- International Journal of Computer Vision
- Publication Type :
- Academic Journal
- Accession number :
- 178402110
- Full Text :
- https://doi.org/10.1007/s11263-024-02002-0
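
The abstract describes aligning student features with teacher features in a projected space instead of matching them directly. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' GL/iGL projector. It assumes a group-wise linear projection with one weight matrix per channel group and a mean-squared-error alignment loss; the function names, shapes, and the choice of loss are all hypothetical.

```python
import numpy as np


def groupwise_linear_project(student_feat, weights):
    """Project student features one channel group at a time (illustrative only).

    student_feat: array of shape (num_tokens, channels)
    weights:      array of shape (groups, in_per_group, out_per_group),
                  one linear map per channel group (an assumed layout)
    """
    groups, in_per_group, out_per_group = weights.shape
    num_tokens, channels = student_feat.shape
    if channels != groups * in_per_group:
        raise ValueError("channels must equal groups * in_per_group")
    # Split channels into groups, apply each group's own linear map, re-join.
    grouped = student_feat.reshape(num_tokens, groups, in_per_group)
    projected = np.einsum("ngi,gio->ngo", grouped, weights)
    return projected.reshape(num_tokens, groups * out_per_group)


def feature_alignment_loss(student_feat, teacher_feat, weights):
    """Mean-squared error between projected student features and teacher features.

    MSE is an assumption made for this sketch; the record does not name the loss.
    """
    projected = groupwise_linear_project(student_feat, weights)
    return float(np.mean((projected - teacher_feat) ** 2))
```

In a training loop, a loss of this kind would typically be added to the student's task loss, with the projection weights learned jointly with the student.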