MaskedKD: Efficient Distillation of Vision Transformers with Masked Images
- Publication Year :
- 2023
Abstract
- Knowledge distillation is an effective method for training lightweight models, but it adds significant computational overhead to training, as the method requires obtaining the teacher's supervision on each training sample. This additional cost -- called the distillation cost -- is most pronounced when we employ large-scale teacher models such as vision transformers (ViTs). We present MaskedKD, a simple yet effective strategy that can significantly reduce the cost of distilling ViTs without sacrificing the prediction accuracy of the student model. Specifically, MaskedKD reduces the cost of running the teacher at inference by masking a fraction of the image patch tokens fed to the teacher, thereby skipping the computation required to process those patches. The mask locations are selected so as not to remove the core image features that the student model uses for its prediction. This selection mechanism is based on the student model's attention scores, which are already computed during the student forward pass, and thus incurs almost no additional computation. Without sacrificing the final student accuracy, MaskedKD dramatically reduces the amount of computation required for distilling ViTs. We demonstrate that MaskedKD can cut the distillation cost by $50\%$ without any drop in student performance, leading to an approximately $28\%$ reduction in overall training FLOPs.
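The following is a minimal sketch of the patch-selection idea described in the abstract, assuming a PyTorch-style setup. The choice of saliency score (last-layer [CLS]-to-patch attention averaged over heads), the function names, and the `keep_ratio` parameter are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def select_patches(student_attn, keep_ratio=0.5):
    """Pick the patch indices the student attends to most.

    student_attn: (B, num_patches) per-patch saliency taken from the student,
        e.g. last-layer [CLS]-to-patch attention averaged over heads
        (an assumed choice of score; the paper's exact score may differ).
    Returns indices of shape (B, num_keep).
    """
    num_keep = max(1, int(student_attn.shape[1] * keep_ratio))
    return student_attn.topk(num_keep, dim=1).indices

def mask_patch_tokens(patch_tokens, keep_idx):
    """Keep only the selected patch tokens before the teacher forward pass,
    so the teacher's cost scales with the reduced token count.

    patch_tokens: (B, num_patches, dim) patch embeddings fed to the teacher.
    """
    b, _, d = patch_tokens.shape
    idx = keep_idx.unsqueeze(-1).expand(b, keep_idx.shape[1], d)
    return patch_tokens.gather(1, idx)

# Toy usage with random tensors standing in for real student/teacher features.
B, N, D = 2, 196, 768
student_attn = torch.rand(B, N)       # saliency reused from the student's own forward pass
patch_tokens = torch.randn(B, N, D)   # teacher patch embeddings for the same images
keep_idx = select_patches(student_attn, keep_ratio=0.5)
teacher_in = mask_patch_tokens(patch_tokens, keep_idx)   # (B, 98, D): half the tokens

# The distillation loss itself is unchanged; plain soft-label KD is shown as an example,
# where teacher_logits would come from running the teacher on the masked input.
student_logits = torch.randn(B, 1000)
teacher_logits = torch.randn(B, 1000)
T = 1.0
kd_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
```

In this sketch the only extra work on top of standard distillation is a top-k over scores the student already produces, which is why the selection step adds essentially no compute, while the teacher processes roughly half as many tokens.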
Details
- Database :
- OAIster
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1381603551
- Document Type :
- Electronic Resource