
An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation

Authors:
Sultan, Md Arafat
Trivedi, Aashka
Awasthy, Parul
Sil, Avirup
Publication Year:
2024

Abstract

We present a large-scale empirical study of how choices of configuration parameters affect performance in knowledge distillation (KD). An example of such a KD parameter is the measure of distance between the predictions of the teacher and the student, common choices for which include the mean squared error (MSE) and the KL-divergence. Although scattered efforts have been made to understand the differences between such options, the KD literature still lacks a systematic study on their general effect on student performance. We take an empirical approach to this question in this paper, seeking to find out the extent to which such choices influence student performance across 13 datasets from 4 NLP tasks and 3 student sizes. We quantify the cost of making sub-optimal choices and identify a single configuration that performs well across the board.
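The abstract contrasts two common teacher-student distance measures in KD: mean squared error over logits and the KL-divergence over softened output distributions. A minimal sketch of both losses follows; the function names, the choice of applying MSE to raw logits, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mse_loss(teacher_logits, student_logits):
    """Mean squared error between teacher and student logits
    (one common convention; some setups apply MSE to probabilities instead)."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n

def kl_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.
    The temperature of 2.0 is an arbitrary illustrative choice."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Both losses vanish when the student exactly matches the teacher, but they weight disagreements differently: MSE penalizes large logit gaps uniformly, while KL-divergence is driven by the teacher's probability mass, which is one reason the choice can affect student performance.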

Details

Database:
arXiv
Publication Type:
Report
Accession Number:
edsarx.2401.06356
Document Type:
Working Paper