
Dual Knowledge Distillation for neural machine translation.

Authors :
Wan, Yuxian
Zhang, Wenlin
Li, Zhen
Zhang, Hao
Li, Yanxia
Source :
Computer Speech & Language. Mar 2024, Vol. 84.
Publication Year :
2024

Abstract

Existing knowledge distillation methods use large amounts of bilingual data and focus on mining the corresponding knowledge distribution between the source and target languages. However, for some languages, bilingual data is not abundant. In this paper, to make better use of both monolingual and limited bilingual data, we propose a new knowledge distillation method called Dual Knowledge Distillation (DKD). For monolingual data, we use a self-distillation strategy that combines self-training and knowledge distillation for the encoder to extract more consistent monolingual representations. For bilingual data, on top of the k-Nearest-Neighbor Knowledge Distillation (kNN-KD) method, a similar self-distillation strategy is adopted as a consistency regularization method to force the decoder to produce consistent output. Experiments on standard datasets, multi-domain translation datasets, and low-resource datasets show that DKD achieves consistent improvements over state-of-the-art baselines, including kNN-KD.
• Monolingual data can better help low-resource machine translation.
• Dual Knowledge Distillation makes better use of bilingual and monolingual data.
• Self-distillation can extract more consistent monolingual representations.
• Consistency regularization can force the decoder to generate consistent output. [ABSTRACT FROM AUTHOR]
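The abstract describes three training signals: an encoder self-distillation term on monolingual data, a kNN-KD term on bilingual data, and a decoder consistency-regularization term. The PyTorch-style sketch below illustrates how such terms could be combined into one objective; the function names, the loss weights, and the symmetric-KL form of the consistency term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def encoder_self_distillation_loss(rep_a, rep_b):
    # Consistency between two stochastic encoder passes (e.g. different
    # dropout masks) over the same monolingual batch.
    return F.mse_loss(rep_a, rep_b.detach())

def knn_kd_loss(student_logits, knn_probs, temperature=1.0):
    # KL divergence from a kNN-retrieved target distribution to the
    # student's output distribution (word-level KD form; illustrative).
    log_p = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p, knn_probs, reduction="batchmean") * temperature ** 2

def decoder_consistency_loss(logits_a, logits_b):
    # Symmetric KL between two decoder passes on the same bilingual batch.
    p = F.log_softmax(logits_a, dim=-1)
    q = F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(p, q.exp(), reduction="batchmean")
                  + F.kl_div(q, p.exp(), reduction="batchmean"))

def dual_kd_loss(ce_loss, enc_rep_a, enc_rep_b,
                 dec_logits_a, dec_logits_b, knn_probs,
                 alpha=0.5, beta=0.5, gamma=0.5):
    # alpha, beta, gamma are hypothetical weights, not values from the paper.
    return (ce_loss
            + alpha * encoder_self_distillation_loss(enc_rep_a, enc_rep_b)
            + beta * knn_kd_loss(dec_logits_a, knn_probs)
            + gamma * decoder_consistency_loss(dec_logits_a, dec_logits_b))
```

In this reading, the monolingual term only touches the encoder representations, while the bilingual terms act on the decoder's output distributions; how the original work schedules or weights these components is detailed in the full paper.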

Details

Language :
English
ISSN :
08852308
Volume :
84
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
173969743
Full Text :
https://doi.org/10.1016/j.csl.2023.101583