A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks.

Authors :
Zhou, Yue
Yuan, Yuxuan
Shi, Xiaodong
Source :
Neural Computing & Applications. May2024, Vol. 36 Issue 15, p8641-8656. 16p.
Publication Year :
2024

Abstract

End-to-end speech translation (ST) has attracted substantial attention due to its reduced error accumulation and lower latency. Based on triplet ST data ⟨speech-transcription-translation⟩, multitask learning (MTL) that uses the machine translation ⟨transcription-translation⟩ or automatic speech recognition ⟨speech-transcription⟩ task to assist in training the ST model is widely employed. However, current MTL methods often suffer from subnet role mismatch or semantic inconsistency, or focus only on transferring knowledge from the automatic speech recognition (ASR) or machine translation (MT) task, leading to insufficient transfer of cross-task knowledge. To solve these problems, we propose the multitask co-training network (MCTN), which jointly models the ST, MT, and ASR tasks. Specifically, the ASR task enables the acoustic encoder to better capture local information in speech frames, and the MT task enhances the translation capability of the model. MCTN benefits from three key aspects: a well-designed multitask framework that fully exploits the associations between tasks, a model decoupling and parameter sharing method that keeps subnet roles consistent, and a co-training strategy that exploits the task information in triplet ST data. Our experiments show that MCTN achieves state-of-the-art results when using only the MuST-C dataset, and significantly outperforms strong end-to-end ST baselines and cascaded systems when external data are available. [ABSTRACT FROM AUTHOR]
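To make the co-training idea from the abstract concrete, below is a minimal, hypothetical PyTorch sketch of jointly training ST, ASR, and MT objectives on a triplet batch with shared decoder parameters. Everything here is an illustrative assumption: the module names, the single-layer LSTM encoders, the linear decoder, the loss weights, and the equal-length toy targets are not the authors' actual MCTN architecture, which the abstract does not specify.

```python
# Hypothetical sketch of multitask co-training on triplet ST data
# <speech, transcription, translation>. NOT the paper's MCTN; all
# architecture choices below are simplifying assumptions.
import torch
import torch.nn as nn

class MultitaskSTModel(nn.Module):
    def __init__(self, d_model=256, vocab_size=1000, n_mels=80):
        super().__init__()
        # Acoustic encoder consumes speech features (e.g., log-Mel frames).
        self.acoustic_encoder = nn.LSTM(n_mels, d_model, batch_first=True)
        # Text encoder consumes transcription tokens (for the MT subtask).
        self.embed = nn.Embedding(vocab_size, d_model)
        self.text_encoder = nn.LSTM(d_model, d_model, batch_first=True)
        # One shared output projection keeps subnet roles consistent:
        # the same parameters decode for the ASR, ST, and MT subtasks.
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, speech, transcript):
        acoustic_states, _ = self.acoustic_encoder(speech)
        text_states, _ = self.text_encoder(self.embed(transcript))
        return {
            "asr": self.decoder(acoustic_states),  # speech -> transcription
            "st":  self.decoder(acoustic_states),  # speech -> translation
            "mt":  self.decoder(text_states),      # transcription -> translation
        }

def cotraining_step(model, batch, optimizer, weights=(1.0, 0.5, 0.5)):
    """One co-training step: weighted sum of ST, ASR, and MT losses."""
    ce = nn.CrossEntropyLoss()
    logits = model(batch["speech"], batch["transcript"])
    # CrossEntropyLoss expects (batch, vocab, time), hence the transpose.
    loss = (weights[0] * ce(logits["st"].transpose(1, 2), batch["translation"])
            + weights[1] * ce(logits["asr"].transpose(1, 2), batch["transcript"])
            + weights[2] * ce(logits["mt"].transpose(1, 2), batch["translation"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: sequence lengths are made equal purely for simplicity; a
# real system would use attention-based seq2seq decoding over variable
# lengths rather than frame/token-aligned targets.
model = MultitaskSTModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = {
    "speech": torch.randn(2, 12, 80),               # (batch, frames, mels)
    "transcript": torch.randint(0, 1000, (2, 12)),  # source-language tokens
    "translation": torch.randint(0, 1000, (2, 12)), # target-language tokens
}
print(cotraining_step(model, batch, opt))
```

The design point this sketch illustrates is the one the abstract emphasizes: because all three losses flow through shared parameters on every triplet batch, the ASR and MT signals regularize the ST path directly rather than through separate pretraining stages.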

Details

Language :
English
ISSN :
0941-0643
Volume :
36
Issue :
15
Database :
Academic Search Index
Journal :
Neural Computing & Applications
Publication Type :
Academic Journal
Accession Number :
176627599
Full Text :
https://doi.org/10.1007/s00521-024-09547-8