
Using Monolingual Data in Neural Machine Translation: a Systematic Study

Authors :
Franck Burlot
François Yvon
Lingua Custodia
Traitement du Langage Parlé (TLP)
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI)
Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919)
Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Source :
Proceedings of the Third Conference on Machine Translation: Research Papers, Conference on Machine Translation (WMT), Oct 2018, Brussels, Belgium
Publication Year :
2019
Publisher :
arXiv, 2019.

Abstract

Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference from the previous generation (Phrase-Based MT) is the way monolingual target data, which is often abundant, is used in the two paradigms. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation, a technique that fails to fully take advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.

Comment: Published in the Proceedings of the Third Conference on Machine Translation (Research Papers), 2018
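The back-translation idea summarized in the abstract can be illustrated with a minimal Python sketch. It assumes a reverse (target-to-source) translation model exposed as a plain callable; `reverse_translate`, `back_translate`, `build_training_corpus` and the toy lexicon are hypothetical names chosen for illustration. This is not the authors' code, and neither their NMT systems nor the cheaper data simulation variants they study are reproduced here.

```python
from typing import Callable, List, Tuple

# A parallel corpus is a list of (source sentence, target sentence) pairs.
Pair = Tuple[str, str]


def back_translate(
    mono_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each genuine target sentence with a machine-translated source.

    `reverse_translate` is a hypothetical stand-in for a trained
    target-to-source translation model.
    """
    return [(reverse_translate(t), t) for t in mono_target]


def build_training_corpus(
    parallel: List[Pair],
    mono_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Concatenate authentic parallel data with back-translated pairs."""
    return parallel + back_translate(mono_target, reverse_translate)


# Toy reverse "model" (hypothetical): word-by-word French-to-English lookup.
toy_lexicon = {"le": "the", "chat": "cat", "dort": "sleeps"}


def toy_reverse(sentence: str) -> str:
    return " ".join(toy_lexicon.get(w, w) for w in sentence.split())


parallel = [("the dog", "le chien")]  # authentic English-French pair
mono = ["le chat dort"]               # French-only monolingual data
corpus = build_training_corpus(parallel, mono, toy_reverse)
print(corpus)  # [('the dog', 'le chien'), ('the cat sleeps', 'le chat dort')]
```

In this setup the synthetic pairs keep the genuine monolingual sentence on the target side, so the decoder is trained only on natural target text, while the possibly noisy machine output sits on the source side.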

Details

Database :
OpenAIRE
Journal :
Proceedings of the Third Conference on Machine Translation: Research Papers, Conference on Machine Translation (WMT), Oct 2018, Brussels, Belgium
Accession number :
edsair.doi.dedup.....334d5ba91b0b41b7d3e84b7e7ed07746
Full Text :
https://doi.org/10.48550/arxiv.1903.11437