Using Monolingual Data in Neural Machine Translation: a Systematic Study
- Source :
- Proceedings of the Third Conference on Machine Translation: Research Papers, Conference on Machine Translation, Oct 2018, Brussels, Belgium, WMT
- Publication Year :
- 2019
- Publisher :
- arXiv, 2019.
Abstract
- Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference with the previous generation (Phrase-Based MT) is the way monolingual target data, which often abounds, is used in these two paradigms. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation - a technique that fails to fully take advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data, as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.
- Comment: Published in the Proceedings of the Third Conference on Machine Translation (Research Papers), 2018
- Subjects :
- FOS: Computer and information sciences
language modeling
Phrase
Machine translation
Data simulation
Computer science
02 engineering and technology
010501 environmental sciences
computer.software_genre
01 natural sciences
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
0202 electrical engineering, electronic engineering, information engineering
[INFO]Computer Science [cs]
0105 earth and related environmental sciences
Previous generation
Computer Science - Computation and Language
business.industry
Multiple data
020201 artificial intelligence & image processing
Artificial intelligence
Language model
business
computer
Computation and Language (cs.CL)
Natural language processing
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the Third Conference on Machine Translation: Research Papers, Conference on Machine Translation, Oct 2018, Brussels, Belgium, WMT
- Accession number :
- edsair.doi.dedup.....334d5ba91b0b41b7d3e84b7e7ed07746
- Full Text :
- https://doi.org/10.48550/arxiv.1903.11437