
Word reordering on multiple pivots for the Japanese and Indonesian language pair.

Authors :
Budiwati, Sari Dewi
Aritsugi, Masayoshi
Source :
Machine Translation; Dec 2021, Vol. 35 Issue 4, p611-636, 26p
Publication Year :
2021

Abstract

We investigated multiple pivot approaches for the Japanese and Indonesian (Ja–Id) language pair in phrase-based statistical machine translation (SMT). We used four pivot languages: English, Malaysian, Filipino, and Myanmar. Because the source–pivot and pivot–target language pairs each have a different word order, we conducted two experiments: without reordering (WoR) and with reordering (WR) of the source language. Triangulation and linear interpolation (LI) were used to combine the multiple pivot phrase tables, and the combined table was used without a direct source–target phrase table. In the WoR experiment, multiple pivots improved the BLEU score by 0.24 over the baseline and by 2.49 over a single pivot. However, the WoR translation output was incomprehensible because it followed Japanese word order. In the WR experiment, we reordered the Japanese subject–object–verb (SOV) word order into the Indonesian subject–verb–object (SVO) word order using Lader (Latent Derivation Reorderer). With WR, multiple pivots improved the BLEU score by 0.47 over the baseline. Furthermore, by combining many pivot languages, the BLEU score was improved by more than 0.20. The WR translation output was also more comprehensible than that of WoR. Finally, a comparison with neural machine translation (NMT) indicates that SMT obtained better results than NMT in the experiments, including a small dataset setup. [ABSTRACT FROM AUTHOR]
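
The triangulation and linear interpolation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each phrase table is a nested dictionary holding a single conditional probability per phrase pair, whereas real phrase-based SMT tables typically carry several scores per pair. The function names, the toy phrases, the probabilities, and the equal weights are all hypothetical.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source-target table by marginalising over pivot phrases:
    p(t | s) = sum over p of p(p | s) * p(t | p)."""
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases missing from the pivot-target table contribute nothing.
            for t, prob_t_given_p in pivot_tgt.get(p, {}).items():
                table[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in table.items()}


def interpolate(tables, weights):
    """Linearly interpolate several source-target tables:
    p(t | s) = sum over i of weights[i] * p_i(t | s), with weights summing to 1."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    combined = defaultdict(lambda: defaultdict(float))
    for table, weight in zip(tables, weights):
        for s, targets in table.items():
            for t, prob in targets.items():
                combined[s][t] += weight * prob
    return {s: dict(targets) for s, targets in combined.items()}


# Hypothetical toy tables: Japanese -> pivot and pivot -> Indonesian.
ja_en = {"猫": {"cat": 1.0}}
en_id = {"cat": {"kucing": 0.8, "kucing itu": 0.2}}
ja_ms = {"猫": {"kucing": 1.0}}
ms_id = {"kucing": {"kucing": 1.0}}

via_en = triangulate(ja_en, en_id)  # Ja-Id table through the English pivot
via_ms = triangulate(ja_ms, ms_id)  # Ja-Id table through the Malaysian pivot
combined = interpolate([via_en, via_ms], [0.5, 0.5])  # equal weights are an assumption
```

In this toy setup no direct Japanese-Indonesian table is involved, which mirrors the abstract's statement that the combined table was used without a source–target phrase table. How the interpolation weights were actually chosen is not stated in the abstract.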

Details

Language :
English
ISSN :
0922-6567
Volume :
35
Issue :
4
Database :
Complementary Index
Journal :
Machine Translation
Publication Type :
Academic Journal
Accession number :
154873541
Full Text :
https://doi.org/10.1007/s10590-021-09288-8