Improved Unsupervised Statistical Machine Translation via Unsupervised Word Sense Disambiguation for a Low-Resource and Indic Languages.

Authors :
Saxena, Shefali
Chaurasia, Uttkarsh
Bansal, Nitin
Daniel, Philemon
Source :
IETE Journal of Research. Dec 2023, Vol. 69, Issue 12, pp. 8848-8858. 11 p.
Publication Year :
2023

Abstract

Besides word order, word choice is a key stumbling block for machine translation (MT) in morphologically rich languages because of homonymy and polysemy. Untranslated or improperly translated words are also a severe issue for Statistical Machine Translation (SMT) models. SMT systems have been limited by the quantity of available parallel training data; still, recent research has successfully trained SMT systems in an unsupervised manner (USMT) using monolingual data alone. However, translation quality still needs improvement because of unaligned and incorrectly sensed words. This work addresses the problem by incorporating unsupervised Word Sense Disambiguation (WSD) into the decoding phase of USMT. It presents SMT systems for five translation tasks, En→Indic languages, evaluated on the WMT test dataset with the BLEU and METEOR metrics. Experiments on the En→Hi, En→Kn, En→Ta, En→Te, and En→Be tasks showed improvements over the baseline model of 2.3, 2.68, 0.78, 2.32, and 1.79 BLEU points, respectively, and 1.07, 1.34, 0.72, 0.693, and 1.191 METEOR points, respectively. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0377-2063
Volume :
69
Issue :
12
Database :
Academic Search Index
Journal :
IETE Journal of Research
Publication Type :
Academic Journal
Accession number :
176450155
Full Text :
https://doi.org/10.1080/03772063.2022.2098189