Linguistically enhanced word segmentation for better neural machine translation of low resource agglutinative languages.

Authors :
Chimalamarri, Santwana
Sitaram, Dinkar
Source :
International Journal of Speech Technology; Dec2021, Vol. 24 Issue 4, p1047-1053, 7p
Publication Year :
2021

Abstract

One of the challenges faced by neural machine translation (NMT) systems is the lack of standard parallel corpora for many language pairs. Inadequate data often results in poor translation quality. Morphological complexity and agglutination aggravate this problem, leading to unmanageable vocabulary sizes, rare words, and data sparsity. Although sub-word algorithms such as BPE partly address this problem, translation systems still lag in their ability to capture the sentence and word structures associated with rich morphologies. This paper aims to address these issues by incorporating linguistically driven sub-word units into NMT systems. The approach is further enhanced with additional POS tag feature inputs. The proposed approach outperforms BPE-driven machine translation models by several BLEU points and also shows better recall as measured by the ROUGE metric. The approach is evaluated on a morphologically complex Dravidian language pair, Kannada and Telugu. [ABSTRACT FROM AUTHOR]
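
Illustrative note (not part of the author abstract): the idea of feeding an NMT system linguistically segmented sub-word units together with POS tag features can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `to_factored_tokens`, the `@@` joiner, the `|` feature separator, and the hand-segmented romanized input are all hypothetical choices made for illustration; the record does not describe the paper's actual segmenter or input format.

```python
from typing import List, Tuple


def to_factored_tokens(
    words: List[Tuple[List[str], str]],
    joiner: str = "@@",  # assumed marker for "word continues in next token"
    sep: str = "|",      # assumed separator between surface form and POS feature
) -> List[str]:
    """Turn pre-segmented words into sub-word tokens annotated with a POS tag.

    Each input item is (morphemes, pos_tag). The segmentation is supplied by
    the caller; no morphological analysis is performed here.
    """
    tokens: List[str] = []
    for morphemes, pos in words:
        last = len(morphemes) - 1
        for i, morpheme in enumerate(morphemes):
            # Every non-final morpheme gets the joiner so the word can be rebuilt.
            surface = morpheme if i == last else morpheme + joiner
            # The word-level POS tag is copied onto each of its sub-word units.
            tokens.append(f"{surface}{sep}{pos}")
    return tokens


# Hypothetical hand-segmented input (romanized, for illustration only).
example = [(["mane", "gal", "alli"], "NOUN"), (["ide"], "VERB")]
factored = to_factored_tokens(example)
print(" ".join(factored))
```

In contrast to BPE, which derives its units from symbol-pair frequencies, the segments here follow morpheme boundaries supplied from outside, and the POS tag travels with every unit of the word.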

Details

Language :
English
ISSN :
1381-2416
Volume :
24
Issue :
4
Database :
Complementary Index
Journal :
International Journal of Speech Technology
Publication Type :
Academic Journal
Accession number :
153652831
Full Text :
https://doi.org/10.1007/s10772-021-09865-5