Linguistically enhanced word segmentation for better neural machine translation of low resource agglutinative languages.
- Source :
- International Journal of Speech Technology; Dec2021, Vol. 24 Issue 4, p1047-1053, 7p
- Publication Year :
- 2021
Abstract
- One of the several challenges faced by neural machine translation (NMT) systems is the lack of standard parallel corpora for several language pairs. Poor translation quality often results from inadequate data. Aggravating this problem further are the issues of morphological complexity and agglutination, which lead to unmanageable vocabulary sizes, rare words and data sparsity. Though this problem has been partly addressed by sub-word algorithms such as BPE, translation systems still lag in their ability to understand the sentence and word structures associated with rich morphologies. This paper aims to address these issues by incorporating linguistically driven sub-word units into NMT systems. This approach is further enhanced by additional POS tag feature inputs. The proposed approach outperforms BPE-driven machine translation models by several BLEU points and is also shown to have better recall as measured by the ROUGE metric. The results have been evaluated on a morphologically complex Dravidian language pair, Kannada and Telugu. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 13812416
- Volume :
- 24
- Issue :
- 4
- Database :
- Complementary Index
- Journal :
- International Journal of Speech Technology
- Publication Type :
- Academic Journal
- Accession number :
- 153652831
- Full Text :
- https://doi.org/10.1007/s10772-021-09865-5