Finding Sami Cognates with a Character-Based NMT Approach

Authors :
Jack Rueter
Mika Hämäläinen
Arppe, Antti
Good, Jeff
Hulden, Mans
Lachler, Jordan
Palmer, Alexis
Schwartz, Lane
Silfverberg, Miikka
Department of Digital Humanities
Language Technology
Department of Modern Languages 2010-2017
Source :
Proceedings of the Workshop on Computational Methods for Endangered Languages.
Publication Year :
2019
Publisher :
University of Colorado at Boulder, 2019.

Abstract

We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too little parallel data to train an NMT model as such. We solve this problem, on the one hand, by training the model on cognates between North Sami and other Uralic languages and, on the other, by generating additional synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.

Details

Database :
OpenAIRE
Journal :
Proceedings of the Workshop on Computational Methods for Endangered Languages
Accession number :
edsair.doi.dedup.....38624283e4f56576c363e028ef7d05c1
Full Text :
https://doi.org/10.33011/computel.v1i.395