Revisiting NMT for normalization of early English letters

Authors :: Jack Rueter
Mika Hämäläinen
Eetu Mäkelä
Tanja Säily
Jörg Tiedemann
Alex, Beatrice
Degaetano-Ortlieb, Stefania
Kazantseva, Anna
Reiter, Nils
Szpakowicz, Stan
Department of Digital Humanities
Language Technology
English Philology
Department of Languages
Department of Modern Languages 2010-2017
Digital Humanities
Human Sciences – Computing Interaction
Mind and Matter
Source :: LaTeCH@NAACL-HLT
Publication Year :: 2019
Publisher :: The Association for Computational Linguistics, 2019.
Abstract: This paper studies the use of neural machine translation (NMT) as a normalization method for an early English letter corpus. The corpus has previously been normalized, so that only the less frequent deviant forms remain unnormalized. The paper compares several approaches to improving the normalization of these remaining forms. Adding features to the training data is found to be unhelpful, but using a lexicographical resource to filter the top candidates produced by the NMT model, together with lemmatization, improves results.
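
Illustration: the candidate-filtering idea mentioned in the abstract could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `pick_normalization`, the toy lexicon, the naive lemmatizer, and the example n-best list are all hypothetical stand-ins for a real NMT model, lemmatizer, and lexicographical resource.

```python
from typing import AbstractSet, Callable, Iterable, Optional


def pick_normalization(
    candidates: Iterable[str],
    lexicon: AbstractSet[str],
    lemmatize: Callable[[str], str],
) -> Optional[str]:
    """Return the highest-ranked candidate whose lemma is found in the lexicon."""
    for candidate in candidates:  # assumed to be ordered best-first
        if lemmatize(candidate.lower()) in lexicon:
            return candidate
    return None  # no candidate is attested; the caller decides on a fallback


# Hypothetical toy resources standing in for a real lexicon and lemmatizer.
TOY_LEXICON = {"friend", "letter", "receive"}


def toy_lemmatize(word: str) -> str:
    """Strip a trailing 's' or 'd' if the remainder is a known lemma."""
    for suffix in ("s", "d"):
        if word.endswith(suffix) and word[: -len(suffix)] in TOY_LEXICON:
            return word[: -len(suffix)]
    return word


# Hypothetical n-best list for the historical spelling "frendes".
nbest = ["frendes", "frends", "friends"]
result = pick_normalization(nbest, TOY_LEXICON, toy_lemmatize)
print(result)
```

Here the first two candidates are rejected because their lemmas are not in the lexicon, so the third candidate is selected even though the model ranked it lower.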
