Neural text normalization with adapted decoding and POS features
- Author
- Ruzsics, Tatyana; Lusetti, Massimo; Göhring, Anne (https://orcid.org/0000-0003-2629-4476); Samardžić, Tanja; Stark, Elisabeth
- Abstract
Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. This task is especially important for languages such as Swiss German, which shows strong regional variation and has no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model by integrating a word-level language model and linguistic features in the form of part-of-speech (POS) tags. Each component targets a specific issue: the language model is intended to improve the fluency of the predicted sequences, whereas the POS tags aim at resolving cases of word-level ambiguity. Our systematic comparison shows that the proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, which was considered the state of the art for this task until recently. A thorough analysis of the compared systems’ output shows that the two components indeed produce the intended, complementary improvements.
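As an illustration of the general idea of adding a word-level language model to a character-level decoder, the following is a minimal sketch, not the authors' decoding procedure. It assumes a shallow-fusion-style scheme: whenever a beam hypothesis completes a word (at a space or at the end of the sequence), a weighted word-LM log-probability is added to its character-level score. The function names, the interpolation weight `lm_weight`, and both toy models (a "character model" derived from two scored candidate outputs and a hand-written bigram table) are hypothetical stand-ins for trained models. The `das`/`dass` pair is used only as a familiar example of word-level ambiguity.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # end-of-sequence symbol emitted by the character model

# Hypothetical interfaces (assumptions for illustration, not the paper's code):
CharModel = Callable[[str], Dict[str, float]]     # prefix -> {next char or EOS: prob}
WordLM = Callable[[Tuple[str, ...], str], float]  # (previous words, word) -> prob


def fused_beam_search(char_model: CharModel, word_lm: WordLM,
                      beam_size: int = 3, lm_weight: float = 0.5,
                      max_len: int = 50) -> Tuple[str, float]:
    """Character-level beam search that adds a weighted word-LM log-probability
    each time a hypothesis completes a word (at a space or at EOS)."""
    # Each hypothesis: (text, character log-prob, word-LM log-prob).
    beam: List[Tuple[str, float, float]] = [("", 0.0, 0.0)]
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[str, float, float]] = []
        for text, char_lp, lm_lp in beam:
            for ch, p in char_model(text).items():
                if p <= 0.0:
                    continue
                new_char_lp = char_lp + math.log(p)
                if ch in (" ", EOS):
                    # Word boundary: score the just-completed word with the word LM.
                    *history, word = text.split(" ")
                    if not word:
                        continue  # skip empty words (leading or doubled spaces)
                    new_lm_lp = lm_lp + math.log(word_lm(tuple(history), word))
                    if ch == EOS:
                        finished.append((text, new_char_lp + lm_weight * new_lm_lp))
                    else:
                        candidates.append((text + " ", new_char_lp, new_lm_lp))
                else:
                    candidates.append((text + ch, new_char_lp, lm_lp))
        if not candidates:
            break
        # Keep the best partial hypotheses under the fused score.
        candidates.sort(key=lambda h: h[1] + lm_weight * h[2], reverse=True)
        beam = candidates[:beam_size]
    return max(finished, key=lambda h: h[1], default=("", float("-inf")))


def make_toy_char_model(targets: Dict[str, float]) -> CharModel:
    """Toy stand-in for a trained character-level decoder: the next-symbol
    distribution is derived from a fixed set of scored candidate outputs."""
    def next_probs(prefix: str) -> Dict[str, float]:
        mass: Dict[str, float] = {}
        for target, p in targets.items():
            if target.startswith(prefix):
                nxt = EOS if len(target) == len(prefix) else target[len(prefix)]
                mass[nxt] = mass.get(nxt, 0.0) + p
        total = sum(mass.values())
        return {sym: m / total for sym, m in mass.items()}
    return next_probs


# Hypothetical bigram probabilities; unseen bigrams get a small default.
BIGRAMS = {("<s>", "das"): 0.5, ("<s>", "dass"): 0.05,
           ("das", "ist"): 0.6, ("dass", "ist"): 0.1, ("ist", "gut"): 0.5}


def toy_word_lm(history: Tuple[str, ...], word: str) -> float:
    prev = history[-1] if history else "<s>"
    return BIGRAMS.get((prev, word), 0.01)


if __name__ == "__main__":
    cm = make_toy_char_model({"das ist gut": 0.4, "dass ist gut": 0.6})
    print(fused_beam_search(cm, toy_word_lm, lm_weight=0.0)[0])  # dass ist gut
    print(fused_beam_search(cm, toy_word_lm, lm_weight=0.5)[0])  # das ist gut
```

With `lm_weight=0.0` the sketch reduces to plain character-level beam search and keeps the character model's preferred `dass ist gut`; with `lm_weight=0.5` the word LM's preference for `das` overrides it. Because words are scored only at boundaries, partial hypotheses in the beam are compared with unequal amounts of LM evidence, a simplification that a fuller implementation might need to handle.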
- Published
- 2019