Neural text normalization with adapted decoding and POS features

Authors :
Ruzsics, Tatyana
Lusetti, Massimo
Göhring, Anne; https://orcid.org/0000-0003-2629-4476
Samardžić, Tanja
Stark, Elisabeth
Source :
Ruzsics, Tatyana; Lusetti, Massimo; Göhring, Anne; Samardžić, Tanja; Stark, Elisabeth (2019). Neural text normalization with adapted decoding and POS features. Natural Language Engineering, 25(5):585-605.
Publication Year :
2019

Abstract

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. This task is especially important for languages such as Swiss German, which has strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model by integrating a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components address two specific issues: the language model is intended to improve the fluency of the predicted sequences, whereas the POS tags aim to resolve cases of word-level ambiguity. Our systematic comparison shows that the proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, which was considered the state of the art in this task until recently. We perform a thorough analysis of the compared systems’ output, showing that our two components indeed produce the intended, complementary improvements.

Details

Database :
OAIster
Journal :
Natural Language Engineering, 25(5):585-605.
Notes :
application/pdf, info:doi/10.5167/uzh-177181, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1416176902
Document Type :
Electronic Resource