1. Romanian Syllabification Using Deep Neural Networks
- Author
Stefan Ruseti, Dragos-Georgian Corlatescu, and Mihai Dascalu
- Subjects
Conditional random field, business.industry, Computer science, Syllabification, Deep learning, Part of speech, computer.software_genre, Software, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing, Transformer (machine learning model), Block (data storage)
- Abstract
Syllabification may be considered trivial for humans, but it can prove to be a challenging task in terms of automated text analysis. In this study, we explore three approaches to syllabify words in Romanian using state-of-the-art deep learning architectures in sequence prediction, namely BiLSTM, CNN, and transformer. In contrast to previous approaches, our models take into account the part of speech of the word, which in turn can weigh heavily in situations where words have the same written form but different syllabification. Our best model obtains an accuracy of approximately 98% using a conditional random field on top of the BiLSTM architecture, surpassing all previous state-of-the-art models. Our model represents a building block for multiple smart learning ecosystems, ranging from better hyphenation software for text evaluation to text-to-speech and speech-to-text frameworks employed in intelligent houses or personal assistants.
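The abstract frames syllabification as sequence prediction. One common way to set this up, assumed here rather than taken from the paper, is to give every character a binary label marking whether a syllable break follows it. The sketch below only shows that encoding and its inverse; the function names `to_labels` and `from_labels` are hypothetical, and the neural tagger itself (BiLSTM, CNN, or transformer, optionally with a CRF layer and a part-of-speech input) is not shown.

```python
from typing import List, Tuple


def to_labels(hyphenated: str, sep: str = "-") -> Tuple[str, List[int]]:
    """Split a hyphenated word into its plain form and per-character labels.

    Label 1 means "a syllable break follows this character"; 0 means no break.
    This labeling scheme is an illustrative assumption, not the paper's.
    """
    chars: List[str] = []
    labels: List[int] = []
    for ch in hyphenated:
        if ch == sep:
            # Mark the previous character as ending a syllable.
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def from_labels(word: str, labels: List[int], sep: str = "-") -> str:
    """Rebuild the hyphenated form from a word and its predicted labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    out: List[str] = []
    last = len(word) - 1
    for i, (ch, lab) in enumerate(zip(word, labels)):
        out.append(ch)
        # Never emit a separator after the final character.
        if lab == 1 and i < last:
            out.append(sep)
    return "".join(out)


word, labels = to_labels("car-te")
restored = from_labels(word, labels)
```

Under this encoding, a tagger would be trained to predict `labels` from `word`, and a part-of-speech tag could be supplied as an additional input feature to separate homographs.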
- Published
2021