1. An Efficient Transformer-Based Model for Vietnamese Punctuation Prediction
- Author
- Binh T. Nguyen, Quang Pham, Cuong V. Dinh, and Hieu Vu Tran
- Subjects
- Computer science, Vietnamese, Punctuation, Artificial intelligence, Transfer of learning, Natural language processing, Transformer (machine learning model)
- Abstract
- In both formal and informal texts, missing punctuation marks make the text confusing and hard to read. This paper conducts exhaustive experiments to investigate the benefits of pre-trained Transformer-based models on two Vietnamese punctuation datasets. The experimental results show that our models achieve encouraging results and that adding Bi-LSTM and/or CRF layers on top of the proposed models further improves performance. Finally, our best model significantly outperforms state-of-the-art approaches on both the novel and news datasets for the Vietnamese language, with gains of up to \(21.45\%\) and \(18.27\%\) in overall F1-score, respectively.
- Published
- 2021
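
The abstract above reports per-dataset F1-scores for punctuation prediction but does not spell out how the task is framed. A common framing, assumed here and not confirmed by this record, is sequence labeling: each word receives a label naming the punctuation mark that follows it. The minimal sketch below shows that framing and a per-label F1 computation. The label names (`COMMA`, `PERIOD`, `QUESTION`, `O`) and the helpers `to_labeled_tokens` and `f1_per_label` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label set (an assumption, not the paper's):
# "O" means that no punctuation mark follows the word.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_labeled_tokens(text):
    """Turn punctuated text into (word, label) pairs with the marks stripped."""
    pairs = []
    for raw in text.split():
        # Only a single trailing mark is considered in this sketch.
        mark = raw[-1] if raw[-1] in PUNCT_TO_LABEL else ""
        word = raw[:-1] if mark else raw
        if word:
            pairs.append((word, PUNCT_TO_LABEL.get(mark, "O")))
    return pairs


def f1_per_label(gold, pred):
    """Compute the F1-score of each punctuation label, ignoring "O"."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in (set(gold) | set(pred)) - {"O"}:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# Gold labels come from the punctuated text; the predictions are made up
# for illustration and do not come from any real model.
tokens = to_labeled_tokens("Xin chào, bạn khỏe không? Tôi khỏe.")
gold = [label for _, label in tokens]
pred = ["O", "COMMA", "O", "O", "PERIOD", "O", "PERIOD"]
scores = f1_per_label(gold, pred)
```

In this toy run the question mark is mispredicted as a period, so `QUESTION` scores zero and the precision of `PERIOD` drops. How the paper aggregates per-label scores into an overall F1 is not stated in this record, so no aggregation is shown.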