
Contextual semantic embeddings based on fine-tuned AraBERT model for Arabic text multi-class categorization

Authors :
Fatima-Zahra El-Alami
Noureddine En Nahnahi
Said Ouatik El Alaoui
Source :
Journal of King Saud University - Computer and Information Sciences. 34:8422-8428
Publication Year :
2022
Publisher :
Elsevier BV, 2022.

Abstract

Although pre-trained word embedding models have advanced a wide range of natural language processing applications, they ignore the contextual information and meaning within the text. In this paper, we investigate the potential of the pre-trained Arabic BERT (Bidirectional Encoder Representations from Transformers) model to learn universal contextualized sentence representations and showcase its usefulness for Arabic text multi-class categorization. We exploit the pre-trained AraBERT for contextual text representation learning in two ways: as a transfer learning model and as a feature extractor. On the one hand, we fine-tune the AraBERT model's parameters on the OSAC datasets to transfer its knowledge to Arabic text categorization. On the other hand, we examine AraBERT's performance as a feature extractor by combining it with several classifiers, including CNN, LSTM, Bi-LSTM, MLP, and SVM. Finally, we conduct an exhaustive set of experiments comparing two BERT models, AraBERT and multilingual BERT. The findings show that the fine-tuned AraBERT model achieves state-of-the-art results, attaining up to 99% in both F1-score and accuracy.
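
The first mode described in the abstract is transfer learning: AraBERT's parameters are fine-tuned end-to-end on the target categorization task. Below is a minimal sketch of that setup using the Hugging Face transformers library; the checkpoint name, label count, sequence length, and learning rate are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: fine-tuning AraBERT end-to-end for multi-class text categorization.
# Assumptions: the "aubmindlab/bert-base-arabert" checkpoint, 10 classes, and
# the hyperparameters below are placeholders, not the paper's exact settings.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CLASSES = 10  # assumed category count; set to the actual OSAC label set

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabert")
model = AutoModelForSequenceClassification.from_pretrained(
    "aubmindlab/bert-base-arabert", num_labels=NUM_CLASSES
)

texts = ["...Arabic document 1...", "...Arabic document 2..."]  # placeholder corpus
labels = torch.tensor([0, 3])

batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=256, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # cross-entropy over NUM_CLASSES logits
outputs.loss.backward()                  # gradients flow through all of AraBERT
optimizer.step()
```

In this mode every encoder weight is updated, which is what lets the pre-trained contextual representations adapt to the categorization task rather than staying fixed.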
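The second mode uses AraBERT as a frozen feature extractor whose sentence vectors feed a separate classifier (CNN, LSTM, Bi-LSTM, MLP, or SVM in the paper). The sketch below pairs the frozen encoder with an SVM; the checkpoint name and the choice of the [CLS] hidden state as the pooled sentence vector are assumptions for illustration.

```python
# Sketch: frozen AraBERT as a feature extractor, with an SVM head on top.
# Assumptions: checkpoint name and [CLS] pooling are illustrative choices.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabert")
encoder = AutoModel.from_pretrained("aubmindlab/bert-base-arabert")
encoder.eval()  # frozen: no gradient updates to AraBERT in this mode

def embed(texts):
    """Return one contextual sentence vector per input document."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
    return hidden[:, 0, :].numpy()                   # [CLS] token embedding

train_texts = ["...Arabic document...", "...Arabic document..."]  # placeholders
train_labels = [0, 3]

clf = SVC(kernel="linear")  # one of the five classifiers compared in the paper
clf.fit(embed(train_texts), train_labels)
prediction = clf.predict(embed(["...new Arabic document..."]))
```

Keeping the encoder frozen makes this mode much cheaper to train than full fine-tuning, which is the trade-off the paper's comparison between the two modes probes.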

Details

ISSN :
1319-1578
Volume :
34
Database :
OpenAIRE
Journal :
Journal of King Saud University - Computer and Information Sciences
Accession number :
edsair.doi...........886336d5403358dc7678b930c8dd3b3c