Contextual semantic embeddings based on fine-tuned AraBERT model for Arabic text multi-class categorization
- Source :
- Journal of King Saud University - Computer and Information Sciences. 34:8422-8428
- Publication Year :
- 2022
- Publisher :
- Elsevier BV, 2022.
Abstract
- Although pre-trained word embedding models have advanced a wide range of natural language processing applications, they ignore contextual information and meaning within the text. In this paper, we investigate the potential of the pre-trained Arabic BERT (Bidirectional Encoder Representations from Transformers) model to learn universal contextualized sentence representations, with the aim of showcasing its usefulness for Arabic text multi-class categorization. We propose to exploit the pre-trained AraBERT for contextual text representation learning in two different ways: as a transfer learning model and as a feature extractor. On the one hand, we fine-tune the parameters of the Arabic BERT (AraBERT) model on the OSAC datasets to transfer its knowledge to Arabic text categorization. On the other hand, we examine AraBERT's performance as a feature extractor by combining it with several classifiers, including CNN, LSTM, Bi-LSTM, MLP, and SVM. Finally, we conduct an exhaustive set of experiments comparing two BERT models, namely AraBERT and multilingual BERT. The findings show that the fine-tuned AraBERT model achieves state-of-the-art performance, attaining up to 99% in terms of F1-score and accuracy.
- Subjects :
- Word embedding
General Computer Science
business.industry
Computer science
020206 networking & telecommunications
02 engineering and technology
computer.software_genre
Class (biology)
Support vector machine
Categorization
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
020201 artificial intelligence & image processing
Artificial intelligence
business
Transfer of learning
computer
Feature learning
Natural language processing
Sentence
Details
- ISSN :
- 13191578
- Volume :
- 34
- Database :
- OpenAIRE
- Journal :
- Journal of King Saud University - Computer and Information Sciences
- Accession number :
- edsair.doi...........886336d5403358dc7678b930c8dd3b3c