ArWordVec: efficient word embedding models for Arabic tweets.

Authors :
Fouad, Mohammed M.
Mahany, Ahmed
Aljohani, Naif
Abbasi, Rabeeh Ayaz
Hassan, Saeed-Ul
Source :
Soft Computing - A Fusion of Foundations, Methodologies & Applications. Jun 2020, Vol. 24 Issue 11, p8061-8068. 8p.
Publication Year :
2020

Abstract

One of the major advances in artificial intelligence in recent years is the ability to understand, process, and utilize human natural language. This has been achieved by combining natural language processing (NLP) techniques with deep learning approaches and architectures. Replacing the traditional bag-of-words approach with distributed word representations has proven very effective in recent years for many NLP tasks. In this paper, we present the detailed steps of building a set of efficient word embedding models, called ArWordVec, generated from a huge repository of Arabic tweets. In addition, we introduce a new method for measuring Arabic word similarity and use it to evaluate the performance of the generated ArWordVec models. The experimental results show that the ArWordVec models outperform recently available models on Arabic Twitter data for the word similarity task. Furthermore, two large Arabic tweet datasets are used to examine the performance of the proposed models in the multi-class sentiment analysis task. The results show that the proposed models are very efficient and help achieve a classification accuracy exceeding 73.86% with a high average F1 value of 74.15. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1432-7643
Volume :
24
Issue :
11
Database :
Academic Search Index
Journal :
Soft Computing - A Fusion of Foundations, Methodologies & Applications
Publication Type :
Academic Journal
Accession number :
143056972
Full Text :
https://doi.org/10.1007/s00500-019-04153-6