
Arabic spam tweets classification using deep learning.

Authors :: Kaddoura, Sanaa
Alex, Suja A.
Itani, Maher
Henno, Safaa
AlNashash, Asma
Hemanth, D. Jude
Source :: Neural Computing & Applications. Aug2023, Vol. 35 Issue 23, p17233-17246. 14p.
Publication Year :: 2023
Abstract: With the increased use of social network sites such as Twitter, attackers exploit these platforms to spread counterfeit content, such as fake advertisements or illegal material. Classifying such content is a challenging task, especially in Arabic, whose complex structure makes classification more difficult. This paper presents an approach to classifying Arabic tweets using classical machine learning (non-deep machine learning) and deep learning techniques. A tweet corpus was collected through the Twitter API and labelled manually to obtain a reliable dataset. To build an efficient classifier, feature extraction is applied to the corpus. Then, the two learning approaches are applied to the created dataset for each feature extraction technique, using N-gram models (uni-gram, bi-gram, and char-gram). The classical machine learning algorithms applied are support vector machines, neural networks, logistic regression, and naïve Bayes. Global Vectors (GloVe) and fastText models are used for the deep learning approaches. Precision, recall, and F1-score are the performance measures calculated in this paper. Afterwards, the dataset is augmented using the synthetic minority oversampling technique (SMOTE) to create a balanced dataset. Among the classical machine learning models, the experimental results show that the neural network algorithm outperforms the other algorithms. Moreover, for the deep learning approach, GloVe outperforms fastText. [ABSTRACT FROM AUTHOR]
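The abstract names three concrete ingredients without giving implementation details: word and character N-gram features, SMOTE oversampling, and precision/recall/F1 evaluation. The block below is a minimal, standard-library-only sketch of each, assuming plain whitespace tokenization (no Arabic normalization or stemming) and numeric feature vectors for oversampling. It is not the authors' code. The function names `word_ngrams`, `char_ngrams`, `precision_recall_f1` and `smote_like` are hypothetical. Production pipelines would more typically use library vectorizers (e.g. scikit-learn) and the imbalanced-learn SMOTE implementation.

```python
import random
from typing import List, Sequence, Tuple


def word_ngrams(text: str, n: int) -> List[str]:
    """Word n-grams from whitespace tokens (n=1: uni-grams, n=2: bi-grams).

    Assumption: plain whitespace tokenization; no Arabic normalization or stemming.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Overlapping character n-grams; Python str is Unicode, so Arabic script works unchanged."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (e.g. spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def smote_like(minority: List[List[float]], n_new: int, k: int = 5,
               seed: int = 0) -> List[List[float]]:
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

For example, `word_ngrams("a b c", 2)` returns `["a b", "b c"]`, `char_ngrams("spam", 3)` returns `["spa", "pam"]`, and `precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])` returns `(0.5, 0.5, 0.5)`. Interpolating between nearby minority samples, rather than duplicating them, is what distinguishes SMOTE from naive random oversampling.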