A Study of Discriminatory Speech Classification Based on Improved Smote and SVM-RF

Authors :
Chao Wu
Huijuan Hu
Dingju Zhu
Xilin Shan
Kai-Leung Yung
Andrew W. H. Ip
Source :
Applied Sciences, Vol 14, Iss 15, p 6468 (2024)
Publication Year :
2024
Publisher :
MDPI AG, 2024.

Abstract

The rapid development of the Internet has facilitated expression, sharing, and interaction on social networks, but some speech may contain harmful discrimination, so classifying such speech is crucial. In this paper, we collect discriminatory speech data from Sina Weibo and propose an improved Synthetic Minority Over-sampling Technique (SMOTE) algorithm based on Latent Dirichlet Allocation (LDA) to improve data quality and balance. We also propose a new integration method that combines Support Vector Machine (SVM) and Random Forest (RF). The experimental results show that, compared with SVM alone, the integrated model improves precision, recall, and F1 score by 6.0%, 5.4%, and 5.7%, respectively. It also performs best among the machine learning methods compared. Ablation experiments further confirm the positive impact of both the improved SMOTE and the integration method on classification.
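
For readers unfamiliar with SMOTE, the following is a minimal sketch of the classic interpolation step that the abstract's improved variant builds on. It is not the authors' LDA-based algorithm, whose details are not given in this record. The function name `smote_sample`, its parameters, and the toy data are assumptions made for illustration.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic minority points by classic SMOTE interpolation.

    Hypothetical helper for illustration; not the paper's improved SMOTE.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: four feature vectors (assumed data, not from the paper).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sample(minority, k=2, n_new=4)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without exact duplication. In text classification the vectors would typically be document features rather than 2-D points.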

Details

Language :
English
ISSN :
2076-3417
Volume :
14
Issue :
15
Database :
Directory of Open Access Journals
Journal :
Applied Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.1666b7e660e74aef9385ce3dd399cb64
Document Type :
article
Full Text :
https://doi.org/10.3390/app14156468