
Multi-branch feature learning based speech emotion recognition using SCAR-NET.

Authors :
Mao, Keji
Wang, Yuxiang
Ren, Ligang
Zhang, Jinhong
Qiu, Jiefan
Dai, Guanglin
Source :
Connection Science; Dec 2023, Vol. 35 Issue 1, p1-18, 18p
Publication Year :
2023

Abstract

Speech emotion recognition (SER) is an active research area in affective computing. Recognizing emotions from speech signals helps to assess human behaviour, which has promising applications in human-computer interaction. The performance of deep learning-based SER methods relies heavily on feature learning. In this paper, we propose SCAR-NET, an improved convolutional neural network, to extract emotional features from speech signals and classify them. The work has two main parts: first, we extract spectral, temporal, and spectral-temporal correlation features through three parallel paths; second, we design split-convolve-aggregate residual blocks for multi-branch deep feature learning. The features are refined by global average pooling (GAP) and passed through a softmax classifier to generate predictions for different emotions. We also conduct a series of experiments to evaluate the robustness and effectiveness of SCAR-NET, which achieves 96.45%, 83.13%, and 89.93% accuracy on the speech emotion datasets EMO-DB, SAVEE, and RAVDESS, respectively. These results demonstrate the superior performance of SCAR-NET. [ABSTRACT FROM AUTHOR]
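
The pipeline named in the abstract (split-convolve-aggregate residual block, then global average pooling, then softmax) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract gives no layer sizes, kernel shapes, or branch counts, so the function names (`conv1d_same`, `scar_block`, `classify`), the one-dimensional channels, the grouping scheme, the linear layer weights, and the emotion labels below are all assumptions chosen only to make the data flow concrete.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation that keeps the input length.

    Assumes an odd-length kernel (illustrative choice, not from the paper).
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def scar_block(channels, group_kernels):
    """Toy split-convolve-aggregate residual block (hypothetical layout).

    split:     channels are divided into len(group_kernels) equal groups
    convolve:  every channel in a group is filtered with that group's kernel
    aggregate: the groups are concatenated back in their original order
    residual:  the unfiltered input is added to the filtered output
    """
    n_groups = len(group_kernels)
    if len(channels) % n_groups != 0:
        raise ValueError("channel count must be divisible by the group count")
    size = len(channels) // n_groups
    out = []
    for g, kernel in enumerate(group_kernels):
        for ch in channels[g * size:(g + 1) * size]:
            filtered = conv1d_same(ch, kernel)
            out.append([f + x for f, x in zip(filtered, ch)])
    return out


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(channels, weights, labels):
    """Global average pooling, a linear layer, then softmax.

    `weights` has one row per label and one column per channel
    (an assumed linear layer; the abstract does not describe one).
    """
    pooled = [sum(ch) / len(ch) for ch in channels]  # GAP: one value per channel
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    probs = softmax(logits)
    return labels[probs.index(max(probs))], probs


# Two toy channels, two groups: an identity kernel and an all-zero kernel.
features = scar_block(
    [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
)
label, probs = classify(features, [[1.0, 0.0], [0.0, 1.0]], ["angry", "neutral"])
```

In a real network the three parallel feature paths would feed two-dimensional spectrogram-like tensors into learned convolutions; the sketch keeps only the order of operations described in the abstract.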

Details

Language :
English
ISSN :
09540091
Volume :
35
Issue :
1
Database :
Complementary Index
Journal :
Connection Science
Publication Type :
Academic Journal
Accession number :
174546648
Full Text :
https://doi.org/10.1080/09540091.2023.2189217