
CCTG-NET: Contextualized Convolutional Transformer-GRU Network for speech emotion recognition.

Authors :
Tellai, Mohammed
Mao, Qirong
Source :
International Journal of Speech Technology; Dec2023, Vol. 26 Issue 4, p1099-1116, 18p
Publication Year :
2023

Abstract

Speech is a crucial aspect of human-to-human interaction and plays a fundamental role in the advancement of human–computer interaction (HCI) systems. Developing an accurate speech emotion recognition (SER) system for human conversations is a critical yet challenging task. Existing state-of-the-art (SOTA) research in SER primarily models the vocal information within individual conversational utterances and overlooks the transactional information carried by the interaction context. In this paper, we present a novel Contextualized Convolutional Transformer-GRU Network (CCTG-Net) that recognizes speech emotions from Mel-spectrogram features while effectively integrating contextual information from the interaction. Our experiments are conducted on the widely used emotional benchmark dataset IEMOCAP. On four-class emotion recognition, the proposed model achieves a weighted accuracy (WA) of 88.4% and an unweighted accuracy (UA) of 89.1%, a 3.0% improvement in UA over SOTA methods, while maintaining an optimal balance between performance and complexity. [ABSTRACT FROM AUTHOR]
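The abstract reports results as weighted accuracy (WA) and unweighted accuracy (UA). The record does not define these metrics. The sketch below assumes the convention commonly used in IEMOCAP SER work: WA is overall accuracy across all utterances, and UA is the mean of per-class recalls, so minority emotions count as much as frequent ones. The function names and toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: fraction of all utterances classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every emotion class contributes equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    totals = Counter(y_true)  # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced four-class toy labels (not from IEMOCAP or the paper)
y_true = ["neu"] * 6 + ["hap"] * 2 + ["ang"] + ["sad"]
y_pred = ["neu"] * 6 + ["hap", "neu"] + ["neu"] + ["sad"]

print(weighted_accuracy(y_true, y_pred))    # 0.8   (8 of 10 correct)
print(unweighted_accuracy(y_true, y_pred))  # 0.625 (recalls 1.0, 0.5, 0.0, 1.0)
```

On imbalanced data the two numbers can diverge sharply. In the toy example, misclassifying the single "ang" utterance costs WA only 0.1 but costs UA 0.25. This is why SER papers usually report both metrics.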

Details

Language :
English
ISSN :
13812416
Volume :
26
Issue :
4
Database :
Complementary Index
Journal :
International Journal of Speech Technology
Publication Type :
Academic Journal
Accession number :
174762206
Full Text :
https://doi.org/10.1007/s10772-023-10080-7