CCTG-NET: Contextualized Convolutional Transformer-GRU Network for speech emotion recognition.
- Source :
- International Journal of Speech Technology; Dec 2023, Vol. 26 Issue 4, p1099-1116, 18p
- Publication Year :
- 2023
Abstract
- Speech is a crucial aspect of human-to-human interactions and plays a fundamental role in the advancement of human–computer interaction (HCI) systems. Developing an accurate speech emotion recognition (SER) system for human conversations poses a critical yet challenging task. Existing state-of-the-art (SOTA) research in SER primarily focuses on modeling vocal information within individual conversational speech utterances, overlooking the significance of incorporating transactional information from the interaction context. In this paper, we present a novel Contextualized Convolutional Transformer-GRU Network (CCTG-Net) for recognizing speech emotions using Mel-spectrogram features, effectively integrating contextual information for emotion recognition. Our experiments are conducted on the widely-used emotional benchmark dataset, IEMOCAP. Compared to SOTA methods in four-class emotion recognition, our proposed model achieves a weighted accuracy of 88.4% and an unweighted accuracy (UA) of 89.1%. This marks a substantial 3.0% enhancement in UA while maintaining an optimal balance between performance and complexity. [ABSTRACT FROM AUTHOR]
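The abstract reports both a weighted accuracy and an unweighted accuracy (UA). As commonly defined in the SER literature, weighted accuracy is the overall fraction of correctly classified utterances, while UA is the mean of the per-class recalls, so every emotion class counts equally regardless of its size. The sketch below illustrates that distinction only; it assumes these common definitions, and the function names and toy labels are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally."""
    totals = defaultdict(int)  # utterances per true class
    hits = defaultdict(int)    # correct predictions per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced four-class toy data (not from IEMOCAP).
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] + ["angry"]
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["sad"] + ["neutral"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

On imbalanced data the two metrics can diverge noticeably, which is why SER papers typically report both.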
Details
- Language :
- English
- ISSN :
- 1381-2416
- Volume :
- 26
- Issue :
- 4
- Database :
- Complementary Index
- Journal :
- International Journal of Speech Technology
- Publication Type :
- Academic Journal
- Accession number :
- 174762206
- Full Text :
- https://doi.org/10.1007/s10772-023-10080-7