Back to Search
Start Over
Speech Emotion Recognition Based on Temporal-Spatial Learnable Graph Convolutional Neural Network.
- Source :
- Electronics (2079-9292); Jun2024, Vol. 13 Issue 11, p2010, 16p
- Publication Year :
- 2024
-
Abstract
- The Graph Convolutional Neural Networks (GCN) method has shown excellent performance in the field of deep learning, and using graphs to represent speech data is a computationally efficient and scalable approach. In order to enhance the adequacy of graph neural networks in extracting speech emotional features, this paper proposes a Temporal-Spatial Learnable Graph Convolutional Neural Network (TLGCNN) for speech emotion recognition. TLGCNN firstly utilizes the Open-SMILE toolkit to extract frame-level speech emotion features. Then, a bidirectional long short-term memory (Bi LSTM) network is used to process the long-term dependencies of speech features which can further extract deep frame-level emotion features. The extracted frame-level emotion features are then input into subsequent network through two pathways. Finally, one pathway constructs the extracted frame-level deep emotion feature vectors into a graph structure applying an adaptive adjacency matrix to catch latent spatial connections, while the other pathway concatenates emotion feature vectors with graph-level embedding obtained from learnable graph convolutional neural network for prediction and classification. Through these two pathways, TLGCNN can simultaneously obtain temporal speech emotional information through Bi-LSTM and spatial speech emotional information through Learnable Graph Convolutional Neural (LGCN) network. Experimental results demonstrate that this method achieves weighted accuracy of 66.82% and 58.35% on the IEMOCAP and MSP-IMPROV databases, respectively. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 20799292
- Volume :
- 13
- Issue :
- 11
- Database :
- Complementary Index
- Journal :
- Electronics (2079-9292)
- Publication Type :
- Academic Journal
- Accession number :
- 177857116
- Full Text :
- https://doi.org/10.3390/electronics13112010