Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition
- Source :
- ICMR
- Publication Year :
- 2016
- Publisher :
- ACM, 2016.
Abstract
- Emotion recognition is a challenging task because of the emotional gap between subjective emotions and low-level audio-visual features. Inspired by the recent success of deep learning in bridging the semantic gap, this paper proposes to bridge the emotional gap with a multimodal Deep Convolutional Neural Network (DCNN) that fuses audio and visual cues in a deep model. The multimodal DCNN is trained in two stages. First, two DCNN models pre-trained on large-scale image data are fine-tuned to perform audio and visual emotion recognition, respectively, on the corresponding labeled speech and face data. Second, the outputs of these two DCNNs are integrated by a fusion network built from a number of fully-connected layers, which is trained to obtain a joint audio-visual feature representation for emotion recognition. Experimental results on the RML audio-visual database demonstrate the promising performance of the proposed method. To the best of our knowledge, this is an early work fusing audio and visual cues in a DCNN for emotion recognition, and its success warrants further research in this direction.
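- The two-stage scheme described in the abstract can be sketched in code, purely as an illustration. The sketch below assumes a PyTorch implementation with ImageNet-pretrained ResNet-18 backbones standing in for the pre-trained DCNNs, spectrogram "images" as the audio input, and six emotion classes for the RML database; none of these implementation details are specified in this record.

```python
# Hypothetical sketch of the two-stage audio-visual DCNN fusion described in the
# abstract. Assumptions (not from the record): PyTorch, ImageNet-pretrained
# ResNet-18 branches, spectrogram images for audio, six RML emotion classes.
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 6  # assumption: six emotion categories in the RML database


def make_branch(num_classes: int) -> nn.Module:
    """Stage 1: fine-tune a DCNN pre-trained on large-scale image data."""
    net = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained backbone
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # new emotion head
    return net


# One branch for face images, one for audio spectrograms rendered as images.
visual_dcnn = make_branch(NUM_EMOTIONS)
audio_dcnn = make_branch(NUM_EMOTIONS)


class FusionNet(nn.Module):
    """Stage 2: fully-connected fusion of the two branch outputs."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * num_classes, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, audio_out: torch.Tensor, visual_out: torch.Tensor) -> torch.Tensor:
        # Concatenate the two branch outputs and learn a joint representation.
        return self.fc(torch.cat([audio_out, visual_out], dim=1))


fusion = FusionNet(NUM_EMOTIONS)

# Toy forward pass: a batch of 4 face crops and 4 spectrograms, both 3x224x224.
faces = torch.randn(4, 3, 224, 224)
spectrograms = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    joint_logits = fusion(audio_dcnn(spectrograms), visual_dcnn(faces))
print(joint_logits.shape)  # torch.Size([4, 6])
```

- In practice, stage 1 would train each branch separately on labeled speech and face data before the fusion layers are trained in stage 2; the exact layer sizes and training protocol are choices of the sketch, not details given in the abstract.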
- Subjects :
- Bridging (networking), Computer science, business.industry, Speech recognition, Deep learning, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Two stages, Convolutional neural network, ComputerApplications_MISCELLANEOUS, Audio visual, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Emotion recognition, Artificial intelligence, business, Sensory cue, Semantic gap
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
- Accession number :
- edsair.doi...........f57a9e606b3fdd9f9ea613bd11c60465
- Full Text :
- https://doi.org/10.1145/2911996.2912051