Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
- Source :
- Sensors, Vol 21, Iss 22, p 7665 (2021); Digibug. Repositorio Institucional de la Universidad de Granada; Sensors (Basel, Switzerland), Volume 21, Issue 22
Abstract
- Emotion recognition is attracting the attention of the research community because it can be applied in many areas, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, specifically embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. For the facial emotion recognizers, we propose a framework that consists of a Spatial Transformer Network pre-trained on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis showed that frame-based systems can present problems when they are used directly to solve a video-based task, despite domain adaptation. This opens a new line of research into ways to correct this mismatch and take advantage of the knowledge embedded in these pre-trained models. Finally, by combining the two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-fold cross-validation (5-CV) evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information for detecting users’ emotional state and that their combination improves system performance.
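The two evaluation ideas in the abstract, late fusion of the speech and facial modalities and a subject-wise 5-fold split, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each modality outputs a probability vector over the eight RAVDESS emotions, that fusion is a weighted average of those vectors (the abstract does not state which late-fusion rule was used), and that the 24 RAVDESS actors are dealt round-robin into folds. The function names `subject_wise_folds`, `late_fusion`, `predict`, the weight `w_speech`, and the example scores are all hypothetical.

```python
from typing import List, Sequence

# The eight RAVDESS emotion labels, in the order of the dataset's emotion codes 01-08.
EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]


def subject_wise_folds(actor_ids: Sequence[int], n_folds: int = 5) -> List[List[int]]:
    """Deal actors (not clips) round-robin into folds, so that no actor's
    recordings appear in both the training and the test split of a fold.
    Round-robin assignment is an assumption made for illustration."""
    actors = sorted(set(actor_ids))
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for i, actor in enumerate(actors):
        folds[i % n_folds].append(actor)
    return folds


def late_fusion(p_speech: Sequence[float], p_face: Sequence[float],
                w_speech: float = 0.5) -> List[float]:
    """Combine per-class probabilities from the two modalities by a weighted
    average (an assumed fusion rule; other late-fusion rules exist)."""
    if len(p_speech) != len(p_face):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    w_face = 1.0 - w_speech
    return [w_speech * s + w_face * f for s, f in zip(p_speech, p_face)]


def predict(scores: Sequence[float]) -> str:
    """Return the emotion label with the highest score."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return EMOTIONS[best]


if __name__ == "__main__":
    folds = subject_wise_folds(range(1, 25))  # RAVDESS has 24 actors
    print([len(f) for f in folds])  # [5, 5, 5, 5, 4]

    # Toy posteriors for one clip (hypothetical numbers, not results from the paper).
    speech = [0.05, 0.05, 0.40, 0.10, 0.20, 0.10, 0.05, 0.05]
    face = [0.05, 0.05, 0.10, 0.05, 0.55, 0.10, 0.05, 0.05]
    print(predict(speech), predict(late_fusion(speech, face)))  # happy angry
```

In this toy case the speech scores alone favor "happy", while the fused scores favor "angry" because the facial model is more confident. A fixed weight of 0.5 is only a placeholder; one would typically tune the fusion weight on actors held out from the training folds so that the test actors stay unseen.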
- Subjects :
- Speech emotion recognition
Computer science
spatial transformers
Emotions
System safety
02 engineering and technology
TP1-1185
transfer learning
Machine learning
Computational paralinguistics
Biochemistry
Article
Analytical Chemistry
Task (project management)
020204 information systems
human–computer interaction
0202 electrical engineering, electronic engineering, information engineering
Learning
Speech
Facial emotion recognition
Electrical and Electronic Engineering
Instrumentation
Transformer (machine learning model)
Modalities
Modality (human–computer interaction)
Audio–visual emotion recognition
Chemical technology
Frame (networking)
Atomic and Molecular Physics, and Optics
Embedding
020201 artificial intelligence & image processing
Neural Networks, Computer
Artificial intelligence
Transfer of learning
Details
- Language :
- English
- ISSN :
- 1424-8220
- Volume :
- 21
- Issue :
- 22
- Database :
- OpenAIRE
- Journal :
- Sensors
- Accession number :
- edsair.doi.dedup.....520af49bbe38eb33a0635a3f75be544b
- Full Text :
- https://doi.org/10.3390/s21227665