Speech emotion recognition via ensembling neural networks

Authors :: Luo, Danqing
Zou, Yuexian
Huang, Dongyan
Publication Year :: 2017
Abstract: Deep Neural Network (DNN) based speech emotion recognition (SER) methods have demonstrated competitive performance compared to traditional SER approaches. However, the literature shows that the confusion matrices of different SER methods vary considerably, which indicates that different DNN architectures differ in their ability to model the emotion cues in speech. It also means that a single classifier rarely performs well on all speech emotion categories, possibly because of data imbalance and the limitations of the classifier. Motivated by the improved results reported for ensemble learning, this paper investigates an ensemble method for SER that aggregates the results of several base classifiers. Considering the outstanding performance of Recurrent Neural Networks (RNNs) in different speech tasks and of Residual Networks (ResNets) in image-related classification, we chose an RNN and a ResNet as base classifiers. Experiments show that our proposed ensemble SER system outperforms the state-of-the-art single-classifier SER system.
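
The abstract does not say how the base classifiers' results are aggregated. As a minimal sketch of one common option, the code below averages per-class probabilities (soft voting) from two classifiers. The function name `ensemble_average`, the optional weights, and the example probabilities are all hypothetical illustrations, not the authors' method or data.

```python
import numpy as np

def ensemble_average(prob_list, weights=None):
    """Soft-voting ensemble (an assumed aggregation rule, not from the paper).

    prob_list: list of arrays, each of shape (n_samples, n_classes),
               holding one base classifier's class probabilities.
    weights:   optional per-classifier weights; uniform if omitted.
    Returns (predicted_labels, averaged_probabilities).
    """
    probs = np.stack(prob_list, axis=0)          # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize so rows still sum to 1
    avg = np.tensordot(weights, probs, axes=1)   # weighted mean over models
    return avg.argmax(axis=1), avg

# Made-up posteriors for two utterances over three emotion classes.
rnn_probs = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3]])
resnet_probs = np.array([[0.2, 0.7, 0.1],
                         [0.1, 0.3, 0.6]])

labels, avg = ensemble_average([rnn_probs, resnet_probs])
print(labels)
print(avg)
```

In this toy case the two classifiers disagree on both utterances, and the averaged probabilities decide the final label. Majority voting or a learned meta-classifier would be alternative aggregation rules.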
