Deep learning for emotional speech recognition.

Authors :: Alhamada, M. I.
Khalifa, O. O.
Abdalla, A. H.
Shariffudin, Shafinaz Sobihana
Herman, Sukreen Hana
Hashim, Hashimah
Source :: AIP Conference Proceedings; 2020, Vol. 2306 Issue 1, p1-10, 10p
Publication Year :: 2020
Abstract: Speech emotion recognition is a developing field in machine learning. Its main purpose is to produce systems that can communicate and interact with humans effortlessly. Current speech emotion recognition systems are still far from reliable. The task is challenging because of the gap between acoustic features and human emotions, and performance relies strongly on the discriminative acoustic features extracted for a given recognition task. The information carried by speech signals is divided into two main categories, linguistic and paralinguistic; emotions belong to the latter. The aim of this work is to develop a system that can understand paralinguistic information for better human-machine interaction. Different extracted features, such as MFCC, and classification methods, such as HMM, GMM, LSTM, and ANN, were used. In this paper, an improved CNN architecture for speech emotion recognition was implemented. The main finding is that the proposed CNN model achieved a 93.96% accuracy rate in detecting emotions. [ABSTRACT FROM AUTHOR]