Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition
- Source :
- Computation, Vol 5, Iss 2, p 26 (2017)
- Publication Year :
- 2017
- Publisher :
- MDPI AG, 2017.
Abstract
- Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or to understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human’s emotions may be recognized through several modalities, such as facial expressions, speech, and physiological parameters (e.g., electroencephalograms, electrocardiograms). However, measuring these modalities may be difficult, obtrusive, or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input, thus making hand-crafted feature engineering optional in many tasks. In this paper, no features are required other than the spectrogram representations; hand-crafted features were extracted only to validate our method. Moreover, the approach does not require any linguistic model and is not specific to any particular language. We evaluate the proposed approach on cross-language datasets and demonstrate that it provides superior results compared with traditional approaches that use hand-crafted features.
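The abstract says spectrogram representations are the only input the CNN needs. As a rough illustration of what such a representation is, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform in NumPy. The function name `spectrogram`, the 400-sample frame, the 160-sample hop, the Hann window, and the synthetic sine wave standing in for speech are all assumptions made for this example; the record does not state the settings the authors used.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    frame_len and hop are illustrative values (25 ms / 10 ms at 16 kHz),
    not parameters taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude of the one-sided FFT of every frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); transpose to (frequency bins, time frames).
    return np.log(magnitude + 1e-10).T

# Synthetic stand-in for one second of 16 kHz speech (assumption).
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(audio)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image, which is what allows an image-style CNN to act as the feature extractor.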
- Subjects :
- Feature engineering
General Computer Science
Speech recognition
Context (language use)
02 engineering and technology
Convolutional neural network
lcsh:QA75.5-76.95
Theoretical Computer Science
030507 speech-language pathology & audiology
03 medical and health sciences
emotion recognition
convolutional neural networks
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
spectrograms
Facial expression
Modalities
Modality (human–computer interaction)
Applied Mathematics
020206 networking & telecommunications
Modeling and Simulation
Spectrogram
lcsh:Electronic computers. Computer science
0305 other medical science
Psychology
Details
- Language :
- English
- ISSN :
- 20793197
- Volume :
- 5
- Issue :
- 2
- Database :
- OpenAIRE
- Journal :
- Computation
- Accession number :
- edsair.doi.dedup.....89eecad296021bbe486da37636115b3b