1. Analyzing Continuous-Time and Sentence-Level Annotations for Speech Emotion Recognition.
- Author
- Martinez-Lucas, Luz; Lin, Wei-Cheng; Busso, Carlos
- Abstract
The emotional content of several databases is annotated with continuous-time (CT) annotations, providing traces with frame-by-frame scores describing the instantaneous value of an emotional attribute. However, having a single score describing the global emotion of a short segment is more convenient for several emotion recognition formulations. A common approach is to derive sentence-level (SL) labels from CT annotations by aggregating the values of the emotional traces across time and annotators. How similar are these aggregated SL labels to labels originally collected at the sentence level? The release of the MSP-Podcast (SL annotations) and MSP-Conversation (CT annotations) corpora provides the resources to explore the validity of deriving SL labels by aggregating CT annotations. There are 2,884 speech segments that belong to both corpora. Using this set, this study (1) compares both types of annotations using statistical metrics, (2) evaluates their inter-evaluator agreement, and (3) explores the effect of these SL labels on speech emotion recognition (SER) tasks. The analysis reveals benefits of using SL labels derived from CT annotations in the estimation of valence. This analysis also provides insights into how the two types of labels differ and how that difference could affect a model.
- Published
- 2024