Using a PCA-based dataset similarity measure to improve cross-corpus emotion recognition

Authors :
Andreas Wendemuth
Ronald Böck
Ingo Siegert
Source :
Computer Speech & Language. 51:1-23
Publication Year :
2018
Publisher :
Elsevier BV, 2018.

Abstract

In emotion recognition from speech, large amounts of training material are needed to develop classification engines. As most current corpora do not supply enough material, combining different datasets is advisable. Unfortunately, recording conditions differ between corpora, and various emotion elicitation and annotation methods are used. Therefore, corpora usually cannot be combined without further effort. This manuscript aims to answer the question of which corpora are similar enough to be used jointly as training material. A corpus similarity measure based on PCA-ranked features is presented, and similar datasets are identified. To evaluate our method, we used nine well-known benchmark corpora and automatically identified a subset of the six most similar datasets. To test whether this subset influences classification performance, we conducted several cross-corpus emotion recognition experiments comparing it with other combinations. Our most similar subset outperforms all other combinations of corpora, the combination of all nine datasets, and feature normalization techniques. Confounding side effects on the recognition rate were also excluded. Finally, the predictive power of our measure is shown: an increasing similarity score, which expresses decreasing similarity, results in decreasing recognition rates. Thus, our similarity measure answers the question of which corpora should be included in joint training.
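
The abstract does not spell out how the PCA-ranked features are turned into a similarity score, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes each corpus is a matrix of acoustic features (rows are utterances, columns are the same features in every corpus), ranks features by their weighted squared loadings on the leading principal components, and compares two corpora by the mean absolute difference of their feature ranks. The function names, the importance weighting, the `n_components` parameter, and the rank distance are all illustrative assumptions. As in the abstract, a larger score means lower similarity.

```python
import numpy as np


def pca_feature_ranking(X, n_components=2):
    """Rank the features of one corpus by PCA-based importance.

    Illustrative assumption: importance is the sum of squared loadings on
    the first `n_components` principal components, weighted by each
    component's share of explained variance. Rank 0 is the top feature.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature; avoid division by zero for constant columns.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # PCA via SVD: rows of vt are the principal axes (loadings).
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = min(n_components, vt.shape[0])
    importance = (vt[:k] ** 2 * explained[:k, None]).sum(axis=0)
    # Convert importance values into ranks (0 = most important).
    order = np.argsort(-importance)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(order.size)
    return ranks


def rank_distance(ranks_a, ranks_b):
    """Mean absolute rank difference; larger values mean less similar corpora."""
    return float(np.mean(np.abs(np.asarray(ranks_a) - np.asarray(ranks_b))))
```

Under these assumptions, one would compute `pca_feature_ranking` once per corpus, evaluate `rank_distance` for every pair, and keep the group of corpora with the smallest pairwise scores as joint training material.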

Details

ISSN :
0885-2308
Volume :
51
Database :
OpenAIRE
Journal :
Computer Speech & Language
Accession number :
edsair.doi...........4a5d87dca714a4c0f1799fc9a193a220
Full Text :
https://doi.org/10.1016/j.csl.2018.02.002