Using a PCA-based dataset similarity measure to improve cross-corpus emotion recognition
- Source :
- Computer Speech & Language. 51:1-23
- Publication Year :
- 2018
- Publisher :
- Elsevier BV, 2018.
Abstract
- In emotion recognition from speech, huge amounts of training material are needed for the development of classification engines. As most current corpora do not supply enough material, a combination of different datasets is advisable. Unfortunately, data are recorded differently across corpora, and various emotion elicitation and emotion annotation methods are used. Therefore, a combination of corpora is usually not possible without further effort. The manuscript’s aim is to answer the question of which corpora are similar enough to be used jointly as training material. A corpus similarity measure based on PCA-ranked features is presented and similar datasets are identified. To evaluate our method, we used nine well-known benchmark corpora and automatically identified a subset of the six most similar datasets. To test whether the six identified datasets influence classification performance, we conducted several cross-corpus emotion recognition experiments comparing them with other combinations. Our most similar subset outperforms all other combinations of corpora, the combination of all nine datasets, and feature normalization techniques. Side effects that could influence the recognition rate were also excluded. Finally, the predictive power of our measure is shown: an increasing similarity score, which expresses decreasing similarity, results in decreasing recognition rates. Thus, our similarity measure answers the question of which corpora should be included in joint training.
- Subjects :
- Normalization (statistics)
Measure (data warehouse)
Computer science
02 engineering and technology
Similarity measure
Theoretical Computer Science
Human-Computer Interaction
030507 speech-language pathology & audiology
03 medical and health sciences
Annotation
Similarity (network science)
0202 electrical engineering, electronic engineering, information engineering
Benchmark (computing)
Feature (machine learning)
020201 artificial intelligence & image processing
Artificial intelligence
Emotion recognition
0305 other medical science
Software
Natural language processing
Details
- ISSN :
- 08852308
- Volume :
- 51
- Database :
- OpenAIRE
- Journal :
- Computer Speech & Language
- Accession number :
- edsair.doi...........4a5d87dca714a4c0f1799fc9a193a220
- Full Text :
- https://doi.org/10.1016/j.csl.2018.02.002