Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

Authors :: Ephrem Afele Retta
Richard Sutcliffe
Jabar Mahmood
Michael Abebe Berwo
Eiad Almekhlafi
Sajjad Ahmad Khan
Shehzad Ashraf Chaudhry
Mustafa Mhamed
Jun Feng
Source :: Applied Sciences, Vol 13, Iss 23, p 12587 (2023)
Publication Year :: 2023
Publisher :: MDPI AG, 2023.
Abstract: In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language do not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German, and Urdu. For Amharic, we use our own publicly available Amharic Speech Emotion Dataset (ASED). For English, German, and Urdu, we use the existing RAVDESS, EMO-DB, and URDU datasets. Following previous research, we map the labels of all datasets to just two classes, positive and negative, so that performance can be compared directly across languages and languages can be combined for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers: AlexNet, VGGE (a proposed variant of VGG), and ResNet50. The results, averaged over the three models, were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. By the same measure, German SER appears more difficult and Urdu SER easier. In Experiment 2, we trained on one language and tested on another, in both directions for each of the following pairs: Amharic↔German, Amharic↔English, and Amharic↔Urdu. The results with Amharic as the target suggested that using English or German as the source gives the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percentage points greater than the best accuracy in Experiment 2, suggesting that training on two or three non-Amharic languages gives a better result than training on just one. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for building an SER classifier when resources for a language are scarce.
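
The abstract describes two mechanical steps that a short sketch may make concrete: collapsing each corpus's emotion labels into two classes, and enumerating the train/test language pairs of Experiment 2. The Python below is only an illustration. The label sets `POSITIVE` and `NEGATIVE`, and the helper names `to_binary` and `cross_lingual_pairs`, are assumptions made here; the abstract does not state which emotions were assigned to which class, so the assignments should not be read as the authors' mapping.

```python
# Hypothetical label sets: the per-corpus mapping is NOT given in the
# abstract, so these assignments are an assumption for illustration only.
POSITIVE = {"happy", "neutral", "calm", "surprised"}
NEGATIVE = {"angry", "sad", "fearful", "disgust", "bored"}


def to_binary(label: str) -> str:
    """Collapse a corpus-specific emotion label into 'positive' or 'negative'."""
    key = label.strip().lower()
    if key in POSITIVE:
        return "positive"
    if key in NEGATIVE:
        return "negative"
    # Fail loudly rather than silently guessing a class for unknown labels.
    raise ValueError(f"unmapped emotion label: {label!r}")


def cross_lingual_pairs(target="Amharic", others=("German", "English", "Urdu")):
    """List (train_language, test_language) pairs, both directions per pair."""
    pairs = []
    for other in others:
        pairs.append((other, target))  # train on the other language, test on target
        pairs.append((target, other))  # train on target, test on the other language
    return pairs


if __name__ == "__main__":
    print(to_binary("Happy"))
    for train_lang, test_lang in cross_lingual_pairs():
        print(f"train on {train_lang}, test on {test_lang}")
```

With a shared two-class label space, a classifier trained on one corpus can be scored on another without any further label alignment, which is what makes the cross-lingual and multilingual comparisons in the abstract possible.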

Subjects :: speech emotion recognition
multilingual
cross-lingual
feature extraction
Technology
Engineering (General). Civil engineering (General)
TA1-2040
Biology (General)
QH301-705.5
Physics
QC1-999
Chemistry
QD1-999

Details

Language :: English
ISSN :: 20763417
Volume :: 13
Issue :: 23
Database :: Directory of Open Access Journals
Journal :: Applied Sciences
Publication Type :: Academic Journal
Accession number :: edsdoj.95ae339e168a4471be069f63e45f3bff
Document Type :: article
Full Text :: https://doi.org/10.3390/app132312587
