Hierarchical Transformer Network for Utterance-Level Emotion Recognition
- Source :
- Applied Sciences, Vol 10, Iss 13, p 4447 (2020)
- Publication Year :
- 2020
- Publisher :
- MDPI AG, 2020.
Abstract
- While there have been significant advances in detecting emotions in text, many problems remain unsolved in utterance-level emotion recognition (ULER). In this paper, we address four challenges in ULER in dialog systems. (1) The same utterance can convey different emotions in different contexts. (2) Long-range contextual information is hard to capture effectively. (3) Unlike traditional text classification, most datasets for this task contain an insufficient amount of conversation or speech data. (4) Speaker information is necessary to better model the emotional interaction between speakers. To address problems (1) and (2), we propose a hierarchical transformer framework (apart from the description of other studies, the "transformer" in this paper usually refers to the encoder part of the transformer) with a lower-level transformer to model the word-level input and an upper-level transformer to capture the context of utterance-level embeddings. For problem (3), we use bidirectional encoder representations from transformers (BERT), a pretrained language model, as the lower-level transformer, which is equivalent to introducing external data into the model and alleviates the data shortage to some extent. For problem (4), we add speaker embeddings to the model for the first time, which enables our model to capture the interaction between speakers. Experiments on three dialog emotion datasets, Friends, EmotionPush, and EmoryNLP, demonstrate that our proposed hierarchical transformer network models obtain competitive results compared with the state-of-the-art methods in terms of the macro-averaged F1-score (macro-F1).
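The abstract reports results as macro-averaged F1. As a rough illustration of that metric only, the sketch below computes the unweighted mean of per-class F1 scores in plain Python. The function name `macro_f1` and the example labels are hypothetical, and the choice to average over every label seen in either list is an assumption; the paper's evaluation may restrict the average to a fixed subset of emotion classes.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list.

    Assumption: every observed label counts equally; the paper's own
    evaluation protocol may use a fixed subset of emotion classes instead.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical utterance-level labels, not taken from any dataset.
    gold = ["joy", "joy", "neutral", "anger"]
    predicted = ["joy", "neutral", "neutral", "anger"]
    print(round(macro_f1(gold, predicted), 4))
```

Because every class contributes equally regardless of how often it occurs, macro-F1 penalizes a model that ignores rare emotions, which is presumably why it suits imbalanced dialog emotion datasets.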
- Subjects :
- FOS: Computer and information sciences
Computer Science - Machine Learning
Sound (cs.SD)
text classification
Computer science
Speech recognition
lcsh:Technology
Computer Science - Sound
law.invention
Machine Learning (cs.LG)
lcsh:Chemistry
0504 sociology
law
Audio and Speech Processing (eess.AS)
emotion recognition
FOS: Electrical engineering, electronic engineering, information engineering
Contextual information
General Materials Science
Emotion recognition
Dialog box
Transformer
Instrumentation
lcsh:QH301-705.5
Network model
Fluid Flow and Transfer Processes
dialog
Computer Science - Computation and Language
lcsh:T
Process Chemistry and Technology
05 social sciences
General Engineering
050401 social sciences methods
050301 education
pretrained model
lcsh:QC1-999
Computer Science Applications
lcsh:Biology (General)
lcsh:QD1-999
lcsh:TA1-2040
transformer
Language model
lcsh:Engineering (General). Civil engineering (General)
0503 education
Encoder
Computation and Language (cs.CL)
Utterance
lcsh:Physics
Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- Language :
- English
- ISSN :
- 20763417
- Volume :
- 10
- Issue :
- 13
- Database :
- OpenAIRE
- Journal :
- Applied Sciences
- Accession number :
- edsair.doi.dedup.....139485a515aa0348abbcc3d74b695c0a