
Multi-Fusion Residual Memory Network for Multimodal Human Sentiment Comprehension

Authors :
Sijie Mai
Haifeng Hu
Songlong Xing
Jia Xu
Source :
IEEE Transactions on Affective Computing. 13:320-334
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Multimodal human sentiment comprehension refers to recognizing human affect from multiple modalities. This problem poses two key issues. First, it is difficult to model time-dependent interactions between modalities while focusing on the important time steps. Second, processing the long fused sequence of utterances is susceptible to the forgetting problem because of long-term temporal dependencies. In this paper, we introduce a hierarchical learning architecture to classify utterance-level sentiment. To address the first issue, we perform time-step level fusion to generate a fused feature for each time step, which explicitly models time-restricted interactions by combining information across modalities at the same time step. Furthermore, based on the assumption that acoustic features directly reflect emotional intensity, we pioneer emotion intensity attention to focus on the time steps where emotion changes or intense affect occurs. To handle the second issue, we propose the Residual Memory Network (RMN) to process the fused sequence. RMN uses techniques such as passing the previous state directly into the next time step, which helps retain information from many time steps earlier. We show that our method achieves state-of-the-art performance on multiple datasets. Results also suggest that RMN yields competitive performance on sequence modeling tasks.
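The abstract describes its three mechanisms only at a high level. The Python sketch below illustrates one plausible reading of each, under stated assumptions; it is not the authors' model. Everything in it is hypothetical: the function names, concatenation as the fusion operator, the L2 norm of acoustic features as the intensity score (which captures only intensity, not emotion change), the cell update h_t = h_{t-1} + tanh(W x_t + U h_{t-1}) as a stand-in for "passing the previous state into the next time step", and attention-weighted pooling of the states.

```python
import math
import random


def fuse_time_steps(text, audio, video):
    """Hypothetical time-step level fusion: concatenate the three modalities'
    feature vectors that belong to the same time step."""
    assert len(text) == len(audio) == len(video), "modalities must be aligned"
    return [t + a + v for t, a, v in zip(text, audio, video)]


def intensity_attention(audio):
    """Hypothetical emotion-intensity attention: score each time step by the
    L2 norm of its acoustic features, then apply a softmax over time."""
    scores = [math.sqrt(sum(x * x for x in frame)) for frame in audio]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def residual_recurrent(seq, hidden, seed=0):
    """Toy recurrence in which the previous state is added directly to the
    new state: h_t = h_{t-1} + tanh(W x_t + U h_{t-1}). Weights are random."""
    rng = random.Random(seed)
    d_in = len(seq[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(hidden)]
    U = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
    h = [0.0] * hidden
    states = []
    for x in seq:
        pre = [
            sum(W[i][j] * x[j] for j in range(d_in))
            + sum(U[i][k] * h[k] for k in range(hidden))
            for i in range(hidden)
        ]
        # Identity path: the old state is carried forward unchanged, plus an update.
        h = [h[i] + math.tanh(pre[i]) for i in range(hidden)]
        states.append(h)
    return states


def attention_pool(states, weights):
    """Weighted sum of the per-step states using the attention weights."""
    hidden = len(states[0])
    return [sum(w * h[i] for w, h in zip(weights, states)) for i in range(hidden)]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    audio = [[0.1], [2.0], [0.2]]
    video = [[0.3, 0.3], [0.0, 0.1], [0.2, 0.0]]
    fused = fuse_time_steps(text, audio, video)
    weights = intensity_attention(audio)
    states = residual_recurrent(fused, hidden=4)
    print(weights)
    print(attention_pool(states, weights))
```

In this toy setup the loudest acoustic frame (time step 2) receives the largest attention weight, and because each update is bounded by tanh, the identity path keeps most of the earlier state intact from step to step.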

Details

ISSN :
2371-9850
Volume :
13
Database :
OpenAIRE
Journal :
IEEE Transactions on Affective Computing
Accession number :
edsair.doi...........8cb2879a2d38d305faa335a490f9f39b