Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection.
- Source :
- Neural Processing Letters; Jun 2022, Vol. 54 Issue 3, p1943-1960, 18p
- Publication Year :
- 2022
Abstract
- Social media allows users to express opinions in multiple modalities such as text, pictures, and short videos. Multi-modal sentiment detection can more effectively predict the emotional tendencies expressed by users and has therefore received extensive attention in recent years. Current works treat the utterances of a video as independent modalities, ignoring the effective interaction among the different modalities of a video. To tackle these challenges, we propose a transformer-based interactive multi-modal attention network that investigates multi-modal paired attention between multiple modalities and utterances for video sentiment detection. Specifically, we first take a series of utterances as input and use three separate transformer encoders to capture the utterance-level features of each modality. Subsequently, we introduce multi-modal paired attention mechanisms to learn the cross-modality information between multiple modalities and utterances. Finally, we inject the cross-modality information into a multi-headed self-attention layer for the final emotion and sentiment classification. Our solution outperforms baseline models on three multi-modal datasets. [ABSTRACT FROM AUTHOR]
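The pipeline described in the abstract can be outlined in code. Below is a minimal PyTorch sketch, assuming 128-dimensional utterance-level features for three modalities (text, audio, video); the class names (`TIMANSketch`, `PairedAttention`), the choice of modality pairings, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PairedAttention(nn.Module):
    """Cross-modal attention: queries from one modality, keys/values from
    another. A hypothetical layer; the paper's exact pairing may differ."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_mod, context_mod):
        out, _ = self.attn(query_mod, context_mod, context_mod)
        return out


class TIMANSketch(nn.Module):
    """Rough outline of the described pipeline: three per-modality transformer
    encoders -> paired cross-modal attention -> self-attention -> classifier."""

    def __init__(self, dim=128, num_classes=3, heads=4, depth=2):
        super().__init__()

        def encoder():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True),
                num_layers=depth)

        # One separate encoder per modality, as the abstract describes.
        self.enc_t, self.enc_a, self.enc_v = encoder(), encoder(), encoder()
        # Paired attention between modality pairs (illustrative pairing).
        self.ta = PairedAttention(dim, heads)
        self.tv = PairedAttention(dim, heads)
        self.av = PairedAttention(dim, heads)
        # Multi-headed self-attention over the fused features, then a classifier.
        self.fuse = nn.MultiheadAttention(3 * dim, heads, batch_first=True)
        self.cls = nn.Linear(3 * dim, num_classes)

    def forward(self, text, audio, video):
        # Each input: (batch, num_utterances, dim) utterance-level features.
        t, a, v = self.enc_t(text), self.enc_a(audio), self.enc_v(video)
        # Cross-modality information from paired attention.
        ta, tv, av = self.ta(t, a), self.tv(t, v), self.av(a, v)
        fused = torch.cat([ta, tv, av], dim=-1)    # (batch, utt, 3*dim)
        fused, _ = self.fuse(fused, fused, fused)  # inject into self-attention
        return self.cls(fused)                     # per-utterance logits


model = TIMANSketch()
t = a = v = torch.randn(2, 10, 128)  # 2 videos, 10 utterances each
print(model(t, a, v).shape)          # torch.Size([2, 10, 3])
```

The concatenate-then-self-attend fusion step is one plausible reading of "inject the cross-modality information into the multi-headed self-attention layer"; the published model may fuse differently.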
- Subjects :
- SENTIMENT analysis
MACHINE learning
SOCIAL media
VIDEOS
Details
- Language :
- English
- ISSN :
- 1370-4621
- Volume :
- 54
- Issue :
- 3
- Database :
- Complementary Index
- Journal :
- Neural Processing Letters
- Publication Type :
- Academic Journal
- Accession number :
- 157134878
- Full Text :
- https://doi.org/10.1007/s11063-021-10713-5