Tri-CLT: Learning Tri-Modal Representations with Contrastive Learning and Transformer for Multimodal Sentiment Recognition.

Authors :
Zhiyong Yang
Zijian Li
Dongdong Zhu
Yu Zhou
Source :
Information Technology & Control; 2024, Vol. 53 Issue 1, p206-219, 14p
Publication Year :
2024

Abstract

Multimodal Sentiment Analysis (MSA) has become an important research area that aims to achieve more accurate sentiment analysis by integrating multiple perceptual modalities such as text, vision, and audio. However, most previous studies fail to align the modalities well and ignore differences in their semantic information, which leads to inefficient fusion between modalities and redundant information. To address these problems, this paper proposes Tri-CLT, a transformer-based network model. Specifically, it designs an Integrating Fusion Block that fuses modal features to enhance their semantic information and mitigates the quadratic complexity of paired sequences in the transformer. Meanwhile, a cross-modal attention mechanism is used for complementary learning between modalities to improve model performance. In addition, contrastive learning is introduced to improve the model's representation learning ability. Finally, experiments on both the aligned and unaligned versions of CMU-MOSEI show that the proposed method outperforms existing methods. [ABSTRACT FROM AUTHOR]
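The abstract says Tri-CLT uses a cross-modal attention mechanism for complementary learning between modalities, but the record gives no architectural details. As a rough illustration of what cross-modal attention generally means, the NumPy sketch below lets one modality (here, text) query another (here, audio). It is a generic single-head formulation, not the authors' implementation; the function names, feature sizes, and random projection matrices are hypothetical. Because the score matrix has one row per target step and one column per source step, the two sequences need not share a length (relevant to unaligned data), and the cost grows with the product of the two lengths.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(target, source, w_q, w_k, w_v):
    """Generic single-head cross-modal attention (illustrative sketch only).

    target: (T_t, d_t) features of the modality being enriched (e.g. text).
    source: (T_s, d_s) features of the modality attended to (e.g. audio).
    w_q: (d_t, d_k), w_k: (d_s, d_k), w_v: (d_s, d_v) are hypothetical
    projection matrices; in a trained model they would be learned.
    Returns the attended output (T_t, d_v) and the weights (T_t, T_s).
    """
    q = target @ w_q  # queries come from the target modality
    k = source @ w_k  # keys come from the source modality
    v = source @ w_v  # values come from the source modality
    # Scaled dot-product scores: one row per target step, one column per source step.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Usage with random, hypothetical features of different (unaligned) lengths.
rng = np.random.default_rng(0)
text = rng.standard_normal((20, 32))   # 20 text tokens, 32-dim
audio = rng.standard_normal((50, 16))  # 50 audio frames, 16-dim
w_q = rng.standard_normal((32, 8))
w_k = rng.standard_normal((16, 8))
w_v = rng.standard_normal((16, 8))
out, attn = cross_modal_attention(text, audio, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (20, 8) (20, 50)
```

In a full model such a block would typically be multi-headed and combined with residual connections and layer normalization; those parts are omitted here to keep the core computation visible.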

Details

Language :
English
ISSN :
1392124X
Volume :
53
Issue :
1
Database :
Complementary Index
Journal :
Information Technology & Control
Publication Type :
Academic Journal
Accession number :
176240725
Full Text :
https://doi.org/10.5755/j01.itc.53.1.35060