Tri-CLT: Learning Tri-Modal Representations with Contrastive Learning and Transformer for Multimodal Sentiment Recognition.

Authors :: Zhiyong Yang
Zijian Li
Dongdong Zhu
Yu Zhou
Source :: Information Technology & Control; 2024, Vol. 53 Issue 1, p206-219, 14p
Publication Year :: 2024
Abstract :: Multimodal Sentiment Analysis (MSA) has become an essential research area that aims for more accurate sentiment analysis by integrating multiple perceptual modalities such as text, vision, and audio. However, most previous studies fail to align the modalities well and ignore differences in their semantic information, which leads to inefficient fusion between modalities and generates redundant information. To address these problems, this paper proposes Tri-CLT, a transformer-based network model. Specifically, it designs an Integrating Fusion Block that fuses modal features to enhance their semantic information and to mitigate the quadratic complexity of paired sequences in the transformer. A cross-modal attention mechanism is used for complementary learning between modalities to improve model performance. In addition, contrastive learning is introduced to improve the model's representation learning ability. Finally, experiments on CMU-MOSEI aligned and unaligned data show that the proposed method outperforms existing methods. [ABSTRACT FROM AUTHOR]
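
Illustrative note: the abstract names two mechanisms, cross-modal attention and contrastive learning, without giving their formulation. The sketch below shows one generic way each is commonly written, and it is not taken from the paper. It assumes plain scaled dot-product attention with no learned projections, and it assumes an InfoNCE-style loss as the contrastive objective (the abstract does not say which objective Tri-CLT uses). All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(target, source):
    """Let each step of one modality attend over another modality.

    target: list of T_t feature vectors (e.g. text), each of size d
    source: list of T_s feature vectors (e.g. audio), each of size d
    Returns T_t vectors, each a softmax-weighted average of source rows.
    Assumption: queries come from target, keys and values from source,
    with no learned projection matrices.
    """
    d = len(target[0])
    scale = math.sqrt(d)
    attended = []
    for query in target:
        weights = softmax([dot(query, key) / scale for key in source])
        attended.append(
            [sum(w * row[j] for w, row in zip(weights, source)) for j in range(d)]
        )
    return attended


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    anchors[i] and positives[i] are treated as a matching pair; every
    other positives[j] in the batch serves as a negative for anchors[i].
    Assumption: cosine similarity with a fixed temperature.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        probs = softmax([dot(anchor, cand) / temperature for cand in p])
        total += -math.log(probs[i])
    return total / len(a)
```

In this sketch the attended sequence keeps the length of the target modality regardless of the source length, which is why this style of attention is often applied to unaligned sequences. The loss is small when each anchor is most similar to its own pair and large otherwise.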

Subjects :: SENTIMENT analysis
LEARNING ability
MULTIMODAL user interfaces
USER-generated content
PROBLEM solving
