CMHICL: Multimodal Sarcasm Detection Based on a Cross-Modal Hierarchical Interaction Network and Contrastive Learning.
- Source :
- Application Research of Computers / Jisuanji Yingyong Yanjiu. Sep 2024, Vol. 41, Issue 9, p2620-2627. 8p.
- Publication Year :
- 2024
Abstract
- The key to multimodal sarcasm detection is to effectively align and fuse the features of different modalities. However, existing multimodal fusion methods ignore the relationships between the compositional structures of the modalities, and the common features associated with sarcastic emotion in multimodal data are overlooked during sarcasm recognition. To address these problems, this paper proposes a model based on a cross-modal hierarchical interaction network and contrastive learning (CMHICL). First, the cross-modal hierarchical interaction network employs a minimal-unit alignment module based on the cross-attention mechanism and a compositional structure fusion module based on the graph attention network to identify inconsistencies between text and image at different levels, and judges samples with low consistency to be sarcastic. Second, two contrastive learning tasks, based on data augmentation and category enhancement, help the model learn common features related to sarcasm and reduce spurious intra-modal correlations. The experimental results show that the CMHICL model improves accuracy by 0.81% and the F₁ value by 1.6% over the baseline models, verifying the key role of the proposed hierarchical interaction network and contrastive learning method in multimodal sarcasm detection. [ABSTRACT FROM AUTHOR]
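- For orientation only, the sketch below (not the authors' code) illustrates in PyTorch the two ingredients the abstract names: a cross-attention alignment step between text tokens and image regions, and an InfoNCE-style contrastive loss over two augmented views. Every module, function, and parameter name here is a hypothetical illustration, not the published CMHICL implementation.

```python
# Minimal sketch of cross-attention alignment plus a contrastive loss,
# of the kind described in the abstract. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAlignment(nn.Module):
    """Aligns token-level text features with image-region features via cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # Text tokens attend over image regions, yielding image-aware text representations.
        aligned, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        return aligned


def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: two augmented views of the same sample form the positive pair."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(z1.size(0))      # diagonal entries are the positives
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    batch, text_len, regions, dim = 8, 20, 49, 256
    text = torch.randn(batch, text_len, dim)    # e.g. token features from a text encoder
    image = torch.randn(batch, regions, dim)    # e.g. patch features from an image encoder
    aligned = CrossModalAlignment(dim)(text, image)
    # Pool to one vector per sample and contrast two lightly perturbed "views".
    view1 = aligned.mean(dim=1)
    view2 = view1 + 0.01 * torch.randn(batch, dim)
    print(aligned.shape, contrastive_loss(view1, view2).item())
```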
- Subjects :
- *MULTISENSOR data fusion
- *SARCASM
- *EMOTIONS
Details
- Language :
- Chinese
- ISSN :
- 1001-3695
- Volume :
- 41
- Issue :
- 9
- Database :
- Academic Search Index
- Journal :
- Application Research of Computers / Jisuanji Yingyong Yanjiu
- Publication Type :
- Academic Journal
- Accession number :
- 179582354
- Full Text :
- https://doi.org/10.19734/j.issn.1001-3695.2023.12.0626