A Multi-Modal ELMo Model for Image Sentiment Recognition of Consumer Data

Authors :
Rong, Lu
Ding, Yijie
Wang, Mengyao
El Saddik, Abdulmotaleb
Hossain, M. Shamim
Source :
IEEE Transactions on Consumer Electronics; February 2024, Vol. 70, Issue 1, pp. 3697-3708 (12 pages)
Publication Year :
2024

Abstract

Recent advances in consumer electronics and imaging technology have generated abundant multi-modal data for consumer-centric artificial intelligence (AI) applications, and the effective analysis of such heterogeneous, consumer-generated content is a prominent research topic with great potential to inform consumption decisions. Two key challenges in this task are multi-modal representation and multi-modal fusion. To address them, we propose a multi-modal embeddings-from-language-model (MELMo) enhanced decision-making model. The main idea is to extend ELMo to the multi-modal setting by designing a deep contextualized visual embedding from a language model (VELMo) and by modeling multi-modal fusion at the decision level with a cross-modal attention mechanism. In addition, we design a novel multi-task decoder to learn knowledge shared across related tasks. We evaluate our approach on two benchmark datasets, CMU-MOSI and CMU-MOSEI, and show that MELMo outperforms state-of-the-art approaches: its F1 scores reach 86.1% on CMU-MOSI and 85.2% on CMU-MOSEI, improvements of approximately 1.0% and 1.3% over the previous state of the art, making it an effective technique for multi-modal consumer analytics in electronics and beyond.
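The abstract describes decision-level fusion via cross-modal attention plus a multi-task decoder. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: all dimensions, module names, and the two-task setup (sentiment classification plus intensity regression) are illustrative assumptions.

```python
# Hypothetical sketch: decision-level cross-modal attention fusion with a
# multi-task decoder. Not the paper's code; shapes and heads are assumed.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 3):
        super().__init__()
        # Text queries attend over visual keys/values (cross-modal attention).
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Multi-task decoder: a shared trunk with one head per related task.
        self.shared = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.cls_head = nn.Linear(dim, num_classes)  # sentiment classes
        self.reg_head = nn.Linear(dim, 1)            # sentiment intensity

    def forward(self, text_emb: torch.Tensor, vis_emb: torch.Tensor):
        # text_emb: (batch, seq_t, dim) contextual language embeddings (ELMo-like)
        # vis_emb:  (batch, seq_v, dim) contextual visual embeddings (VELMo-like)
        attended, _ = self.attn(query=text_emb, key=vis_emb, value=vis_emb)
        fused = self.norm(text_emb + attended)       # residual cross-modal fusion
        # Pool both streams and concatenate for the decision-level combination.
        pooled = torch.cat([fused.mean(dim=1), vis_emb.mean(dim=1)], dim=-1)
        h = self.shared(pooled)
        return self.cls_head(h), self.reg_head(h)


if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    text = torch.randn(2, 20, 256)    # toy text sequence embeddings
    vision = torch.randn(2, 8, 256)   # toy visual sequence embeddings
    logits, intensity = model(text, vision)
    print(logits.shape, intensity.shape)  # torch.Size([2, 3]) torch.Size([2, 1])
```

In this sketch the two heads share a trunk, so gradients from both tasks update the same representation, which is one common way to realize the "shared knowledge from related tasks" that the abstract attributes to the multi-task decoder.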

Details

Language :
English
ISSN :
0098-3063
Volume :
70
Issue :
1
Database :
Supplemental Index
Journal :
IEEE Transactions on Consumer Electronics
Publication Type :
Periodical
Accession number :
ejs66238365
Full Text :
https://doi.org/10.1109/TCE.2024.3357543