A Multi-Modal ELMo Model for Image Sentiment Recognition of Consumer Data

Authors :
Rong, Lu
Ding, Yijie
Wang, Mengyao
El Saddik, Abdulmotaleb
Hossain, M. Shamim
Source :
IEEE Transactions on Consumer Electronics; February 2024, Vol. 70, Issue 1, pp. 3697-3708 (12 pages)
Publication Year :
2024

Abstract

Recent advances in consumer electronics and imaging technology have generated abundant multi-modal data for consumer-centric artificial intelligence (AI) applications, and the effective analysis of such heterogeneous, consumer-generated content is a prominent research topic with great potential to inform consumption decisions. Two key challenges in this task are multi-modal representation and multi-modal fusion. To address them, we propose a multi-modal embeddings-from-language-model (MELMo) enhanced decision-making model. The main idea is to extend ELMo to the multi-modal setting by designing a deep contextualized visual embedding from a language model (VELMo) and by modeling multi-modal fusion at the decision level with a cross-modal attention mechanism. In addition, we design a novel multi-task decoder to learn knowledge shared across related tasks. We evaluate our approach on two benchmark datasets, CMU-MOSI and CMU-MOSEI, and show that MELMo outperforms state-of-the-art approaches: its F1 scores reach 86.1% on CMU-MOSI and 85.2% on CMU-MOSEI, improvements of approximately 1.0% and 1.3% over the previous state of the art, making it an effective technique for multi-modal consumer analytics in electronics and beyond.
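The abstract describes decision-level fusion via cross-modal attention plus a multi-task decoder. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: all dimensions, module names, and the two-task setup (sentiment classification plus intensity regression) are illustrative assumptions.

```python
# Hypothetical sketch: decision-level cross-modal attention fusion with a
# multi-task decoder. Not the paper's code; shapes and heads are assumed.
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 3):
        super().__init__()
        # Text queries attend over visual keys/values (cross-modal attention).
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Multi-task decoder: a shared trunk with one head per related task.
        self.shared = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.cls_head = nn.Linear(dim, num_classes)  # sentiment classes
        self.reg_head = nn.Linear(dim, 1)            # sentiment intensity

    def forward(self, text_emb: torch.Tensor, vis_emb: torch.Tensor):
        # text_emb: (batch, seq_t, dim) contextual language embeddings (ELMo-like)
        # vis_emb:  (batch, seq_v, dim) contextual visual embeddings (VELMo-like)
        attended, _ = self.attn(query=text_emb, key=vis_emb, value=vis_emb)
        fused = self.norm(text_emb + attended)       # residual cross-modal fusion
        # Pool both streams and concatenate for the decision-level combination.
        pooled = torch.cat([fused.mean(dim=1), vis_emb.mean(dim=1)], dim=-1)
        h = self.shared(pooled)
        return self.cls_head(h), self.reg_head(h)


if __name__ == "__main__":
    model = CrossModalAttentionFusion()
    text = torch.randn(2, 20, 256)    # toy text sequence embeddings
    vision = torch.randn(2, 8, 256)   # toy visual sequence embeddings
    logits, intensity = model(text, vision)
    print(logits.shape, intensity.shape)  # torch.Size([2, 3]) torch.Size([2, 1])
```

In this sketch the two heads share a trunk, so gradients from both tasks update the same representation, which is one common way to realize the "shared knowledge from related tasks" that the abstract attributes to the multi-task decoder.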

Details

Language :
English
ISSN :
0098-3063
Volume :
70
Issue :
1
Database :
Supplemental Index
Journal :
IEEE Transactions on Consumer Electronics
Publication Type :
Periodical
Accession number :
ejs66238365
Full Text :
https://doi.org/10.1109/TCE.2024.3357543