DM2S2: Deep Multimodal Sequence Sets With Hierarchical Modality Attention
- Source :
- IEEE Access. 10:120023-120034
- Publication Year :
- 2022
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2022.
Abstract
- There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM$^2$S$^2$). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements with modality-level granularity. Our concept exhibits performance that is comparable to or better than that of previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
- Comment: 12 pages, 3 figures. Accepted by IEEE Access on Nov. 3, 2022
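The two-level attention described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the BERT-based encoder is omitted (element vectors are assumed to be already encoded), the function names `residual_attention_pool` and `fuse` are hypothetical, and the `(1 + weight)` form of residual attention is an assumption made for illustration. The sketch only shows why treating the input as a set of sequences keeps the fused vector at a fixed size and tolerates a missing modality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_attention_pool(vectors, query):
    """Score each vector against `query`, then sum the vectors scaled by
    (1 + attention weight). The residual form is an assumption."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(query)
    pooled = [
        sum((1.0 + w) * v[i] for w, v in zip(weights, vectors))
        for i in range(dim)
    ]
    return pooled, weights


def fuse(modalities, intra_query, inter_query):
    """modalities: dict mapping a modality name to a non-empty list of
    element vectors. Returns the fused vector plus both sets of weights."""
    names = sorted(modalities)
    pooled, intra_weights = [], {}
    for name in names:
        # Intra-modality step: weight the elements within one modality.
        vec, w = residual_attention_pool(modalities[name], intra_query)
        pooled.append(vec)
        intra_weights[name] = w
    # Inter-modality step: weight the pooled modality vectors.
    fused, inter = residual_attention_pool(pooled, inter_query)
    return fused, intra_weights, dict(zip(names, inter))


# Toy data: three modalities with 3-dimensional element vectors.
q_intra = [0.5, -0.2, 0.1]
q_inter = [0.3, 0.3, 0.3]
full = {
    "title": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "image": [[0.0, 0.0, 1.0]],
    "price": [[1.0, 1.0, 1.0]],
}
fused_full, intra_w, inter_w = fuse(full, q_intra, q_inter)

# Dropping a modality just shrinks the set; the output size is unchanged.
partial = {k: v for k, v in full.items() if k != "price"}
fused_partial, _, _ = fuse(partial, q_intra, q_inter)
print(len(fused_full), len(fused_partial))
```

Because the pooled result is a weighted sum rather than a concatenation, its dimensionality does not grow with the number of modalities, and the returned weights are the kind of quantity one could visualize for interpretation.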
- Subjects :
- FOS: Computer and information sciences
Computer Science - Machine Learning
Computer Science - Computation and Language
General Computer Science
Computer Science - Artificial Intelligence
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
General Engineering
Multimedia (cs.MM)
Machine Learning (cs.LG)
Artificial Intelligence (cs.AI)
General Materials Science
Electrical and Electronic Engineering
Computation and Language (cs.CL)
Computer Science - Multimedia
Details
- ISSN :
- 2169-3536
- Volume :
- 10
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....34d42ea27b3d4690625331266be15c89
- Full Text :
- https://doi.org/10.1109/access.2022.3221812