On-the-fly Modulation for Balanced Multimodal Learning.

Authors :
Wei Y
Hu D
Du H
Wen JR
Source :
IEEE transactions on pattern analysis and machine intelligence [IEEE Trans Pattern Anal Mach Intell] 2024 Sep 25; Vol. PP. Date of Electronic Publication: 2024 Sep 25.
Publication Year :
2024
Publisher :
Ahead of Print

Abstract

Multimodal learning is expected to boost model performance by integrating information from different modalities. However, its potential is not fully exploited because the widely used joint training strategy, which has a uniform objective for all modalities, leads to imbalanced and under-optimized uni-modal representations. Specifically, we point out that there often exists a modality with more discriminative information, e.g., vision for playing football and sound for blowing wind. Such a modality can dominate the joint training process, leaving the other modalities significantly under-optimized. To alleviate this problem, we first analyze the under-optimization phenomenon in both the feed-forward and the back-propagation stages of optimization. Then, On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies are proposed to modulate the optimization of each modality by monitoring the discriminative discrepancy between modalities during training. Concretely, OPM weakens the influence of the dominant modality by dropping its feature with a dynamic probability in the feed-forward stage, while OGM attenuates its gradient in the back-propagation stage. In experiments, our methods demonstrate considerable improvement across a variety of multimodal tasks. These simple yet effective strategies enhance performance not only in vanilla and task-oriented multimodal models but also in more complex multimodal tasks, showcasing their effectiveness and flexibility.
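
The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the discrepancy is taken as a ratio of per-modality discriminative scores, the ratio is mapped to a strength in [0, 1) with a tanh, and the names `discrepancy_ratio`, `modulation_strength`, `opm_drop`, `ogm_scale`, and the hyperparameter `alpha` are hypothetical. It should not be read as the authors' implementation.

```python
import math
import random


def discrepancy_ratio(score_a, score_b, eps=1e-8):
    """Ratio of modality A's discriminative score to modality B's.

    Assumption: a ratio above 1 means modality A is currently dominant.
    """
    return score_a / (score_b + eps)


def modulation_strength(ratio, alpha=1.0):
    """Map a discrepancy ratio to a strength in [0, 1).

    Assumption: tanh mapping with a hypothetical hyperparameter alpha.
    Returns 0 when the modality is not dominant (ratio <= 1).
    """
    if ratio <= 1.0:
        return 0.0
    return math.tanh(alpha * (ratio - 1.0))


def opm_drop(feature, ratio, alpha=1.0, rng=random):
    """Feed-forward stage (OPM-style): with a probability that grows with
    the discrepancy, replace the dominant modality's feature with zeros."""
    p = modulation_strength(ratio, alpha)
    if rng.random() < p:
        return [0.0] * len(feature)
    return list(feature)


def ogm_scale(grad, ratio, alpha=1.0):
    """Back-propagation stage (OGM-style): shrink the dominant modality's
    gradient by a coefficient that decreases as the discrepancy grows."""
    k = 1.0 - modulation_strength(ratio, alpha)
    return [k * g for g in grad]
```

In this sketch a modality that is not dominant (ratio at or below 1) is left untouched by both functions, so only the dominant modality is modulated, and the modulation becomes stronger as the monitored discrepancy widens during training.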

Details

Language :
English
ISSN :
1939-3539
Volume :
PP
Database :
MEDLINE
Journal :
IEEE transactions on pattern analysis and machine intelligence
Publication Type :
Academic Journal
Accession number :
39321012
Full Text :
https://doi.org/10.1109/TPAMI.2024.3468315