Back to Search
Start Over
Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)
- Publication Year :
- 2022
-
Abstract
- Despite the remarkable success of deep multi-modal learning in practice, it has not been well-explained in theory. Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information. This work provides a theoretical explanation for the emergence of such performance gap in neural networks for the prevalent joint training framework. Based on a simplified data distribution that captures the realistic property of multi-modal data, we prove that for the multi-modal late-fusion network with (smoothed) ReLU activation trained jointly by gradient descent, different modalities will compete with each other. The encoder networks will learn only a subset of modalities. We refer to this phenomenon as modality competition. The losing modalities, which fail to be discovered, are the origins where the sub-optimality of joint training comes from. Experimentally, we illustrate that modality competition matches the intrinsic behavior of late-fusion joint training.<br />Comment: 41 pages, 2 figures
- Subjects :
- Computer Science - Machine Learning
Subjects
Details
- Database :
- arXiv
- Publication Type :
- Report
- Accession number :
- edsarx.2203.12221
- Document Type :
- Working Paper