Kernel Masked Image Modeling Through the Lens of Theoretical Understanding.
- Source :
- IEEE transactions on neural networks and learning systems [IEEE Trans Neural Netw Learn Syst] 2024 Aug 27; Vol. PP. Date of Electronic Publication: 2024 Aug 27.
- Publication Year :
- 2024
- Publisher :
- Ahead of Print
Abstract
- Masked image modeling (MIM) is considered the state-of-the-art (SOTA) self-supervised learning (SSL) technique for visual pretraining. The impressive generalization ability of MIM has also paved the way for the remarkable success of large-scale vision foundation models. In this article, we discuss the validity and advantages of implementing MIM techniques in reproducing kernel Hilbert spaces (RKHSs) and accompany the analysis with a novel MIM method named R-MIM (short for RKHS-MIM). Through the careful construction of an augmentation graph and the use of spectral decomposition techniques, we establish a systematic theoretical connection between R-MIM's generalization ability and the choice of kernel function used during training. Specifically, we conclude that the local Lipschitz constant of the resulting R-MIM model and the corresponding expected pretraining error have a strong composite effect in bounding the downstream task error, depending on the kernel choice. We show that, under mild mathematical assumptions, R-MIM is guaranteed to achieve a lower downstream-task error bound than vanilla MIM techniques such as masked autoencoder (MAE) and SimMIM. Empirical results corroborate our theoretical hypotheses and analysis, demonstrating the superior generalization of the proposed R-MIM and its theoretical link to kernel choices. The code is available at: https://github.com/yurui-q/R-MIM.
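The abstract does not spell out R-MIM's training objective, but the core idea of carrying out MIM reconstruction in an RKHS can be sketched with the standard kernel trick: the squared RKHS distance between a target patch x and its reconstruction x̂ expands as k(x, x) - 2 k(x, x̂) + k(x̂, x̂), so the loss is computable from kernel evaluations alone. The sketch below is a minimal, hypothetical illustration of that idea; the RBF kernel choice, function names, and masking convention are assumptions for demonstration, not the paper's actual implementation.

```python
import torch

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), evaluated per patch."""
    sq_dist = (x - y).pow(2).sum(dim=-1)
    return torch.exp(-gamma * sq_dist)

def rkhs_reconstruction_loss(pred_patches, target_patches, mask, gamma=1.0):
    """Hypothetical MAE-style loss measured in the RKHS instead of pixel space.

    ||phi(x) - phi(x_hat)||_H^2 = k(x, x) - 2 k(x, x_hat) + k(x_hat, x_hat).
    For a normalized kernel such as the RBF, k(x, x) = k(x_hat, x_hat) = 1,
    so the distance reduces to 2 * (1 - k(x, x_hat)) on each masked patch.

    pred_patches, target_patches: (B, N, D); mask: (B, N), 1 on masked patches.
    """
    k_xy = rbf_kernel(pred_patches, target_patches, gamma)
    loss = 2.0 * (1.0 - k_xy)                              # per-patch RKHS distance
    return (loss * mask).sum() / mask.sum().clamp(min=1)   # average over masked patches
```

Under this reading, the kernel hyperparameter (here gamma) directly shapes the loss landscape, which is consistent with the abstract's claim that the kernel choice governs both the model's local Lipschitz behavior and the achievable pretraining error.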
Details
- Language :
- English
- ISSN :
- 2162-2388
- Volume :
- PP
- Database :
- MEDLINE
- Journal :
- IEEE transactions on neural networks and learning systems
- Publication Type :
- Academic Journal
- Accession number :
- 39190525
- Full Text :
- https://doi.org/10.1109/TNNLS.2024.3443088