1. Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture
- Authors
Kim, Dong-Hee, Cho, Sungduk, Cho, Hyeonwoo, Park, Chanmin, Kim, Jinyoung, and Kim, Won Hwa
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this work, we introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA), to overcome the traditional constraints associated with training segmentation models. Mask-JEPA combines a Joint-Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries. Our approach addresses two critical challenges in self-supervised learning: 1) extracting comprehensive representations for universal image segmentation from a pixel decoder, and 2) effectively training the transformer decoder. Using the transformer decoder as the predictor within the JEPA framework enables effective training for universal image segmentation tasks. Through rigorous evaluations on datasets such as ADE20K, Cityscapes, and COCO, Mask-JEPA demonstrates not only competitive results but also exceptional adaptability and robustness across various training scenarios. The architecture-agnostic nature of Mask-JEPA further underscores its versatility, allowing seamless adaptation to various members of the mask classification family.
- Comment
27 pages, 5 figures
- Published
2024
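The abstract describes a JEPA-style objective: a predictor regresses the embeddings produced by a target encoder, and the target encoder is typically updated as an exponential moving average (EMA) of the context encoder. The following is a minimal NumPy sketch of that generic pattern, not Mask-JEPA's actual implementation; the function names, the L2 loss form, and the momentum value are all assumptions for illustration.

```python
import numpy as np

def jepa_loss(predicted, target):
    """L2 regression loss between predictor outputs and
    (stop-gradient) target-encoder embeddings."""
    return float(np.mean((predicted - target) ** 2))

def ema_update(target_params, context_params, momentum=0.996):
    """Update target-encoder weights as an exponential moving
    average of the context-encoder weights, as is standard in
    JEPA-style self-supervised training."""
    return [momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params, context_params)]

# Toy example: embeddings for 4 queries with dimension 8.
rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 8))
tgt = rng.normal(size=(4, 8))
loss = jepa_loss(pred, tgt)

# One EMA step on toy weight tensors.
target_w = [np.ones((2, 2))]
context_w = [np.zeros((2, 2))]
new_target_w = ema_update(target_w, context_w, momentum=0.9)
```

In practice the loss would be computed only over masked or predicted regions, and gradients would flow through the predictor and context encoder while the target encoder receives only EMA updates.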