StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

Authors :
Bose, Shirsha
Fini, Enrico
Jha, Ankit
Singha, Mainak
Banerjee, Biplab
Ricci, Elisa
Publication Year :
2023

Abstract

Large-scale foundation models (e.g., CLIP) have shown promising zero-shot generalization performance on downstream tasks by leveraging carefully designed language prompts. However, despite their success, most prompt learning techniques tend to underperform in the presence of domain shift. Our study addresses this problem and, to improve CLIP's generalization ability across domains, proposes StyLIP, a novel approach for Domain Generalization (DG) based on a domain-agnostic prompt learning strategy. In the absence of explicit domain knowledge, we aim to disentangle the visual style and the content information extracted from the pre-trained CLIP in the prompts so they can be effortlessly adapted to novel domains during inference. Furthermore, we consider a set of style projectors to learn the prompt tokens directly from these multi-scale style features, and the generated prompt embeddings are later fused with the multi-scale visual features learned through a content projector. The projectors are contrastively trained, given CLIP's frozen vision and text encoders. We present extensive experiments in five different DG settings on multiple benchmarks, demonstrating that StyLIP consistently outperforms the relevant state-of-the-art methods.

Comment: 23 pages, 7 figures, 9 tables
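
The pipeline outlined in the abstract (multi-scale style features, projectors, contrastive training against frozen encoders) can be illustrated with a rough, stdlib-only sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: representing "style" as channel-wise mean and standard deviation of a feature map, the plain linear `project` helper, the cosine-similarity cross-entropy used as the contrastive objective, and the temperature value. All function names are hypothetical, and the fusion of prompt embeddings with content features is not modelled.

```python
import math

def channel_stats(feature_map):
    """Assumed style descriptor for one encoder level.

    feature_map: list of channels, each a flat list of spatial activations.
    Returns per-channel means followed by per-channel standard deviations.
    """
    means = [sum(ch) / len(ch) for ch in feature_map]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in ch) / len(ch))
        for ch, m in zip(feature_map, means)
    ]
    return means + stds

def multi_scale_style(feature_maps):
    """One style vector per encoder level (scale)."""
    return [channel_stats(fm) for fm in feature_maps]

def project(vector, weights):
    """Hypothetical linear projector: one row of weights per output dimension."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(image_emb, class_text_embs, target, temperature=0.07):
    """Cross-entropy over cosine similarities between an image embedding and
    one prompt embedding per class. The temperature is an assumed value."""
    logits = [cosine(image_emb, t) / temperature for t in class_text_embs]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[target]

# Toy usage: two encoder levels, each with two channels of two activations.
levels = [
    [[1.0, 3.0], [2.0, 2.0]],
    [[0.0, 4.0], [1.0, 1.0]],
]
style_vectors = multi_scale_style(levels)
# One hypothetical style projector per level turns a style vector into a prompt token.
token_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
prompt_tokens = [project(s, token_weights) for s in style_vectors]
```

In this sketch only the projector weights would be trained; the encoders that produce the feature maps and embeddings are treated as fixed inputs, mirroring the frozen CLIP encoders mentioned in the abstract.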

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2302.09251
Document Type :
Working Paper