MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning

Authors :: Liu, Jiarun
Zhou, Hong-Yu
Li, Cheng
Huang, Weijian
Yang, Hao
Liang, Yong
Wang, Shanshan
Publication Year :: 2024
Abstract: Existing contrastive language-image pre-training aims to learn a joint representation by matching abundant image-text pairs. However, the number of image-text pairs in medical datasets is usually orders of magnitude smaller than that in natural datasets. In addition, medical image-text pairs often involve numerous complex fine-grained correspondences. This paper aims to improve data efficiency by introducing multiple-to-multiple local relationship modeling to capture denser supervision. More specifically, we propose a Medical Language-Image Pre-training (MLIP) framework, which exploits limited medical image-text data more efficiently through patch-sentence matching. Furthermore, we introduce a masked contrastive learning strategy with semantic integrity estimation to reduce redundancy in images while preserving the underlying semantics. Our evaluation results show that MLIP outperforms previous work in zero/few-shot classification and few-shot segmentation tasks by a large margin.
Comment: 5 pages, 3 figures
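
The abstract describes MLIP only at a high level, so the following is a rough, illustrative sketch under stated assumptions, not the authors' method. It combines three generic ingredients: (1) the standard symmetric InfoNCE objective used in CLIP-style contrastive language-image pre-training, where matched image-report pairs sit on the diagonal of a batch similarity matrix; (2) a hypothetical patch-sentence score in which each report sentence is matched to its most similar image patch and the per-sentence maxima are averaged (a FILIP-style max-then-mean aggregation chosen here for illustration); and (3) uniform random patch masking. It does not implement MLIP's semantic integrity estimation. All function names, the temperature of 0.07, and the keep ratio are assumptions.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(sim, temperature=0.07):
    """Symmetric InfoNCE over an N x N similarity matrix.

    sim[i][j] scores image i against report j; matched pairs sit on the diagonal.
    """
    n = len(sim)

    def mean_cross_entropy(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            logits = [s / temperature for s in row]
            peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
            total += log_sum_exp - logits[i]  # -log softmax probability of the matched pair
        return total / n

    # Average the image-to-text (rows) and text-to-image (columns) directions.
    text_to_image = [list(col) for col in zip(*sim)]
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(text_to_image))


def random_mask(patches, keep_ratio, rng):
    """Uniformly keep a random subset of patch embeddings (assumed masking scheme)."""
    k = max(1, round(len(patches) * keep_ratio))
    return rng.sample(patches, k)


def local_pair_score(patches, sentences):
    """Hypothetical patch-sentence score: best-matching patch per sentence, averaged."""
    return sum(max(cosine(p, s) for p in patches) for s in sentences) / len(sentences)


def local_contrastive_loss(images, reports, keep_ratio=0.5, temperature=0.07, seed=0):
    """images[i]: list of patch embeddings; reports[i]: list of sentence embeddings."""
    rng = random.Random(seed)
    masked = [random_mask(patches, keep_ratio, rng) for patches in images]
    sim = [[local_pair_score(m, r) for r in reports] for m in masked]
    return info_nce(sim, temperature)


if __name__ == "__main__":
    # Toy batch: two images with two 2-D patch embeddings each, two one-sentence reports.
    images = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
    reports = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print("matched:", local_contrastive_loss(images, reports))
    print("swapped:", local_contrastive_loss(images, reports[::-1]))
```

In the toy batch, pairing each image with its own report puts the highest similarities on the diagonal, so the matched loss is far lower than the loss obtained after swapping the reports. The max-then-mean aggregation is one simple way to let each sentence pick out its own image region; a real system would use learned encoders and could use attention-weighted aggregation instead.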
