1. Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment
- Author
-
Ji, Zhanghexuan, Shaikh, Mohammad Abuzar, Moukheiber, Dana, Srihari, Sargur, Peng, Yifan, and Gao, Mingchen
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Science - Computation and Language ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Computer Vision and Pattern Recognition (cs.CV) ,Image and Video Processing (eess.IV) ,Computer Science - Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Image and Video Processing ,Computation and Language (cs.CL) ,Machine Learning (cs.LG) - Abstract
Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pre-trained on both the global image-sentence level and the local image region-word level for visual-textual matching. Both are bidirectionally constrained on Cross-Entropy based and ranking-based Triplet Matching Losses. The region-word matching is calculated using the attention mechanism without direct supervision about their mapping. The pre-trained multi-modal representation learning paves the way for downstream tasks concerning image and/or text encoding. We demonstrate the representation learning quality by cross-modality retrievals and multi-label classifications on two datasets: OpenI-IU and MIMIC-CXR, Comment: 10 Pages, 1 Figure, 3 Tables, Accepted in 12th Machine Learning in Medical Imaging (MLMI 2021) workshop
- Published
- 2021
- Full Text
- View/download PDF