Towards Better Caption Supervision for Object Detection.

Authors :: Chen, Changjian
Wu, Jing
Wang, Xiaohan
Xiang, Shouxing
Zhang, Song-Hai
Tang, Qifeng
Liu, Shixia
Source :: IEEE Transactions on Visualization & Computer Graphics; Apr2022, Vol. 28 Issue 4, p1941-1954, 14p
Publication Year :: 2022
Abstract :: Because training high-performance object detectors requires expensive bounding box annotations, recent methods resort to freely available image captions. However, detectors trained on caption supervision perform poorly because captions are usually noisy and cannot provide precise location information. To tackle this issue, we present a visual analysis method that tightly integrates caption supervision with object detection so that each enhances the other. In particular, object labels are first extracted from captions and used to train the detectors. Then, the objects detected in images are fed back into caption supervision for further improvement. To bring users effectively into the object detection process, a node-link-based set visualization supported by a multi-type relational co-clustering algorithm is developed to explain the relationships between the extracted labels and the images with detected objects. The co-clustering algorithm clusters labels and images simultaneously by utilizing both their representations and their relationships. Quantitative evaluations and a case study are conducted to demonstrate the efficiency and effectiveness of the developed method in improving the performance of object detectors. [ABSTRACT FROM AUTHOR]
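
Illustrative note: the first step the abstract describes, extracting object labels from captions, can be pictured with a minimal sketch. The vocabulary, synonym sets, and the function name `extract_labels` below are assumptions made for illustration only; the record does not state how the authors perform the extraction, and simple keyword matching is swapped in here as a stand-in.

```python
import re

# Hypothetical label vocabulary with a few synonyms per label.
# These entries are illustrative and do not come from the paper.
VOCAB = {
    "dog": {"dog", "dogs", "puppy"},
    "person": {"person", "man", "woman", "people"},
    "bicycle": {"bicycle", "bike"},
}


def extract_labels(caption):
    """Return the sorted labels whose synonyms appear in the caption."""
    # Lowercase the caption and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    # A label is extracted if any of its synonyms is among the tokens.
    return sorted(label for label, words in VOCAB.items() if tokens & words)


labels = extract_labels("A man rides a bike next to his puppy.")
print(labels)
```

Keyword matching of this kind gives image-level labels without locations, and it misses objects the caption does not mention, which is consistent with the abstract's point that caption supervision is noisy and imprecise.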