1. QLDT: adaptive Query Learning for HOI Detection via vision-language knowledge Transfer.
- Author
- Wang, Xincheng; Gao, Yongbin; Yu, Wenjun; Wu, Chenmou; Chen, Mingxuan; Ma, Honglei; Chen, Zhichao
- Subjects
TRANSFER of training, SAMPLE size (Statistics), OPTIMISM, ALGORITHMS, FORECASTING
- Abstract
Human-object interaction (HOI) detection can be divided into two core problems: human-object association detection and interaction understanding. For association detection, previous methods tend to directly detect only obvious human-object interaction pairs and ignore pairs that may have potential interaction relationships, which does not match real-world conditions. For interaction understanding, traditional methods face the challenges of long-tailed distributions and zero-shot detection, and cannot flexibly handle complex and changing real-world scenarios. To this end, adaptive Query Learning for HOI Detection via vision-language knowledge Transfer (QLDT) is proposed. First, a two-stage dynamic matching scoring algorithm based on dynamically changing thresholds and scores is designed to discover obscure human-object (H-O) pairs and label them, enlarging the sample size. Second, the vision-language pre-trained model GLIP (Grounded Language-Image Pre-training) is introduced to enhance the model's interaction understanding: the visual and linguistic features of the images are extracted through GLIP, the gap with the predicted values is minimized using a cross-entropy loss, and the maximum value of the score with the obscure H-O pairs is taken as the final prediction, which ensures the model's positivity. The proposed method shows excellent performance on both the HICO-DET and V-COCO datasets. On HICO-DET, QLDT achieved 35.37% mAP on the full category and 30.15% mAP on the rare category, and also improved on all five zero-shot metrics. On V-COCO, it achieved 62.74% mAP and 67.71% mAP under Scenarios 1 and 2, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024