DeTAL: Open-Vocabulary Temporal Action Localization With Decoupled Networks

Authors :
Li, Zhiheng
Zhong, Yujie
Song, Ran
Li, Tianjiao
Ma, Lin
Zhang, Wei
Source :
IEEE Transactions on Pattern Analysis and Machine Intelligence; December 2024, Vol. 46 Issue: 12 p7728-7741, 14p
Publication Year :
2024

Abstract

Pre-trained visual-language (ViL) models have demonstrated good zero-shot capability in video understanding tasks, where they are usually adapted through fine-tuning or temporal modeling. However, in the task of open-vocabulary temporal action localization (OV-TAL), such adaptation reduces the robustness of ViL models against different data distributions, leading to a misalignment between visual representations and text descriptions of unseen action categories. As a result, existing methods often have to trade off action detection against action classification. To address this issue, this paper proposes DeTAL, a simple but effective two-stage approach for OV-TAL. DeTAL decouples action detection from action classification to avoid the compromise between them, so that state-of-the-art methods for closed-set action localization can be readily adapted to OV-TAL, which significantly improves performance. DeTAL can also easily handle the scenario where action category annotations are unavailable in the training dataset. In the experiments, we propose a new cross-dataset setting to evaluate the zero-shot capability of different methods. The results demonstrate that DeTAL outperforms the state-of-the-art methods for OV-TAL on both THUMOS14 and ActivityNet1.3.
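
The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a class-agnostic detector has already produced proposals as (start, end, actionness) triples, and that frame features and text embeddings come from some pre-trained ViL model. The function names, the mean pooling, and the score fusion (actionness times cosine similarity) are all hypothetical choices made for illustration, and the toy vectors stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pool(frames):
    """Average a list of frame feature vectors into one segment vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def classify_proposals(frame_feats, proposals, text_embeds):
    """Stage 2 (assumed): label class-agnostic proposals by text similarity.

    proposals: list of (start, end, actionness) from a stage-1 detector
               that never sees category names.
    text_embeds: dict mapping a category name to its text embedding.
    """
    results = []
    for start, end, actionness in proposals:
        segment = mean_pool(frame_feats[start:end])
        sims = {name: cosine(segment, emb) for name, emb in text_embeds.items()}
        label = max(sims, key=sims.get)
        # Hypothetical fusion: detection confidence times classification score.
        results.append((start, end, label, actionness * sims[label]))
    return results


# Toy inputs standing in for ViL features (assumed, not real data).
frame_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
proposals = [(0, 2, 0.8), (2, 4, 0.6)]
text_embeds = {"high jump": [1.0, 0.0], "diving": [0.0, 1.0]}

detections = classify_proposals(frame_feats, proposals, text_embeds)
for det in detections:
    print(det)
```

Because the detector in this sketch never touches category names, swapping in a different label vocabulary only changes `text_embeds`; the proposals stay the same.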

Details

Language :
English
ISSN :
0162-8828
Volume :
46
Issue :
12
Database :
Supplemental Index
Journal :
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Type :
Periodical
Accession number :
ejs67921361
Full Text :
https://doi.org/10.1109/TPAMI.2024.3395778