Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks.

Authors :
Zhao, Zhou
Zhang, Zhu
Xiao, Shuwen
Xiao, Zhenxin
Yan, Xiaohui
Yu, Jun
Cai, Deng
Wu, Fei
Source :
IEEE Transactions on Image Processing; Dec 2019, Vol. 28 Issue 12, p5939-5952, 14p
Publication Year :
2019

Abstract

Open-ended long-form video question answering is a challenging task in visual information retrieval, which automatically generates a natural language answer from the referenced long-form video contents according to a given question. However, existing work mainly focuses on short-form video question answering, owing to the difficulty of modeling semantic representations of long-form video contents. In this paper, we introduce a dynamic hierarchical reinforced network for open-ended long-form video question answering, which employs an encoder–decoder architecture with a dynamic hierarchical encoder and a reinforced decoder. Concretely, we first propose a frame-level dynamic long short-term memory (LSTM) network with a binary segmentation gate to learn frame-level semantic representations according to the given question. We then develop a segment-level highway LSTM network with a question-aware highway gate for segment-level semantic modeling. Furthermore, we devise the reinforced decoder with a hierarchical attention mechanism to generate natural language answers. We construct a large-scale long-form video question answering dataset. Extensive experiments on the long-form dataset and another public short-form dataset show the effectiveness of our method. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1057-7149
Volume :
28
Issue :
12
Database :
Complementary Index
Journal :
IEEE Transactions on Image Processing
Publication Type :
Academic Journal
Accession number :
138433595
Full Text :
https://doi.org/10.1109/TIP.2019.2922062