Back to Search
Start Over
Long-Form Video Question Answering via Dynamic Hierarchical Reinforced Networks.
- Source :
- IEEE Transactions on Image Processing; Dec2019, Vol. 28 Issue 12, p5939-5952, 14p
- Publication Year :
- 2019
-
Abstract
- Open-ended long-form video question answering is a challenging task in visual information retrieval, which automatically generates a natural language answer from the referenced long-form video contents according to a given question. However, the existing works mainly focus on short-form video question answering, due to the lack of modeling semantic representations from long-form video contents. In this paper, we introduce a dynamic hierarchical reinforced network for open-ended long-form video question answering, which employs an encoder–decoder architecture with a dynamic hierarchical encoder and a reinforced decoder. Concretely, we first propose a frame-level dynamic long-short term memory (LSTM) network with binary segmentation gate to learn frame-level semantic representations according to the given question. We then develop a segment-level highway LSTM network with a question-aware highway gate for segment-level semantic modeling. Furthermore, we devise the reinforced decoder with a hierarchical attention mechanism to generate natural language answers. We construct a large-scale long-form video question answering dataset. The extensive experiments on the long-form dataset and another public short-form dataset show the effectiveness of our method. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 10577149
- Volume :
- 28
- Issue :
- 12
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Image Processing
- Publication Type :
- Academic Journal
- Accession number :
- 138433595
- Full Text :
- https://doi.org/10.1109/TIP.2019.2922062