Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network
- Source :
- COLING
- Publication Year :
- 2020
- Publisher :
- International Committee on Computational Linguistics, 2020.
- Abstract :
- Temporal sentence localization in videos aims to ground the best-matched segment in an untrimmed video given a sentence query. Previous work in this field mainly relies on attentional frameworks that align temporal boundaries through soft selection. Although these methods focus on the visual content relevant to the query, their single-step attention is insufficient to model complex video content and limits the higher-level reasoning that this task demands. In this paper, we propose a novel deep rectification-modulation network (RMN), which transforms the task into a multi-step reasoning process by repeating rectification and modulation. In each rectification-modulation layer, unlike existing methods that conduct cross-modal interaction directly, we first devise a rectification module to correct implicit attention misalignment, in which attention focuses on the wrong position during cross-modal interaction. Then, a modulation module captures frame-to-frame relations with the help of sentence information, to better correlate and compose the video content over time. With multiple such layers cascaded in depth, our RMN progressively refines video-query interactions, enabling more precise localization. Experimental evaluations on three public datasets show that the proposed method achieves state-of-the-art performance. Extensive ablation studies provide a comprehensive analysis of the proposed method.
- Subjects :
- Relation (database)
business.industry
Computer science
ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Process (computing)
Pattern recognition
02 engineering and technology
010501 environmental sciences
01 natural sciences
Field (computer science)
Task (project management)
0202 electrical engineering, electronic engineering, information engineering
Selection (linguistics)
020201 artificial intelligence & image processing
Artificial intelligence
Layer (object-oriented design)
Focus (optics)
business
Sentence
0105 earth and related environmental sciences
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 28th International Conference on Computational Linguistics
- Accession number :
- edsair.doi...........924cce67a10a3401f7ff771a5b18a09a
- Full Text :
- https://doi.org/10.18653/v1/2020.coling-main.167