
STCM-Net: A symmetrical one-stage network for temporal language localization in videos

Authors :
Minglin Dong
Jingyu Ru
Sikai Yang
Chunbo Li
Lele Xue
Zixi Jia
Source :
Neurocomputing. 471:194-207
Publication Year :
2022
Publisher :
Elsevier BV, 2022.

Abstract

Temporal language localization in video aims to locate, within an untrimmed video, the segment described by a natural language query. Compared with general video localization tasks, it is more flexible and more complex: it can accurately localize arbitrary scenes described in free-form natural language without requiring video labels in advance, and it can be widely applied in fields such as video retrieval and robot intelligent cognition. The main challenges of this task are extracting the semantics of the query sentence and integrating contextual information across the video. Contextual integration in the video can be optimized through a two-dimensional temporal adjacent network; therefore, fully extracting the latent information in the query sentence is necessary to solve the task at a finer granularity. At the same time, we found a large amount of time-related information in the query sentence, which helps improve localization accuracy. Thus, in this paper, we first define the time concept in a sentence and then propose the Sentence Time Concept Mining Network (STCM-Net), a symmetrical one-stage network. It effectively extracts the time concept contained in the query sentence, optimizing the target localization process and improving localization performance. We evaluate the proposed STCM-Net on three challenging public benchmarks: Charades-STA, ActivityNet Captions, and TACoS. STCM-Net achieves encouraging improvements over state-of-the-art approaches.
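
The abstract mentions that contextual integration over candidate video segments can be handled by a two-dimensional temporal adjacent network. As a minimal, illustrative sketch only (not the paper's implementation), the toy code below shows the general idea of arranging candidate moments on a 2D map indexed by (start clip, end clip) and selecting the highest-scoring one; the per-clip scores and the averaging rule are placeholder assumptions standing in for the learned video-query matching score.

    # Illustrative sketch, not STCM-Net: candidate moments on a 2D temporal map.
    import numpy as np

    def build_2d_temporal_map(num_clips, clip_scores):
        """Score every candidate moment (start clip i, end clip j), i <= j.

        clip_scores: per-clip relevance scores, shape (num_clips,).
        Here a moment's score is just the mean of its clips, a toy stand-in
        for a learned matching score between the moment and the query.
        """
        moment_map = np.full((num_clips, num_clips), -np.inf)  # lower triangle invalid
        for i in range(num_clips):
            for j in range(i, num_clips):
                moment_map[i, j] = clip_scores[i:j + 1].mean()
        return moment_map

    # Toy usage: 8 clips, pick the best-scoring (start, end) pair.
    scores = np.random.rand(8)
    m = build_2d_temporal_map(8, scores)
    start, end = np.unravel_index(np.argmax(m), m.shape)
    print(f"predicted moment: clips {start}..{end}")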

Details

ISSN :
0925-2312
Volume :
471
Database :
OpenAIRE
Journal :
Neurocomputing
Accession number :
edsair.doi...........6c8f63ec13e794c8523b779248b6470c