Human–robot interaction-oriented video understanding of human actions.
- Source :
- Engineering Applications of Artificial Intelligence, Jul 2024, Part A, Vol. 133. 1p.
- Publication Year :
- 2024
Abstract
- This paper focuses on action recognition tasks oriented to the field of human–robot interaction, which is one of the major challenges in the robotic video understanding field. Previous approaches focus on designing temporal models but lack the ability to capture motion information and to build contextual correlation models. This may result in robots being unable to effectively understand long-term video actions. To solve these two problems, this paper proposes a novel video understanding framework comprising an Adaptive Temporal Sensitivity and Motion Capture Network (ATSMC-Net) and a contextual scene reasoning module called the Knowledge Function Graph Module (KFG-Module). The proposed ATSMC-Net can adaptively adjust the frame-level and pixel-level sensitive regions of temporal features to effectively capture motion information. To fuse contextual scene information for cross-temporal inference, the KFG-Module is introduced to achieve fine-grained video understanding based on the relationship between objects and actions. We evaluate the method on three public video understanding benchmarks: Something-Something-V1&V2 and HMDB51. In addition, we present a dataset with real-world application scenarios of human–robot interaction to verify the effectiveness of our approach on mobile robots. The experimental results show that the proposed method can significantly improve the video understanding of robots. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 0952-1976
- Volume :
- 133
- Database :
- Academic Search Index
- Journal :
- Engineering Applications of Artificial Intelligence
- Publication Type :
- Academic Journal
- Accession number :
- 177605573
- Full Text :
- https://doi.org/10.1016/j.engappai.2024.108247