1. Symbiotic Attention for Egocentric Action Recognition With Object-Centric Alignment
- Authors
Yi Yang, Yu Wu, Xiaohan Wang, and Linchao Zhu
- Subjects
Computer science, Applied Mathematics, Artificial Intelligence, Computational Theory and Mathematics, Human–computer interaction, Feature (machine learning), Discriminative model, Action recognition, Computer Vision and Pattern Recognition, Software
- Abstract
In this paper, we propose to tackle egocentric action recognition by suppressing background distractors and enhancing action-relevant interactions. Existing approaches usually employ two independent branches to recognize egocentric actions, i.e., a verb branch and a noun branch, but lack a mechanism to suppress distracting objects and exploit local human-object correlations. To this end, we introduce two extra sources of information, i.e., the candidate objects' spatial locations and their discriminative features, to enable concentration on the occurring interactions. We design a Symbiotic Attention with Object-centric feature Alignment framework (SAOA) to provide meticulous reasoning between the actor and the environment. First, we introduce an object-centric feature alignment method to inject the local object features into the verb branch and the noun branch. Second, we propose a symbiotic attention mechanism to encourage mutual interaction between the two branches and select the most action-relevant candidates for classification. The framework benefits from the communication among the verb branch, the noun branch, and the local object information. Experiments with different backbones and modalities demonstrate the effectiveness of our method. Notably, our framework achieves state-of-the-art performance on the largest egocentric video dataset.
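The abstract only describes SAOA at a high level; a minimal PyTorch-style sketch of the two ideas it names, object-centric feature alignment and cross-branch (symbiotic) attention, might look like the following. Every detail here is an assumption made for illustration: the class name `SymbioticAttentionSketch`, the additive fusion of object features, the dot-product attention, the feature dimension, and the class counts are placeholders, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SymbioticAttentionSketch(nn.Module):
    """Illustrative sketch (not the authors' code): inject local object
    features into the verb and noun branches, then let each branch attend
    over the object-aligned candidates to pick action-relevant evidence."""

    def __init__(self, feat_dim=512, num_verbs=97, num_nouns=300):
        super().__init__()
        # Hypothetical projections aligning object features to each branch.
        self.obj_to_verb = nn.Linear(feat_dim, feat_dim)
        self.obj_to_noun = nn.Linear(feat_dim, feat_dim)
        # Placeholder class counts for the verb / noun classifiers.
        self.verb_head = nn.Linear(feat_dim, num_verbs)
        self.noun_head = nn.Linear(feat_dim, num_nouns)

    def forward(self, verb_feat, noun_feat, obj_feats):
        # verb_feat, noun_feat: (B, D) global branch features
        # obj_feats: (B, N, D) features of N candidate objects per clip
        # Object-centric alignment: fuse object features into each branch.
        verb_cands = verb_feat.unsqueeze(1) + self.obj_to_verb(obj_feats)  # (B, N, D)
        noun_cands = noun_feat.unsqueeze(1) + self.obj_to_noun(obj_feats)  # (B, N, D)

        # Symbiotic attention: each branch scores candidates with the other
        # branch's feature as the query, so verb and noun cues interact.
        verb_attn = torch.softmax(
            torch.einsum('bd,bnd->bn', noun_feat, verb_cands), dim=1)      # (B, N)
        noun_attn = torch.softmax(
            torch.einsum('bd,bnd->bn', verb_feat, noun_cands), dim=1)      # (B, N)

        verb_repr = torch.einsum('bn,bnd->bd', verb_attn, verb_cands)      # (B, D)
        noun_repr = torch.einsum('bn,bnd->bd', noun_attn, noun_cands)      # (B, D)
        return self.verb_head(verb_repr), self.noun_head(noun_repr)


# Toy usage with random tensors standing in for backbone and detector outputs.
model = SymbioticAttentionSketch()
v = torch.randn(2, 512)          # verb-branch clip feature
n = torch.randn(2, 512)          # noun-branch clip feature
objs = torch.randn(2, 5, 512)    # 5 candidate object features per clip
verb_logits, noun_logits = model(v, n, objs)
print(verb_logits.shape, noun_logits.shape)
```

The cross-query design (noun feature queries verb candidates and vice versa) is one plausible reading of "mutual interaction between the two branches"; the paper itself should be consulted for the exact attention formulation.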
- Published
- 2023