101. First-Person Hand Action Recognition Using Multimodal Data
- Author
Hongyu Wang, Hongye Xie, Na Cheng, Zhenyu Liu, and Rui Li
- Subjects
Modality (human–computer interaction), Computer science, Pattern recognition, Recurrent neural network, Discriminative model, Artificial intelligence, RGB color model, Graph (abstract data type), Representation (mathematics), Software
- Abstract
Extensive studies have been conducted on human action recognition, whereas relatively few methods have been proposed for hand action recognition. Although it is natural to apply a human action recognition method directly to hand action recognition, this approach does not always yield state-of-the-art performance. A key reason is that both the between-class difference and the within-class difference in hand actions are much smaller than those in human actions. In this paper, we study first-person hand action recognition from RGB-D sequences. To explore how strongly pretrained networks influence accuracy, eight classic pretrained networks and one pretrained network of our own design are used to extract RGB-D features. A Lie group is introduced for hand pose representation. Ablation studies compare the discriminative power of the RGB modality, the depth modality, the pose modality, and their combinations. In our method, a fixed number of frames is randomly sampled to represent an action. This temporal modeling strategy is simple yet proves more effective than both graph convolutional networks (GCNs) and recurrent neural networks (RNNs), which conventional methods widely adopt. Evaluation experiments on two public datasets demonstrate that our method markedly outperforms recent baselines.
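The abstract mentions extracting RGB-D features with classic pretrained networks but does not name them. The sketch below illustrates the general idea with torchvision's ResNet-18 as a stand-in backbone; the specific network, input size, and feature dimension are assumptions, not details from the paper.

```python
# Illustrative per-frame RGB feature extraction with a pretrained backbone.
# ResNet-18 is an assumed stand-in; the paper does not specify this choice.
import torch
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep 512-d features
backbone.eval()

frames = torch.randn(16, 3, 224, 224)  # 16 sampled RGB frames (dummy data)
with torch.no_grad():
    features = backbone(frames)        # -> (16, 512) per-frame descriptors
```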
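One common way to build a Lie-group pose representation is to map each joint (or bone) rotation, an element of SO(3), to its Lie algebra vector via the matrix logarithm. The sketch below shows this mapping under that assumption; the paper's exact construction may differ, and the function name is hypothetical.

```python
# Hedged sketch: map an SO(3) rotation to its so(3) (axis-angle) vector,
# one common building block of Lie-group pose features.
import numpy as np
from scipy.spatial.transform import Rotation

def so3_log(rotation_matrix: np.ndarray) -> np.ndarray:
    """Return the axis-angle (so(3)) vector of a 3x3 rotation matrix."""
    return Rotation.from_matrix(rotation_matrix).as_rotvec()

# Example: a 30-degree rotation about z becomes a 3-D feature vector.
R = Rotation.from_euler("z", 30, degrees=True).as_matrix()
print(so3_log(R))  # -> [0. 0. 0.5236] (30 degrees in radians about z)
```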
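The temporal modeling strategy, randomly sampling a fixed number of frames to represent an action, can be sketched as below. The function name, the target length, and the with-replacement fallback for short sequences are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of fixed-length random frame sampling for an action sequence.
import numpy as np

def sample_frames(num_frames: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Randomly sample k frame indices from num_frames, keeping temporal order."""
    if num_frames >= k:
        idx = rng.choice(num_frames, size=k, replace=False)
    else:
        # Assumed fallback: short sequences are oversampled with replacement
        # so every action is represented by exactly k frames.
        idx = rng.choice(num_frames, size=k, replace=True)
    return np.sort(idx)

rng = np.random.default_rng(0)
print(sample_frames(120, 16, rng))  # 16 ordered indices out of 120 frames
```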
- Published
2022