
Dynamic Gesture Recognition Based on Three-Stream Coordinate Attention Network and Knowledge Distillation

Authors :
Shanshan Wan
Lan Yang
Keliang Ding
Dongwei Qiu
Source :
IEEE Access, Vol 11, Pp 50547-50559 (2023)
Publication Year :
2023
Publisher :
IEEE, 2023.

Abstract

Gesture recognition has long been an important research direction in computer vision. Dynamic gestures involve complex backgrounds and many interfering factors, and deep-learning-based gesture recognition models typically have high computational cost and poor real-time performance. In addition, deep learning models are limited to recognizing the categories present in the training set, and their performance depends largely on the amount of labeled data. To address these problems, this paper presents a dynamic gesture recognition method named 3SCKI, based on a three-stream coordinate attention (CA) network, knowledge distillation, and image-text contrastive learning. Specifically: 1) CA is used for feature fusion so that the model focuses on the target gestures and background interference is reduced; 2) the traditional knowledge distillation loss is improved to reduce computation and improve real-time performance, adding a guidance function so that the student network learns only the classification probabilities of samples the teacher network identifies correctly; and 3) a multi-granularity context prompt template integration method is proposed to construct MG-CLIP, an improved CLIP vision-language model that aligns textual and visual concepts from the image level to the object level to the part level. Gesture classification is performed through contrastive learning between image features and text features, enabling the model to recognize image categories that did not appear during training. The proposed method is evaluated on the ChaLearn LAP large-scale isolated gesture dataset (IsoGD). The results show that the proposed method achieves a recognition rate of 65.87% on the IsoGD validation set. For single-modality data, 3SCKI achieves state-of-the-art recognition accuracy on RGB, depth, and optical flow data (61.22%, 58.84%, and 50.30% on the IsoGD validation set, respectively).
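The guided distillation loss described in point 2) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: it assumes the standard formulation of distillation (cross-entropy on ground-truth labels plus a temperature-scaled KL term), with the paper's "guidance function" realized as a per-sample mask that keeps the distillation term only where the teacher's prediction matches the label. The function and parameter names (`guided_distillation_loss`, `T`, `alpha`) are hypothetical.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the class axis."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def guided_distillation_loss(student_logits, teacher_logits, labels,
                             T=4.0, alpha=0.5):
    """Cross-entropy on ground truth plus a KL distillation term that is
    masked so the student imitates the teacher only on samples the
    teacher classifies correctly (the guidance function)."""
    n = len(labels)

    # Standard supervised cross-entropy on hard labels.
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()

    # Guidance mask: 1 where the teacher's argmax matches the label.
    guide = (teacher_logits.argmax(axis=1) == labels).astype(float)

    # Temperature-softened KL(teacher || student), per sample.
    p_t_T = softmax(teacher_logits, T)
    p_s_T = softmax(student_logits, T)
    kl = (p_t_T * (np.log(p_t_T + 1e-12) - np.log(p_s_T + 1e-12))).sum(axis=1)

    # Masked distillation term, rescaled by T^2 as in standard distillation.
    kd = (guide * kl).mean() * T * T
    return (1 - alpha) * ce + alpha * kd
```

When the teacher is wrong on a sample, the mask zeroes its distillation term, so the student falls back to the plain supervised loss for that sample rather than learning from an incorrect soft target.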

Details

Language :
English
ISSN :
2169-3536
Volume :
11
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.045b44268a284849b095614829b54d92
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2023.3278100