
3D hand pose estimation from a single RGB image by weighting the occlusion and classification.

Authors :
Mahdikhanlou, Khadijeh
Ebrahimnezhad, Hossein
Source :
Pattern Recognition. Apr 2023, Vol. 136.
Publication Year :
2023

Abstract

• We propose a large, real-world RGB dataset for 3D hand pose estimation that contains the 2D and 3D coordinates of the joints, the occlusion status of each joint, class labels for different gestures, and segmentation of the 15 parts of the hand.
• We propose semantic segmentation of the hand parts to estimate the extent of occlusion at the hand joints.
• We exploit occlusion information to estimate the 3D hand pose from a single RGB image.
• The proposed framework is a hybrid of classification and estimation networks that increases both the validity and the accuracy of the predicted hand poses.
• Hand poses are classified along three aspects: gesture, palm direction, and hand direction.

In this paper, a new framework for 3D hand pose estimation from a single RGB image is proposed. The framework is composed of two blocks. The first block formulates hand pose estimation as a classification problem. Since the human hand can perform numerous poses, a single classification network would need a huge number of parameters. We therefore propose to classify hand poses along three different aspects, namely hand gesture, hand direction, and palm direction, which significantly reduces the number of parameters. The motivation behind the classification block is that the model treats the image as a whole and extracts global features. Furthermore, the output of the classification model is a valid pose that contains no unexpected joint angles. The second block estimates the 3D coordinates of the hand joints and focuses on the finer details of the image pattern. RGB-based 3D hand pose estimation is an inherently ill-posed problem due to the lack of depth information in a 2D image. We propose to use the occlusion status of the hand joints to address this problem. The occlusion status of the joints has been labeled manually.
Some joints are partially occluded, and we propose to compute the extent of the occlusion by semantic segmentation. Existing methods in this field have mostly used synthetic datasets; in contrast, all models proposed in this paper are trained on more than 50K real images. Extensive experiments on our new dataset and two other benchmark datasets show that the proposed method achieves good performance. We also analyze the validity of the predicted poses, and the results show that the classification block increases pose validity. [ABSTRACT FROM AUTHOR]
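The hybrid design described above can be illustrated with a minimal sketch. The fusion rule, joint count, and occlusion-weighting scheme below are assumptions for illustration, not the authors' actual implementation: a classification block retrieves a valid template pose, a regression block predicts per-joint 3D coordinates, and a per-joint occlusion extent (here imagined as derived from semantic segmentation of the hand parts) weights how much each source contributes.

```python
import numpy as np

# Hypothetical sketch of the two-block fusion: trust the regressed
# coordinates for visible joints and fall back toward the valid
# class-template pose where a joint is occluded.

NUM_JOINTS = 21  # a common hand-joint count; an assumption here

def fuse_poses(template_pose, regressed_pose, occlusion):
    """Blend two (J, 3) pose arrays per joint.

    occlusion: array of shape (J,) in [0, 1], where 0 means fully
    visible (use regression) and 1 means fully occluded (use template).
    """
    visibility = (1.0 - occlusion)[:, None]          # shape (J, 1)
    return visibility * regressed_pose + (1.0 - visibility) * template_pose

rng = np.random.default_rng(0)
template = rng.normal(size=(NUM_JOINTS, 3))   # valid pose from classifier
regressed = rng.normal(size=(NUM_JOINTS, 3))  # detailed per-joint estimate
occ = rng.uniform(0.0, 1.0, size=NUM_JOINTS)  # occlusion extent per joint

fused = fuse_poses(template, regressed, occ)
```

A fully visible joint (occlusion 0) keeps its regressed coordinates, while a fully occluded joint falls back to the template, so the output degrades gracefully toward a valid pose as occlusion increases.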

Details

Language :
English
ISSN :
00313203
Volume :
136
Database :
Academic Search Index
Journal :
Pattern Recognition
Publication Type :
Academic Journal
Accession number :
161280470
Full Text :
https://doi.org/10.1016/j.patcog.2022.109217