Efficient Multimodal Fusion for Hand Pose Estimation With Hourglass Network

Authors :
Dinh-Cuong Hoang
Phan Xuan Tan
Duc-Long Pham
Hai-Nam Pham
Son-Anh Bui
Chi-Minh Nguyen
An-Binh Phi
Khanh-Duong Tran
Viet-Anh Trinh
Van-Duc Tran
Duc-Thanh Tran
Van-Hiep Duong
Khanh-Toan Phan
Van-Thiep Nguyen
Van-Duc Vu
Thu-Uyen Nguyen
Source :
IEEE Access, Vol 12, Pp 113810-113825 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Hand pose estimation is vital for various applications, including virtual reality (VR), augmented reality (AR), gesture recognition, human-computer interaction (HCI), and robotics. Achieving accurate, real-time hand pose estimation is challenging because of the high degree of articulation in the human hand and the variability in hand shapes and sizes. While multimodal data offers advantages, developing a fast and resource-efficient hand pose estimation system remains difficult. Current state-of-the-art methods often require powerful graphics processing units (GPUs) for high performance, which limits deployment on edge platforms with limited computational resources. There is a critical need for higher efficiency without compromising accuracy, especially in real-world settings such as mobile devices and embedded systems. Real-time performance is also essential for practical applications, where systems must respond immediately to user interactions. Unfortunately, most current methods struggle to achieve real-time speeds even on powerful GPUs, let alone on resource-constrained devices. To address these challenges, we propose an efficient hand pose estimation system that leverages both red-green-blue (RGB) and depth data (RGB-D) through a unified fusion strategy. Our method combines appearance and geometric data early in the processing pipeline, significantly reducing computational complexity while maintaining real-time performance on resource-constrained devices. Experimental results show that the proposed model runs at over 110 fps on a GPU and at 30 fps on the NVIDIA Jetson Xavier NX edge platform, which is 4 to 5 times faster than existing methods, while achieving competitive accuracy.
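
The abstract states only that appearance (RGB) and geometric (depth) data are combined early in the pipeline; it does not describe the fusion layer itself. As a rough illustration of what early fusion can mean in general, the sketch below stacks a normalized depth map onto an RGB image as a fourth channel, so that a single network could process both modalities from its first layer. This is not the authors' method. The function name `early_fuse_rgbd`, the `max_depth_mm` clipping range, and the 256x256 input size are all assumptions made for this example.

```python
import numpy as np


def early_fuse_rgbd(rgb, depth, max_depth_mm=1500.0):
    """Stack an RGB image and a depth map into one 4-channel float array.

    Hypothetical helper for illustration only: `max_depth_mm` is an assumed
    working range for a hand in front of the sensor, not a value from the paper.
    """
    if rgb.shape[:2] != depth.shape[:2]:
        raise ValueError("RGB and depth must share the same height and width")
    # Scale 8-bit color values to [0, 1].
    rgb_norm = rgb.astype(np.float32) / 255.0
    # Scale depth (assumed to be in millimetres) to [0, 1], clipping far values.
    depth_norm = np.clip(depth.astype(np.float32) / max_depth_mm, 0.0, 1.0)
    # Append depth as a fourth channel: (H, W, 3) + (H, W, 1) -> (H, W, 4).
    return np.concatenate([rgb_norm, depth_norm[..., None]], axis=-1)


# Synthetic inputs standing in for an aligned RGB-D frame.
rgb = np.zeros((256, 256, 3), dtype=np.uint8)
depth = np.full((256, 256), 750, dtype=np.uint16)

fused = early_fuse_rgbd(rgb, depth)
print(fused.shape)
```

Fusing at the input means only one backbone has to run per frame, which is one plausible reason an early-fusion design can be cheaper than running separate RGB and depth branches and merging their features later.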

Details

Language :
English
ISSN :
2169-3536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.f2a986abc242aaa142600934b92122
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3444322