Human pose estimation based on cross-view feature fusion.

Authors :: Sun, Dandan
Wang, Siqi
Xia, Hailun
Zhang, Changan
Gao, Jianlong
Mao, Mingyu
Source :: Visual Computer. Sep2024, Vol. 40 Issue 9, p6581-6597. 17p.
Publication Year :: 2024
Abstract: Multi-view human pose estimation can achieve high accuracy by leveraging complex spatial information from multiple perspectives. However, increasing the number of views raises the load on the network model, which can compromise estimation accuracy when computing resources are limited. Furthermore, existing approaches that use ResNet for feature extraction rely on deconvolution to obtain large feature maps, which can introduce artificial interference. To tackle these challenges, we propose a perceptual network based on flexible combination view feature fusion. The network comprises three modules. The flexible view combination policy module achieves high accuracy from just a single reference view, avoiding the added complexity of a large number of views. The up-sampling module, based on sub-pixel convolution, provides efficient high-resolution recovery and avoids the artificial interference introduced by deconvolution. Finally, the feature fusion module makes full use of reference-view cues to improve human pose estimation in the current view. Experiments on the Human3.6M dataset show that our model reduces the average MPJPE to 18.3 mm. [ABSTRACT FROM AUTHOR]
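
Two terms in the abstract can be made concrete with a short sketch. The record gives no implementation details, so everything below is an illustrative assumption, not the authors' code. `pixel_shuffle` shows only the rearrangement step commonly used in sub-pixel convolution (channels are folded into an r-times larger spatial grid, with a channel ordering assumed to match the usual convention); the learned convolution that precedes it is omitted. `mpjpe` computes the mean per-joint position error as the average Euclidean distance between predicted and ground-truth joints, with no alignment step, since the abstract does not state the evaluation protocol. Both function names are hypothetical.

```python
import math


def pixel_shuffle(channels, r):
    """Rearrange C*r*r feature maps of size HxW into C maps of size (H*r)x(W*r).

    `channels` is a nested list indexed as [channel][row][col].
    Assumed ordering: input channel c*r*r + i*r + j supplies the output
    pixel at sub-position (i, j) inside each r x r output cell.
    """
    c_in = len(channels)
    if c_in == 0 or c_in % (r * r) != 0:
        raise ValueError("channel count must be a positive multiple of r*r")
    h, w = len(channels[0]), len(channels[0][0])
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint.

    `pred` and `gt` are equal-length sequences of 3D points; the result is
    in the same unit as the inputs (millimetres for Human3.6M results).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Four 1x1 channels with upscale factor 2 become one 2x2 map.
upsampled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)

# Two joints: one is 5 units off, the other is exact.
error = mpjpe([(3, 4, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)])
```

Because the rearrangement only moves values and never overlaps them, it is often preferred over strided deconvolution, which is the motivation the abstract gives for the up-sampling module.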