3D human pose estimation aims to simultaneously localize humans and estimate their articulated 3D joint locations from 2D images, which facilitates many practical human-computer interaction applications [1], [2], [3], [4], [5], [6], such as Virtual Reality (VR)/Augmented Reality (AR), human action recognition, telemedicine and telesurgery, and assistance for the visually impaired. Recently, great progress has been achieved thanks to sophisticated model designs [7], [8], [9], [10] and the availability of large-scale datasets [11], [12], [13]. Nevertheless, these methods still perform poorly in crowded scenes, where the body parts of different people overlap severely, leading to incorrect detection or association of keypoints in multi-person 3D pose estimation.
Abstract:
The performance of existing methods for multi-person 3D pose estimation in crowded scenes is still limited due to heavy overlap between persons. To address this issue, we propose a progressive inference scheme, Articulation-aware Knowledge Exploration (AKE), which improves multi-person 3D pose models at the inference stage on samples with complex occlusions. We argue that exploring the underlying articulated information and knowledge of the human body is beneficial, as it helps to further correct the predicted poses in such samples. To exploit this information, we propose an iterative scheme that forms a self-improving loop for keypoint association. Specifically, we introduce a kinematic validation module that locates unreasonable articulations and an occluded-keypoint discovering module that discovers occluded articulations. Extensive experiments on two challenging benchmarks under both weakly-supervised and fully-supervised settings demonstrate the superiority and generalization ability of the proposed method in crowded scenes.
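To make the idea of checking articulations for plausibility concrete, the following is a minimal, hypothetical sketch in Python with NumPy. It flags bones whose 3D length falls outside an assumed plausible range and returns their child joints as candidates for re-association. The toy skeleton (`BONES`), the length ranges (`LENGTH_RANGES`), and the function names are illustrative assumptions only; this is not the kinematic validation module used in AKE, whose criteria may differ entirely.

```python
import numpy as np

# Hypothetical toy skeleton: (parent, child) joint-index pairs for a 5-joint chain.
# Neither this topology nor the ranges below come from the AKE paper.
BONES = [(0, 1), (1, 2), (0, 3), (3, 4)]

# Assumed plausible bone-length ranges in metres, one (min, max) pair per bone.
LENGTH_RANGES = [(0.20, 0.40), (0.20, 0.35), (0.20, 0.40), (0.20, 0.35)]


def find_implausible_bones(pose, bones=BONES, ranges=LENGTH_RANGES):
    """Return indices of bones whose 3D length lies outside its assumed range.

    pose: array-like of shape (num_joints, 3) holding 3D joint coordinates.
    """
    pose = np.asarray(pose, dtype=float)
    flagged = []
    for i, ((parent, child), (lo, hi)) in enumerate(zip(bones, ranges)):
        # Euclidean distance between the two joints that form this bone.
        length = np.linalg.norm(pose[child] - pose[parent])
        if not (lo <= length <= hi):
            flagged.append(i)
    return flagged


def suspicious_joints(pose, bones=BONES, ranges=LENGTH_RANGES):
    """Collect the child joints of implausible bones as re-association candidates."""
    return sorted({bones[i][1] for i in find_implausible_bones(pose, bones, ranges)})
```

For example, with joints at (0, 0, 0), (0, 0.3, 0), (0, 0.55, 0), (0.3, 0, 0) and (1.3, 0, 0), only the last bone (length 1.0 m) is out of range, so `find_implausible_bones` returns `[3]` and `suspicious_joints` returns `[4]`. In an iterative scheme, such flagged joints could be re-associated and the check repeated until no bone is flagged, though a real system would also need angle limits and person-specific proportions.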
Published in: IEEE Transactions on Multimedia (Volume: 26)
1.
A. Seth, J. M. Vance and J. H. Oliver, "Virtual reality for assembly methods prototyping: A review", Virtual Reality, vol. 15, no. 1, pp. 5-20, 2011.
2.
B. Jansen, F. Temmermans and R. Deklerck, "3D human pose recognition for home monitoring of elderly", Proc. IEEE 29th Annu. Int. Conf. Eng. Med. Biol. Soc., pp. 4049-4051, 2007.
3.
P. Wei, H. Sun and N. Zheng, "Learning composite latent structures for 3D human action representation and recognition", IEEE Trans. Multimedia, vol. 21, no. 9, pp. 2195-2208, Sep. 2019.
4.
P. Hu, E. S. Ho and A. Munteanu, "3DBodyNet: Fast reconstruction of 3D animatable human body shape from a single commodity depth camera", IEEE Trans. Multimedia, vol. 24, pp. 2139-2149, 2022.
5.
M. Garcia-Salguero, J. Gonzalez-Jimenez and F.-A. Moreno, "Human 3D pose estimation with a tilting camera for social mobile robot interaction", Sensors, vol. 19, no. 22, 2019.
6.
X. Liu and G. Zhao, "3D skeletal gesture recognition via discriminative coding on time-warping invariant Riemannian trajectories", IEEE Trans. Multimedia, vol. 23, pp. 1841-1854, 2021.
7.
J. Zhen et al., "SMAP: Single-shot multi-person absolute 3D pose estimation", Proc. Eur. Conf. Comput. Vis., pp. 550-566, 2020.
8.
G. Moon, J. Y. Chang and K. M. Lee, "Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 10132-10141, 2019.
9.
C. Wang, J. Li, W. Liu, C. Qian and C. Lu, "HMOR: Hierarchical multi-person ordinal relations for monocular multi-person 3D pose estimation", Proc. 16th Eur. Conf. Comput. Vis., pp. 242-259, 2020.
10.
M. Fabbri, F. Lanzi, S. Calderara, S. Alletto and R. Cucchiara, "Compressed volumetric heatmaps for multi-person 3D pose estimation", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7202-7211, 2020.
11.
H. Joo et al., "Panoptic studio: A massively multiview system for social motion capture", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 3334-3342, 2015.
12.
D. Mehta et al., "Single-shot multi-person 3D pose estimation from monocular RGB", Proc. IEEE Int. Conf. 3D Vis., pp. 120-130, 2018.
13.
M. Fabbri et al., "Learning to detect and track visible and occluded body joints in a virtual world", Proc. Eur. Conf. Comput. Vis., pp. 430-446, 2018.
14.
H. Chu et al., "Part-aware measurement for robust multi-view multi-human 3D pose estimation and tracking", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1472-1481, 2021.
15.
H. Chen, P. Guo, P. Li, G. H. Lee and G. Chirikjian, "Multi-person 3D pose estimation in crowded scenes based on multi-view geometry", Proc. 16th Eur. Conf. Comput. Vis., pp. 541-557, 2020.
16.
J. Zhang et al., "Direct multi-view multi-person 3D pose estimation", Proc. Adv. Neural Inf. Process. Syst., pp. 13153-13164, 2021.
17.
Y. Cheng, B. Wang, B. Yang and R. T. Tan, "Monocular 3D multi-person pose estimation by integrating top-down and bottom-up networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 7645-7655, 2021.
18.
Y. Cheng, B. Wang, B. Yang and R. T. Tan, "Graph and temporal convolutional networks for 3D multi-person pose estimation in monocular videos", Proc. AAAI Conf. Artif. Intell., pp. 1157-1165, 2021.
19.
D. Mehta et al., "XNect: Real-time multi-person 3D motion capture with a single RGB camera", ACM Trans. Graph., vol. 39, no. 4, 2020.
20.
H. Joo, N. Neverova and A. Vedaldi, "Exemplar fine-tuning for 3D human model fitting towards in-the-wild 3D human pose estimation", Proc. Int. Conf. 3D Vis., pp. 42-52, 2021.
21.
N. Kolotouros, G. Pavlakos, M. J. Black and K. Daniilidis, "Learning to reconstruct 3D human pose and shape via model-fitting in the loop", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 2252-2261, 2019.
22.
D. C. Luvizon, D. Picard and H. Tabia, "Consensus-based optimization for 3D human pose estimation in camera coordinates", Int. J. Comput. Vis., vol. 130, no. 3, pp. 869-882, 2022.
23.
Z. Cao, T. Simon, S.-E. Wei and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1302-1310, 2017.
24.
W. Li et al., "Exploiting temporal contexts with strided transformer for 3D human pose estimation", IEEE Trans. Multimedia, vol. 25, pp. 1282-1293, 2023.
25.
M. Ghafoor and A. Mahmood, "Quantification of occlusion handling capability of 3D human pose estimation framework", IEEE Trans. Multimedia, Mar. 2022.
26.
G. Rogez, P. Weinzaepfel and C. Schmid, "LCR-Net: Localization-classification-regression for human pose", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1216-1224, 2017.
27.
G. Rogez, P. Weinzaepfel and C. Schmid, "LCR-Net++: Multi-person 2D and 3D pose detection in natural images", IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 5, pp. 1146-1161, May 2020.
28.
J. Lin and G. H. Lee, "HDNet: Human depth estimation for multi-person camera-space localization", Proc. 16th Eur. Conf. Comput. Vis., pp. 633-648, 2020.
29.
J. N. Kundu, A. Revanur, G. V. Waghmare, R. M. Venkatesh and R. V. Babu, "Unsupervised cross-modal alignment for multi-person 3D pose estimation", Proc. 16th Eur. Conf. Comput. Vis., pp. 35-52, 2020.
30.
X. Nie, J. Feng, J. Zhang and S. Yan, "Single-stage multi-person pose machines", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 6951-6960, 2019.