
Multi-Person 3D Pose Estimation With Occlusion Reasoning


Abstract:

The performance of existing methods for multi-person 3D pose estimation in crowded scenes is still limited, owing to heavy overlap among persons. To address this issue, we propose a progressive inference scheme, Articulation-aware Knowledge Exploration (AKE), which improves multi-person 3D pose models on samples with complex occlusions at the inference stage. We argue that it is beneficial to explore the underlying articulated information (knowledge) of the human body, which helps to further correct the predicted poses in those samples. To exploit this information, we propose an iterative scheme that forms a self-improving loop for keypoint association. Specifically, we introduce a kinematic validation module for locating unreasonable articulations and an occluded-keypoint discovering module for discovering occluded articulations. Extensive experiments on two challenging benchmarks under both weakly-supervised and fully-supervised settings demonstrate the superiority and generalization ability of our proposed method for crowded scenes.
Published in: IEEE Transactions on Multimedia (Volume: 26)
Page(s): 878 - 889
Date of Publication: 05 May 2023
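
The abstract does not say how the kinematic validation module decides that an articulation is unreasonable. As a minimal sketch only, one plausible criterion is a bone-length check on the predicted 3D joints. Everything in the block below is an assumption made for illustration, not the authors' method: the joint names, the `BONE_LIMITS` ranges, and the helper `find_implausible_bones` are all hypothetical.

```python
import math

# Hypothetical skeleton: pairs of joint names forming bones, with assumed
# plausible length ranges in metres (illustrative values, not from the paper).
BONE_LIMITS = {
    ("shoulder", "elbow"): (0.20, 0.40),
    ("elbow", "wrist"): (0.18, 0.35),
    ("hip", "knee"): (0.30, 0.55),
    ("knee", "ankle"): (0.30, 0.55),
}


def find_implausible_bones(pose, limits=BONE_LIMITS):
    """Return the bones whose 3D length falls outside the allowed range.

    `pose` maps joint name -> (x, y, z) in metres. Bones with a missing
    endpoint are skipped rather than flagged.
    """
    flagged = []
    for (a, b), (lo, hi) in limits.items():
        if a not in pose or b not in pose:
            continue
        length = math.dist(pose[a], pose[b])
        if not lo <= length <= hi:
            flagged.append((a, b))
    return flagged


pose = {
    "shoulder": (0.0, 1.40, 3.0),
    "elbow": (0.0, 1.10, 3.0),  # about 0.30 m from the shoulder: plausible
    "wrist": (0.0, 0.30, 3.0),  # about 0.80 m from the elbow: implausible
}
flagged = find_implausible_bones(pose)
```

In an iterative scheme of the kind the abstract describes, flagged bones could mark keypoints whose association should be revisited in the next pass; how AKE actually does this is not stated on this page.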


3D human pose estimation aims to simultaneously localize humans and estimate their articulated 3D joint locations from 2D images, which facilitates many practical applications in human-computer interaction [1], [2], [3], [4], [5], [6], such as Virtual Reality (VR)/Augmented Reality (AR), human action recognition, telemedicine and telesurgery, and visual impairment assistance. Recently, great progress has been achieved thanks to sophisticated model designs [7], [8], [9], [10] and the availability of large-scale datasets [11], [12], [13]. Nevertheless, these methods still have limited performance in crowded scenes, where severe overlap of human body parts leads to incorrect detection or association of keypoints in multi-person 3D pose estimation.

[1] A. Seth, J. M. Vance and J. H. Oliver, "Virtual reality for assembly methods prototyping: A review", Virtual Reality, vol. 15, no. 1, pp. 5-20, 2011.
[2] B. Jansen, F. Temmermans and R. Deklerck, "3D human pose recognition for home monitoring of elderly", Proc. IEEE 29th Annu. Int. Conf. Eng. Med. Biol. Soc., pp. 4049-4051, 2007.
[3] P. Wei, H. Sun and N. Zheng, "Learning composite latent structures for 3D human action representation and recognition", IEEE Trans. Multimedia, vol. 21, no. 9, pp. 2195-2208, Sep. 2019.
[4] P. Hu, E. S. Ho and A. Munteanu, "3DBodyNet: Fast reconstruction of 3D animatable human body shape from a single commodity depth camera", IEEE Trans. Multimedia, vol. 24, pp. 2139-2149, 2022.
[5] M. Garcia-Salguero, J. Gonzalez-Jimenez and F.-A. Moreno, "Human 3D pose estimation with a tilting camera for social mobile robot interaction", Sensors, vol. 19, no. 22, 2019.
[6] X. Liu and G. Zhao, "3D skeletal gesture recognition via discriminative coding on time-warping invariant Riemannian trajectories", IEEE Trans. Multimedia, vol. 23, pp. 1841-1854, 2021.
[7] J. Zhen et al., "SMAP: Single-shot multi-person absolute 3D pose estimation", Proc. Eur. Conf. Comput. Vis., pp. 550-566, 2020.
[8] G. Moon, J. Y. Chang and K. M. Lee, "Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 10132-10141, 2019.
[9] C. Wang, J. Li, W. Liu, C. Qian and C. Lu, "HMOR: Hierarchical multi-person ordinal relations for monocular multi-person 3D pose estimation", Proc. 16th Eur. Conf. Comput. Vis., pp. 242-259, 2020.
[10] M. Fabbri, F. Lanzi, S. Calderara, S. Alletto and R. Cucchiara, "Compressed volumetric heatmaps for multi-person 3D pose estimation", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7202-7211, 2020.
[11] H. Joo et al., "Panoptic studio: A massively multiview system for social motion capture", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 3334-3342, 2015.
[12] D. Mehta et al., "Single-shot multi-person 3D pose estimation from monocular RGB", Proc. IEEE Int. Conf. 3D Vis., pp. 120-130, 2018.
[13] M. Fabbri et al., "Learning to detect and track visible and occluded body joints in a virtual world", Proc. Eur. Conf. Comput. Vis., pp. 430-446, 2018.
[14] H. Chu et al., "Part-aware measurement for robust multi-view multi-human 3D pose estimation and tracking", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1472-1481, 2021.
[15] H. Chen, P. Guo, P. Li, G. H. Lee and G. Chirikjian, "Multi-person 3D pose estimation in crowded scenes based on multi-view geometry", Proc. 16th Eur. Conf. Comput. Vis., pp. 541-557, 2020.
[16] J. Zhang et al., "Direct multi-view multi-person 3D pose estimation", Proc. Adv. Neural Inf. Process. Syst., pp. 13153-13164, 2021.
[17] Y. Cheng, B. Wang, B. Yang and R. T. Tan, "Monocular 3D multi-person pose estimation by integrating top-down and bottom-up networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 7645-7655, 2021.
[18] Y. Cheng, B. Wang, B. Yang and R. T. Tan, "Graph and temporal convolutional networks for 3D multi-person pose estimation in monocular videos", Proc. AAAI Conf. Artif. Intell., pp. 1157-1165, 2021.
[19] D. Mehta et al., "XNect: Real-time multi-person 3D motion capture with a single RGB camera", ACM Trans. Graph., vol. 39, no. 4, 2020.
[20] H. Joo, N. Neverova and A. Vedaldi, "Exemplar fine-tuning for 3D human model fitting towards in-the-wild 3D human pose estimation", Proc. Int. Conf. 3D Vis., pp. 42-52, 2021.
[21] N. Kolotouros, G. Pavlakos, M. J. Black and K. Daniilidis, "Learning to reconstruct 3D human pose and shape via model-fitting in the loop", Proc. IEEE/CVF Int. Conf. Comput. Vis., pp. 2252-2261, 2019.
[22] D. C. Luvizon, D. Picard and H. Tabia, "Consensus-based optimization for 3D human pose estimation in camera coordinates", Int. J. Comput. Vis., vol. 130, no. 3, pp. 869-882, 2022.
[23] Z. Cao, T. Simon, S.-E. Wei and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1302-1310, 2017.
[24] W. Li et al., "Exploiting temporal contexts with strided transformer for 3D human pose estimation", IEEE Trans. Multimedia, vol. 25, pp. 1282-1293, 2023.
[25] M. Ghafoor and A. Mahmood, "Quantification of occlusion handling capability of 3D human pose estimation framework", IEEE Trans. Multimedia, Mar. 2022.
[26] G. Rogez, P. Weinzaepfel and C. Schmid, "LCR-Net: Localization-classification-regression for human pose", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1216-1224, 2017.
[27] G. Rogez, P. Weinzaepfel and C. Schmid, "LCR-Net++: Multi-person 2D and 3D pose detection in natural images", IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 5, pp. 1146-1161, May 2020.
[28] J. Lin and G. H. Lee, "HDNet: Human depth estimation for multi-person camera-space localization", Proc. 16th Eur. Conf. Comput. Vis., pp. 633-648, 2020.
[29] J. N. Kundu, A. Revanur, G. V. Waghmare, R. M. Venkatesh and R. V. Babu, "Unsupervised cross-modal alignment for multi-person 3D pose estimation", Proc. 16th Eur. Conf. Comput. Vis., pp. 35-52, 2020.
[30] X. Nie, J. Feng, J. Zhang and S. Yan, "Single-stage multi-person pose machines", Proc. IEEE/CVF Eur. Conf. Comput. Vis., pp. 6951-6960, 2019.
