Towards Accurate Multi-person Pose Estimation in the Wild


Abstract:

We propose a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task. It is a simple yet powerful top-down approach consisting of two stages. In the first stage, we predict the location and scale of boxes that are likely to contain people, using the Faster RCNN detector. In the second stage, we estimate the keypoints of the person potentially contained in each proposed bounding box. For each keypoint type, we predict dense heatmaps and offsets using a fully convolutional ResNet. To combine these outputs, we introduce a novel aggregation procedure that yields highly localized keypoint predictions. We also use a novel form of keypoint-based Non-Maximum-Suppression (NMS) instead of the cruder box-level NMS, and a novel form of keypoint-based confidence score estimation instead of box-level scoring. Trained on COCO data alone, our final system achieves an average precision of 0.649 on the COCO test-dev set and 0.643 on the test-standard set, outperforming the winner of the 2016 COCO keypoints challenge and other recent state-of-the-art methods. Further, by using additional in-house labeled data, we obtain an even higher average precision of 0.685 on the test-dev set and 0.673 on the test-standard set, more than 5% absolute improvement compared to the previous best-performing method on the same dataset.
Date of Conference: 21-26 July 2017
Date Added to IEEE Xplore: 09 November 2017
ISBN Information:
Print ISSN: 1063-6919
Conference Location: Honolulu, HI, USA
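
The keypoint-based NMS mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the overlap between two candidate poses is measured with COCO's Object Keypoint Similarity (OKS) and that suppression is greedy in score order; the abstract does not spell out either detail. The function names, the per-keypoint constants in `kappas`, the use of the kept pose's box area as the scale, and the 0.5 threshold are all illustrative assumptions rather than values taken from the paper.

```python
import math

def oks(pose_a, pose_b, area, kappas):
    """Object Keypoint Similarity between two poses.

    pose_a, pose_b: lists of (x, y) keypoints in the same order.
    area: object scale squared (here, a bounding-box area in pixels).
    kappas: per-keypoint falloff constants (hypothetical values).
    """
    total = 0.0
    for (xa, ya), (xb, yb), k in zip(pose_a, pose_b, kappas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * k ** 2))
    return total / len(kappas)

def keypoint_nms(poses, scores, areas, kappas, threshold=0.5):
    """Greedy NMS: keep a pose only if its OKS with every kept pose is below threshold."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Assumption: the scale of the already-kept (higher-scoring) pose is used.
        if all(oks(poses[i], poses[j], areas[j], kappas) < threshold for j in keep):
            keep.append(i)
    return keep

# Toy example with two keypoints per pose: pose 1 nearly duplicates pose 0,
# while pose 2 belongs to a different person far away.
poses = [
    [(10.0, 10.0), (20.0, 20.0)],
    [(10.5, 10.0), (20.0, 20.5)],
    [(100.0, 100.0), (110.0, 110.0)],
]
scores = [0.9, 0.8, 0.7]
areas = [400.0, 400.0, 400.0]
kappas = [0.1, 0.1]

kept = keypoint_nms(poses, scores, areas, kappas)
print(kept)
```

In this toy input the near-duplicate pose is suppressed because its keypoints almost coincide with those of the higher-scoring pose, even though a box-level overlap test would have no access to that keypoint-level agreement.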

1. Introduction

Visual interpretation of people plays a central role in the quest for comprehensive image understanding. We want to localize people, understand the activities they are involved in, understand how people move for the purpose of Virtual/Augmented Reality, and learn from them to teach autonomous systems. A major cornerstone in achieving these goals is the problem of human pose estimation, defined as 2-D localization of human joints on the arms, legs, and keypoints on the torso and the face.
