
Efficient Pose Estimation via a Lightweight Single-Branch Pose Distillation Network


Abstract:

Accurate lightweight (LW) pose estimation remains challenging because of the variety of human poses and the complex backgrounds in 2-D human images. To address these problems, we propose a lightweight single-branch pose distillation network, termed LSPD, a fully convolutional pose network that runs quickly at low computational cost while estimating poses accurately. First, we introduce an efficient end-to-end pose distillation sequence framework, which uses a small number of lightweight yet strong pose estimation stages to transfer the pose knowledge of our teacher model effectively. Second, we construct a compact and strong pose estimation stage that uses a lightweight multiscale residual block to strengthen the model's representation of image features and image-dependent spatial features while reducing the computational cost. Finally, once training is complete, we deploy only the backbone network and the first student stage as a simple architecture. Extensive experiments demonstrate that the proposed method achieves high accuracy with few model parameters.
Published in: IEEE Sensors Journal (Volume: 23, Issue: 22, 15 November 2023)
Page(s): 27709 - 27719
Date of Publication: 13 October 2023


I. Introduction

Single-person pose estimation, also known as human keypoint detection, locates the coordinates of the keypoints or joints of the human body from image sensor input data, and it has become a fundamental and challenging problem in computer vision. It has many application scenarios, including human behavior recognition [1], human-computer interaction, and distracted driving behavior detection [2]. With the development of deep convolutional neural networks (DCNNs) and their excellent performance, human pose estimation based on DCNNs has also made significant progress. Most existing state-of-the-art (SOTA) pose estimation methods [3], [4], [5], [6] achieve good detection accuracy; however, they usually come with a complex network structure and high resource consumption, which limits their adoption on resource-limited devices such as robots, cars, and monitoring equipment. To achieve good accuracy, low cost, and real-time performance, many efficient pose estimation methods have been proposed, which fall mainly into two categories: conventional lightweight (LW) networks [7], [8], [9] and efficient knowledge distillation networks [10], [11], [12], [13]. Although conventional lightweight networks are generally concise, pose estimation methods based on knowledge distillation have received increasing attention and strike a good balance between detection accuracy and deployment cost. Traditional two-stage offline pose distillation schemes [10], [11] distill pose knowledge from a heavy pretrained pose estimator (the teacher model) into a lightweight, compact pose estimator (the student model). This process is usually time-consuming, and strong teacher models are not always available. One-stage online multibranch pose distillation schemes [12] were therefore proposed to reduce the complexity and tedium of model training in the traditional distillation process; they also do not need a large pretrained teacher model.
Although these methods compress model parameters by means of knowledge distillation to reduce the training cost of the model while maintaining high accuracy, several problems remain. First, current top-performing pose distillation methods rely on complex, heavy basic building blocks and neglect to design or use lightweight structures that would reduce computational cost and model parameters. Second, existing online pose distillation schemes rely on a teacher model composed of redundant student models and do not explore how the number of student models affects the performance of the final target model. Finally, invisible keypoints remain difficult to detect because of blurry appearance, occlusion, and similar factors.
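
The teacher-to-student transfer that the pose distillation schemes discussed above rely on can be illustrated with a small heatmap loss. The sketch below is a generic formulation and not necessarily the loss used by LSPD: it assumes that each keypoint is represented by one 2-D heatmap, and the names `mse`, `distillation_loss`, and the blending weight `alpha` are hypothetical choices made for this illustration.

```python
from typing import List

# Assumption: one H x W confidence map per keypoint, stored as nested lists.
Heatmap = List[List[float]]


def mse(a: Heatmap, b: Heatmap) -> float:
    """Mean squared error between two heatmaps of identical shape."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += (va - vb) ** 2
            count += 1
    return total / count


def distillation_loss(
    student: List[Heatmap],
    teacher: List[Heatmap],
    target: List[Heatmap],
    alpha: float = 0.5,  # hypothetical weight between the two terms
) -> float:
    """Blend a mimicry term (student vs. teacher) with a supervised term
    (student vs. ground truth), each averaged over the keypoints."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    num_keypoints = len(student)
    mimic = sum(mse(s, t) for s, t in zip(student, teacher)) / num_keypoints
    supervised = sum(mse(s, g) for s, g in zip(student, target)) / num_keypoints
    return alpha * mimic + (1.0 - alpha) * supervised


# Toy example: a single keypoint on a 2 x 2 heatmap.
student_maps = [[[0.0, 0.5], [0.0, 0.0]]]
teacher_maps = [[[0.0, 1.0], [0.0, 0.0]]]
target_maps = [[[0.0, 1.0], [0.0, 0.0]]]
loss = distillation_loss(student_maps, teacher_maps, target_maps, alpha=0.5)
```

Setting `alpha` to 0 reduces the sketch to ordinary supervised training, while larger values make the student follow the teacher's predictions more closely. An online scheme would swap the pretrained teacher's heatmaps for a target assembled from the student branches during training.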

