
SST: Simplified Space-Time Transformer Based on Time-Assisted Spatial MSA for 3D Human Pose Estimation



Abstract:

Depth ambiguity in 2D human joint estimation is a persistent issue for 2D-3D human pose estimation networks. To cope with this challenge, existing models adopt the temporal dimension; however, none of them fully utilizes the information embedded in the input data. In this paper, we present a Time-assisted Spatial (TaS) MSA and a Simplified Space-Time Transformer (SST) to better capture spatial-temporal relationships. First, we design the new TaS MSA to comprehensively model spatial-temporal relationships. Second, we combine the TaS MSA and a Temporal MSA in parallel to enhance modeling capability and to build the SST model. Third, we find an optimal pipeline for SST by contrasting the impact of the number of parallel blocks and the intermediate feature dimensions on the model's performance. Experimental results show that our model achieves the highest accuracy on the Human3.6M dataset, with a 0.4 mm gain over current methods and a 9% improvement in difficult positions.
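The exact layer design of the TaS MSA is not given on this page, so the following is only a minimal sketch of the parallel structure the abstract describes: one MSA branch attending across joints within each frame and one attending across frames for each joint, fused in parallel inside a single block. The tensor layout, the residual-sum fusion, and the names SSTBlock / spatial_msa / temporal_msa are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SSTBlock(nn.Module):
    """Illustrative parallel spatial/temporal MSA block (assumed structure)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial_msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, dim)
        b, t, j, d = x.shape

        # Spatial branch: attention over joints within each frame (plain
        # spatial MSA stands in for the paper's time-assisted variant).
        xs = x.reshape(b * t, j, d)
        xs, _ = self.spatial_msa(xs, xs, xs)
        xs = xs.reshape(b, t, j, d)

        # Temporal branch: attention over frames for each joint.
        xt = x.permute(0, 2, 1, 3).reshape(b * j, t, d)
        xt, _ = self.temporal_msa(xt, xt, xt)
        xt = xt.reshape(b, j, t, d).permute(0, 2, 1, 3)

        # Parallel fusion, assumed here to be a residual sum, followed by an MLP.
        x = x + xs + xt
        return x + self.mlp(self.norm(x))

In the full SST model, several such blocks would be stacked; per the abstract, the number of parallel blocks and the intermediate feature dimension are the two factors contrasted to find the optimal pipeline.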
Date of Conference: 22-25 October 2024
Date Added to IEEE Xplore: 16 January 2025
Conference Location: Zhuhai, China

I. Introduction

3D human pose estimation is widely used in human-machine interaction, autonomous driving, assistive medical care, etc. Traditional monocular 3D human pose estimation methods normally use convolutional and fully connected layers to predict 3D human joints. To better exploit 2D human pose estimation and to improve accuracy, a typical 3D human pose estimation pipeline has two stages: first, 2D human joint positions are obtained with a 2D pose estimator; second, these positions are mapped to the corresponding 3D joint positions, e.g., SimpleBaseline3D [1] and VideoPose3D [2]. With the introduction of PoseFormer [3], the Transformer [4] has become a promising foundational architecture with strong performance [1], [5], [6], [7]. However, the persistent challenges of location uncertainty and depth ambiguity remain unsolved due to the absence of depth information.
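As a concrete illustration of the second (lifting) stage, the sketch below maps detected 2D joints to 3D joints with a small fully connected network in the spirit of SimpleBaseline3D [1]; the 17-joint layout, hidden width, and the name Lifter2Dto3D are assumptions chosen for illustration, not the configuration used in this paper.

import torch
import torch.nn as nn

class Lifter2Dto3D(nn.Module):
    """Illustrative single-frame 2D-to-3D lifting network (assumed sizes)."""

    def __init__(self, num_joints: int = 17, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden),   # flattened (x, y) inputs
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_joints * 3),   # flattened (x, y, z) outputs
        )

    def forward(self, joints_2d: torch.Tensor) -> torch.Tensor:
        # joints_2d: (batch, num_joints, 2) from an off-the-shelf 2D detector
        b = joints_2d.shape[0]
        return self.net(joints_2d.reshape(b, -1)).reshape(b, -1, 3)

# Usage: lift one batch of detected 2D poses to 3D.
poses_2d = torch.randn(4, 17, 2)       # stand-in for 2D detector output
poses_3d = Lifter2Dto3D()(poses_2d)    # (4, 17, 3)

Temporal models such as VideoPose3D [2] and the Transformer-based methods discussed in this paper replace this per-frame mapping with one that consumes a sequence of 2D poses.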

References

1. Julieta Martinez, Rayat Hossain, Javier Romero and James J. Little, "A simple yet effective baseline for 3D human pose estimation", Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2640-2649, 2017.
2. Dario Pavllo, Christoph Feichtenhofer, David Grangier and Michael Auli, "3D human pose estimation in video with temporal convolutions and semi-supervised training", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7753-7762, 2019.
3. Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen and Zhengming Ding, "3D Human Pose Estimation with Spatial and Temporal Transformers", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11656-11665, 2021.
4. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al., "Attention is All you Need", Neural Information Processing Systems, pp. 5998-6008, 2017.
5. Tianlang Chen, Chen Fang, Xiaohui Shen, Yiheng Zhu, Zhili Chen and Jiebo Luo, "Anatomy-Aware 3D Human Pose Estimation With Bone-Based Pose Decomposition", IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 1, pp. 198-209, 2022.
6. Kehong Gong, Jianfeng Zhang and Jiashi Feng, "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8575-8584, 2021.
7. Dario Pavllo, Christoph Feichtenhofer, David Grangier and Michael Auli, "3D Human Pose Estimation in Video with Temporal Convolutions and Semi-Supervised Training", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7753-7762, 2019.
8. Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Zhao Wang, Kai Han, Shanshe Wang, et al., "Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14761-14771, 2023.
9. Jinlu Zhang, Zhigang Tu, Jianyu Yang, Yujin Chen and Junsong Yuan, "MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13232-13242, 2022.
10. Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Shanshe Wang, Siwei Ma and Wen Gao, "P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation", Computer Vision - ECCV 2022, vol. 13665, pp. 461-478, 2022.
11. Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu and Yizhou Wang, "MotionBERT: Unified Pretraining for Human Motion Analysis", 2022.
12. Alejandro Newell, Kaiyu Yang and Jia Deng, "Stacked Hourglass Networks for Human Pose Estimation", Computer Vision - ECCV 2016, vol. 9912, pp. 483-499, 2016.
13. Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis and Kostas Daniilidis, "Coarse-to-fine volumetric prediction for single-image 3D human pose", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7025-7034, 2017.
14. Zitian Wang, Xuecheng Nie, Xiaochao Qu, Yunpeng Chen and Si Liu, "Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13096-13105, 2022.
15. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", 2020.
16. Catalin Ionescu, Dragos Papava, Vlad Olaru and Cristian Sminchisescu, "Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1325-1339, 2014.
17. Wenhao Li, Hong Liu, Hao Tang, Pichao Wang and Luc Van Gool, "MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13147-13156, 2022.
18. Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu and Li Zhang, "SeaFormer: Squeeze-Enhanced Axial Transformer for Mobile Semantic Segmentation", 2023.
