1. Introduction
Urban-scale human motion capture is attracting increasing attention. It aims to acquire consecutive fine-grained human pose representations, such as 3D skeletons and parametric mesh models, with accurate global locations in the physical world. Such capture is essential for human action recognition, social-behavior analysis, and scene perception, and it further benefits many downstream applications, including Augmented/Virtual Reality, simulation, autonomous driving, smart cities, and sociology. However, capturing extremely large-scale dynamic scenes and annotating detailed 3D representations for humans with diverse poses is non-trivial.
Using a head-mounted LiDAR and camera to scan a performer wearing IMUs, we construct SLOPER4D, a large-scale scene-aware dataset for global 4D human pose estimation in urban environments.