1. Introduction
Human action recognition has been widely explored and supports applications in many fields, such as content-based action retrieval [1], intelligent surveillance [2], and gaming [3]. Early attempts at this task used RGB data, since RGB sensors are cheap and have been deployed in a wide range of scenarios. However, because RGB sensors cannot capture depth information, it is rather difficult for algorithms to detect human bodies against cluttered backgrounds. Moreover, the absence of depth information introduces ambiguities when distinguishing similar actions.

With the progress of depth sensors, e.g., the Microsoft Kinect, researchers began using depth data for human action recognition. Compared with RGB data, human bodies can be segmented from the background more easily, since complex and confusing textures and illumination are ignored by depth sensors. More importantly, the additional information in depth data provides a new way to distinguish actions whose appearances are similar in the X-Y plane but differ along the depth (Z-axis) direction. The drawbacks of depth data are mainly twofold. First, depth data contains jumping noise. Second, depth data is usually redundant for mapping a complex depth sequence to a simple action label.

Recently, robust skeleton estimation algorithms have made it possible to extract skeleton joints from depth data in real time, which opens a new way to understand human actions using 3D skeleton data. Compared with depth data, skeleton joints estimated by robust algorithms [4] are more compact and suffer less from jumping noise.