RGBD-HuDaAct: A color-depth video database for human daily activity recognition | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we present a home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor. Our contributions are two-fold: 1) We have created a publicly releasable human activity video database, named RGBD-HuDaAct, which contains synchronized color-depth video streams for the task of human daily activity recognition. This database aims to encourage more research on human activity recognition based on multi-modality sensor combination (e.g., color plus depth). 2) We have developed two multi-modality fusion schemes, which naturally combine color and depth information, from two state-of-the-art feature representation methods for action recognition, i.e., spatio-temporal interest points (STIPs) and motion history images (MHIs). These depth-extended feature representation methods are evaluated comprehensively and demonstrate superior recognition performance over their uni-modality (e.g., color only) counterparts.
Date of Conference: 06-13 November 2011
Date Added to IEEE Xplore: 16 January 2012
Conference Location: Barcelona

1. Introduction

Being able to recognize and analyze human daily activities (e.g., going to bed, mopping the floor, and eating a meal) in a low-cost and intelligent (e.g., vision-based) way for elderly people living alone is essential for providing them with appropriate health and medical services [1]. Video-based (color camera) human activity (action) recognition has been an active research topic in computer vision over the last decade. However, the inherent limitation of the sensing device (i.e., the color camera) restricts previous methods [5], [11], [3], [24] to describing only lateral motions. As human bodies and motions are in essence three-dimensional, the loss of information in the depth channel could significantly degrade the representation and discrimination capability of these feature representations. The recent emergence of depth sensors (e.g., Microsoft Kinect) has made it feasible and economically sound to capture in real time not only color images but also depth maps with appropriate resolution and accuracy. Such a sensor can provide three-dimensional structure information of the scene as well as three-dimensional motion information of the subjects/objects in the scene. Therefore, the motion ambiguity of the color camera, i.e., the projection of three-dimensional motion onto the two-dimensional image plane, can be bypassed.
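
The motion ambiguity described above can be illustrated with a minimal sketch. It assumes an idealized pinhole camera with unit focal length, which is an illustrative model and not taken from the paper; the function names `project`, `image_displacement`, and `depth_displacement` are hypothetical. The sketch compares what a color camera and a depth sensor would each observe for a sideways motion and for a motion along the optical axis.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the 2D image plane.

    Assumption: idealized pinhole camera looking along +Z, no distortion.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def image_displacement(start, end, focal_length=1.0):
    """2D displacement a color camera observes for a 3D motion start -> end."""
    u0, v0 = project(start, focal_length)
    u1, v1 = project(end, focal_length)
    return (u1 - u0, v1 - v0)


def depth_displacement(start, end):
    """Change in depth (Z) that a depth sensor observes for the same motion."""
    return end[2] - start[2]


# A point on the optical axis, 2 m in front of the camera (made-up values).
start = (0.0, 0.0, 2.0)
lateral = (0.5, 0.0, 2.0)  # moves 0.5 m sideways
toward = (0.0, 0.0, 1.5)   # moves 0.5 m toward the camera

lateral_2d = image_displacement(start, lateral)  # visible in the color image
toward_2d = image_displacement(start, toward)    # no change in the color image
toward_depth = depth_displacement(start, toward) # visible in the depth map

print("lateral motion, image displacement:", lateral_2d)
print("motion toward camera, image displacement:", toward_2d)
print("motion toward camera, depth displacement:", toward_depth)
```

In this toy setting the motion toward the camera leaves the projected image position unchanged, while the depth value changes by the full 0.5 m, which is the kind of information a depth channel adds to color-only features.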
