1. Introduction
Recognizing and analyzing the daily activities of elderly people who live alone (e.g., going to bed, mopping the floor, and eating a meal) in a low-cost and intelligent way (e.g., vision-based) is essential for providing them with appropriate health and medical services [1]. Video-based human activity (action) recognition using color cameras has been an active research topic in computer vision over the last decade. However, the inherent limitation of the sensing device (i.e., the color camera) restricts previous methods [5], [11], [3], [24] to describing only lateral motions. Because human bodies and motions are in essence three-dimensional, the loss of information in the depth channel can significantly degrade the representational and discriminative capability of these feature representations. The recent emergence of depth sensors (e.g., Microsoft Kinect) has made it feasible and economically sound to capture, in real time, not only color images but also depth maps with appropriate resolution and accuracy. Such sensors provide three-dimensional structural information about the scene as well as three-dimensional motion information about the subjects and objects in it. Therefore, the motion ambiguity of the color camera, i.e., the ambiguity caused by projecting three-dimensional motion onto the two-dimensional image plane, can be bypassed.