1. Introduction
Recognizing human action is a key component of many computer vision applications, such as video surveillance, human-computer interfaces, video indexing and browsing, gesture recognition, and the analysis of sports events and dance choreography. Recent work in action recognition [7], [21], [11], [17] has shown that it is useful to analyze actions by regarding the video sequence as a space-time intensity volume. Analyzing actions directly in the space-time volume avoids some limitations of traditional approaches based on the computation of optical flow [2], [8] (aperture problems, smooth surfaces, singularities, etc.), feature tracking [20], [4] (self-occlusions, re-initialization, changes of appearance, etc.), or key frames [6] (lack of information about the motion). Most of the above studies rely on computing local space-time gradients or other intensity-based features, and thus may be unreliable in cases of low-quality video, motion discontinuities, and motion aliasing.

[Figure: Space-time shapes of "jumping-jack", "walking" and "running" actions.]
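The idea of treating a video as a space-time intensity volume can be illustrated with a minimal sketch. The snippet below (an assumption for illustration, not code from the cited works) uses NumPy to stack grayscale frames into a 3-D array indexed by (t, y, x); the `frames` list is a hypothetical stand-in for decoded video frames.

```python
import numpy as np

# Hypothetical decoded grayscale video: 30 frames of size 120x160.
frames = [np.random.rand(120, 160) for _ in range(30)]

# Stack the frames along a new time axis to form the space-time
# intensity volume, indexed as volume[t, y, x].
volume = np.stack(frames, axis=0)

# Slicing along t recovers a single frame; fixing (y, x) yields the
# intensity profile of one pixel over time.
frame_at_t5 = volume[5]        # shape (120, 160)
pixel_profile = volume[:, 60, 80]  # shape (30,)
print(volume.shape)  # (30, 120, 160)
```

Shape-based methods such as the one discussed here operate on a binary (silhouette) version of this volume rather than on raw intensities, which is what makes them robust to the gradient-related failure modes listed above.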