I. Introduction
Human action recognition has attracted sustained research attention for several years owing to its wide applications in surveillance systems, human-computer interaction, and related areas. Most existing methods [1], [2] perform action recognition on RGB videos. However, because variations in subjects’ clothing, lighting, and viewing angle introduce intra-class variation, appearance information alone is insufficient to characterize actions. By comparison, spatial information captures more of the intrinsic characteristics of human actions.