1 Introduction
Action recognition is receiving increasing attention in computer vision because of its potential applications, such as video surveillance, human-computer interaction, virtual reality, and multimedia retrieval. Descriptor matching and classification-based schemes are common approaches to action recognition. However, for large-scale action retrieval and recognition, where the training database consists of thousands of action videos, such matching schemes may require a tremendous amount of computation. Recognizing actions viewed against a dynamically varying background is another important challenge. Many studies have addressed effective feature extraction and categorization methods for robust action recognition; detailed surveys are given in [1], [2], [3].