
Recognizing realistic actions from videos “in the wild”



Abstract:

In this paper, we present a systematic framework for recognizing realistic actions from videos “in the wild.” Such unconstrained videos are abundant in personal collections as well as on the web. Recognizing actions from such videos has not been addressed extensively, primarily due to the tremendous variations that result from camera motion, background clutter, changes in object appearance and scale, etc. The main challenge is how to extract reliable and informative features from unconstrained videos. We extract both motion and static features from the videos. Since the raw features of both types are dense yet noisy, we propose strategies to prune them. We use motion statistics to acquire stable motion features and clean static features. Furthermore, PageRank is used to mine the most informative static features. To construct compact yet discriminative visual vocabularies, a divisive information-theoretic algorithm is employed to group semantically related features. Finally, AdaBoost is chosen to integrate all the heterogeneous yet complementary features for recognition. We have tested the framework on the KTH dataset and on our own dataset consisting of 11 categories of actions collected from YouTube and personal videos, and have obtained impressive results for action recognition and action localization.
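To illustrate the PageRank-based feature mining step mentioned above, the following is a minimal, illustrative sketch (not the authors' implementation): static features are treated as nodes of a match graph, with an edge when two features match across frames, and standard power-iteration PageRank ranks features by how strongly they are connected to other matched features. The graph construction and all names here are assumptions for the example.

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a feature-match graph.

    adj[i, j] > 0 when features i and j match (e.g. across frames).
    Returns one score per feature; higher scores indicate features
    that are well connected in the match graph, a proxy for being
    stable and informative.
    """
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    # Column-normalize into a stochastic transition matrix; columns
    # with zero out-degree (dangling nodes) jump uniformly.
    M = np.where(col_sums > 0,
                 adj / np.where(col_sums == 0, 1.0, col_sums),
                 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = (1 - damping) / n + damping * (M @ r)
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Toy match graph: feature 0 matches three other features,
# so it should receive the highest score.
adj = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
scores = pagerank(adj)
top_feature = int(np.argmax(scores))
```

In a real pipeline, only the top-ranked features would be kept before vocabulary construction; the point of the sketch is just the graph-plus-power-iteration mechanics.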
Date of Conference: 20-25 June 2009
Date Added to IEEE Xplore: 18 August 2009
Print ISSN: 1063-6919
Conference Location: Miami, FL, USA

1. Introduction

Automatically recognizing human actions is receiving increasing attention due to its wide range of applications, such as video indexing and retrieval, human-computer interaction, and activity monitoring. Although a large amount of research on action categorization has been reported, recognizing actions from realistic video remains quite challenging due to significant intra-class variations, occlusion, and background clutter. In order to obtain reliable features, most early work made a number of strong assumptions about the videos, such as the availability of reliable human body tracking, slight or no camera motion, and a limited number of viewpoints [3], [5]. The commonly used KTH dataset contains relatively complicated scenarios, and many methods employing this dataset have been reported [8], [9], [10]. However, very few attempts have been made to recognize actions from videos “in the wild,” as shown by the examples in Fig. 1. Here, a video “in the wild” refers to a video captured under uncontrolled conditions, such as a video recorded by an amateur using a hand-held camera. Owing to diverse video sources such as YouTube, TV broadcast, and personal video collections, this type of video generally contains significant camera motion, background clutter, and changes in object appearance, scale, illumination conditions, and viewpoint. In this paper, our goal is to offer a generic framework for recognizing this type of realistic action. Since we collected most of these videos from YouTube, hereafter “YouTube videos” refers to videos “in the wild.” Our YouTube action dataset consists of 11 categories with about 1160 videos. A detailed description is given in Section

