Spatio-temporal Shape and Flow Correlation for Action Recognition


Abstract:

This paper explores the use of volumetric features for action recognition. First, we propose a novel method to correlate spatio-temporal shapes to video clips that have been automatically segmented. Our method works on over-segmented videos, which means that we do not require background subtraction for reliable object segmentation. Next, we discuss and demonstrate the complementary nature of shape- and flow-based features for action recognition. Our method, when combined with a recent flow-based correlation technique, can detect a wide range of actions in video, as demonstrated by results on a long tennis video. Although not specifically designed for whole-video classification, we also show that our method's performance is competitive with current action classification techniques on a standard video classification dataset.
Date of Conference: 17-22 June 2007
Date Added to IEEE Xplore: 16 July 2007
ISBN Information:
Print ISSN: 1063-6919
Conference Location: Minneapolis, MN, USA

1. Introduction

The goal of action recognition is to localize a particular event of interest in video, such as a tennis serve, both in space and in time. Just as object recognition is a key problem in image understanding, action recognition is a fundamental challenge for interpreting video. A recent trend in action recognition has been the emergence of techniques based on the volumetric analysis of video, where a sequence of images is treated as a three-dimensional space-time volume. Eschewing the building of explicit models of the actor or environment (e.g., kinematic models of humans), these approaches attempt to perform recognition directly on the raw video. An obvious benefit is that recognition need not be limited to a specific set of actors or actions but can, in principle, extend to a variety of events, given appropriate training data. The drawback is that volumetric representations do not easily generalize across appearance changes due to different actors, varying environmental conditions and camera viewpoint. This observation has motivated the employment of video features that are robust to appearance; these can be broadly categorized as shape-based (e.g., background-subtracted human silhouettes) and flow-based (e.g., motion fields generated using optical flow). However, as discussed below, both of these types of methods have significant limitations.

Our goal is to detect specific actions in realistic videos with cluttered environments. First, we segment input video into space-time volumes. Then, we correlate action templates with the volumes using shape and flow features. We are able to localize events in space-time without the need for background-subtracted videos.
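To make the idea of matching a template against a space-time volume concrete, the following is a minimal toy sketch, not the method of this paper. It assumes a binary volume indexed as `volume[t][y][x]` and scores each placement of a small binary template by intersection-over-union. It uses no over-segmentation and no flow features, and the names `overlap_score` and `best_match` are hypothetical, chosen only for this illustration.

```python
from itertools import product


def shape(vol):
    """Return (T, H, W) of a nested-list volume indexed as vol[t][y][x]."""
    return len(vol), len(vol[0]), len(vol[0][0])


def overlap_score(volume, template, offset):
    """Intersection-over-union of a binary template and the volume region at offset."""
    t0, y0, x0 = offset
    T, H, W = shape(template)
    inter = 0
    union = 0
    for t, y, x in product(range(T), range(H), range(W)):
        a = template[t][y][x]
        b = volume[t0 + t][y0 + y][x0 + x]
        if a and b:
            inter += 1
        if a or b:
            union += 1
    return inter / union if union else 0.0


def best_match(volume, template):
    """Slide the template over every valid (t, y, x) offset; return (score, offset)."""
    VT, VH, VW = shape(volume)
    T, H, W = shape(template)
    offsets = product(range(VT - T + 1), range(VH - H + 1), range(VW - W + 1))
    return max((overlap_score(volume, template, o), o) for o in offsets)


# Toy data (assumed for illustration): 4 frames of 3x4 pixels, all background.
volume = [[[0] * 4 for _ in range(3)] for _ in range(4)]
# A single pixel that moves one step to the right between frames 1 and 2.
volume[1][2][1] = 1
volume[2][2][2] = 1

# Template: 2 frames, 1 row, 2 columns, describing the same rightward motion.
template = [[[1, 0]], [[0, 1]]]

score, offset = best_match(volume, template)
print(score, offset)
```

The search localizes the event jointly in time and space, because the returned offset is a `(t, y, x)` triple. An exhaustive scan like this is only practical for tiny volumes; it is meant to illustrate the volumetric view, not to suggest an efficient implementation.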
