1 Introduction
Different people performing similar behaviors induce completely different space-time intensity patterns in a recorded video sequence, because they wear different clothes and their surrounding backgrounds differ. What is common across such sequences of the same behavior are the underlying induced motion fields. This observation was used in [9], where low-pass filtered optical-flow fields (computed between pairs of frames) were used for action recognition. However, dense, unconstrained, and nonrigid motion estimation is highly noisy and unreliable. Clothes worn by different people performing the same action often have very different spatial properties (different color, texture, and so forth). Uniform-colored clothes induce local aperture effects, especially when the observed acting person is large in the frame (which is why Efros et al. [9] analyze small people "at a glance"). Dense flow estimation is even more unreliable when the dynamic event contains unstructured objects such as running water or flickering fire.
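To make the reference to [9] concrete, the sketch below illustrates the kind of blurred flow-field representation described there: dense optical flow between a pair of frames, split into non-negative motion channels and Gaussian low-pass filtered. This is a minimal sketch under assumptions, not the exact pipeline of [9]: the Farneback estimator, the half-wave rectification into four channels, the parameter values, and the frame filenames are all stand-ins chosen here for illustration.

```python
import numpy as np
import cv2

def blurred_flow_channels(prev_gray, next_gray, sigma=3.0):
    # Dense optical flow between a frame pair. Farneback's method
    # is used as a stand-in for the flow estimator of [9] (an
    # assumption), with commonly used parameter values.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    fx, fy = flow[..., 0], flow[..., 1]
    # Half-wave rectify into four non-negative motion channels,
    # separating leftward/rightward and upward/downward motion
    # (an assumed detail of the channel construction).
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    # Gaussian low-pass filtering suppresses some of the noise
    # that makes raw dense flow estimates unreliable.
    return [cv2.GaussianBlur(c, (0, 0), sigma) for c in channels]

# Usage (hypothetical frame filenames):
prev_gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
channels = blurred_flow_channels(prev_gray, next_gray)
```

The blurring step is what makes such descriptors tolerant of the noise discussed above, but it cannot recover motion that the aperture problem or unstructured scene content (water, fire) has already made ambiguous.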