1. Introduction
Articulated human pose estimation plays a key role in many computer vision applications, including activity recognition and video understanding [30], [34]. Several factors make this task challenging, such as the diversity of appearances, changes in scene illumination and camera viewpoint, background clutter, and occlusion. In recent years, significant effort has been devoted to estimating human poses in single images [3], [7], [24], [3]. Although these methods perform well on certain body parts, e.g., the head, they generally perform poorly at localizing the lower-arm parts, i.e., elbows and wrists. The focus of this paper is to improve human pose estimation, and in particular to localize lower-arm parts accurately, by modeling interactions between body parts across time.