1 Introduction
Object tracking is a fundamental problem in computer vision with applications in a wide range of domains. Whereas significant progress has been made in tracking specific objects (e.g., faces [22], humans [11], and rigid objects [15]), tracking generic objects remains difficult. Because manually annotating sufficient examples of every object in the world is prohibitively expensive and time-consuming, approaches for model-free tracking have recently received increased interest [2], [12]. In model-free tracking, the object of interest is manually annotated with a rectangular bounding box in the first frame of a video sequence and must then be tracked throughout the remainder of the video. Model-free tracking is challenging because (1) little information is available about the object to be tracked, (2) this information is ambiguous, in the sense that the initial bounding box only approximately separates the object of interest from the background, and (3) the object's appearance may change drastically over time.