I. Introduction
Object tracking is an important computer vision task with applications in many domains, such as visual surveillance [1]–[3], human-computer interaction [4], [5], video compression [6], and medical imaging [7], [8]. Developing a robust tracking method is therefore a critical issue for these applications. Given the initial state of a target in one frame of a video, the goal of tracking is to estimate the states of the target in subsequent frames. Over the past decades, a large body of color-image-based approaches to object tracking has been proposed. In [9], an ensemble tracker was presented that formulates tracking as a pixel-based binary classification between the target and the background. However, the pixel-based representation is rather limited, which constrains its ability to handle heavy occlusions and background clutter. In [10], an appearance model was extracted in the compressed domain; the model is simple, runs in real time, and achieves robust performance under constrained conditions. Nevertheless, similar backgrounds and occlusions may lead to tracking failure and model drift. In [11], the authors represented the template object with histograms of multiple rectangular sub-regions of the template, which enables the method to handle partial occlusions and pose changes. It may, however, fail when the object is fully occluded.