I. Introduction
Visual object tracking is a critical technology in computer vision and is widely used in intelligent driving [1], human-computer interaction [2], and video surveillance [3]. Its core task is to automatically estimate the position and shape of a target in subsequent frames, given the target's initial position in the first frame of the video. However, complex backgrounds encountered during tracking can cause problems such as occlusion and interference from similar targets, which make object tracking challenging.