I. Introduction
The seemingly simple task of locating an object in successive frames of an image sequence is called tracking. In specific domains such as autonomous driving or surveillance, this task can be solved robustly and satisfactorily. However, tracking algorithms developed for such single tasks often fail in other scenarios with different characteristics. Typical challenges include illumination changes, partial or complete occlusion by other objects, rapid motion of the camera or the tracked object, deformation, or simple rotation. For example, a centroid-based tracking algorithm can easily handle object deformation but struggles with partial occlusion, where a correlation-based tracker performs better. The main objective here is to create an algorithm that can track an arbitrary object under a wide variety of these challenges. Tracking algorithms can use different approaches and different features for object tracking. For an overview of current object tracking algorithms we refer to surveys such as [23] and [24]. Another resource on the state of the art is the Visual Object Tracking Challenge [25], where modern algorithms compete with each other every year; [19] describes an evaluation methodology for tracking algorithms. Instead of creating one single tracking algorithm that handles all challenges, a promising solution is the fusion of multiple algorithms, as illustrated in Fig. 2. Such a fusion is intended to detect outliers and to combine the advantages of the different trackers. Fusion can take place at different levels of the tracking procedure. In [17], Teutsch et al. presented a framework for multi-target tracking in which fusion is performed at the feature level: region and point features are fused, followed by a Kalman filter that reconstructs object-related measurements. The aim was to handle the splitting and merging of different maritime vessels. Another relevant work was presented in [18].
Several independent trackers were combined online, and those most effective for the current scene conditions were selected. To this end, a scene context classifier was developed that uses intensity, chromatic, and motion features of the video; here the fusion takes place at the higher decision level of the tracking procedure. In [12], Bailer et al. introduced another possibility for high-level fusion using attraction fields: several trackers run in parallel, their results are fused, and outliers are detected. These outliers are reinitialized or removed permanently from the tracking procedure. A similar procedure was developed by Becker et al. [13]. This fusion is called MAD fusion, named after its criterion for outlier detection: instead of the mean and standard deviation, the median and the median absolute deviation are used. In this paper we focus on comparing three different high-level fusion methods: “Weighted Mean” [11], “Attraction-fields” [12], and “MAD” [13].
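To make the idea behind the MAD criterion concrete, the following sketch shows robust outlier rejection among parallel tracker outputs in the spirit of [13]. The function name, the object-center representation, and the threshold factor k are our own illustrative assumptions, not the exact formulation of [13]:

```python
from statistics import median

def mad_fusion(centers, k=3.0):
    """Illustrative sketch (our naming): reject outlier trackers using the
    median and the median absolute deviation (MAD) instead of mean and
    standard deviation, then fuse the remaining results.

    centers: list of (x, y) object centers reported by N parallel trackers.
    Returns the fused center and a per-tracker inlier flag.
    """
    # Robust location estimate: component-wise median of all tracker results.
    med = (median(x for x, _ in centers), median(y for _, y in centers))
    # Deviation of each tracker's result from the median position.
    dev = [((x - med[0]) ** 2 + (y - med[1]) ** 2) ** 0.5 for x, y in centers]
    med_dev = median(dev)
    # Median absolute deviation of those distances.
    mad = median(abs(d - med_dev) for d in dev)
    # A tracker counts as an inlier if its deviation lies within k MADs
    # of the median deviation (small epsilon guards against mad == 0).
    inliers = [abs(d - med_dev) <= k * max(mad, 1e-9) for d in dev]
    # Fuse the inlier results by simple averaging.
    xs = [c[0] for c, ok in zip(centers, inliers) if ok]
    ys = [c[1] for c, ok in zip(centers, inliers) if ok]
    return (sum(xs) / len(xs), sum(ys) / len(ys)), inliers
```

Because the median and MAD are insensitive to a minority of gross errors, a single tracker that drifts far from the group is excluded from the fused result rather than dragging it off target, which is exactly the failure mode a mean/standard-deviation criterion suffers from.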
Fig. 2. Left: blue rectangles indicate algorithms that result in correct object tracking; red rectangles show outliers. Right: the optimal fusion result.