Loading [MathJax]/extensions/MathMenu.js
Multi-Pedestrian Tracking Based on Improved Two Step Data Association | IEEE Journals & Magazine | IEEE Xplore

Multi-Pedestrian Tracking Based on Improved Two Step Data Association


Multi-objects tracking is an important problem in computer vision. It has wide applications in visual surveillance, traffic safety and robotics and so on. To improve the ...

Abstract:

The multi-object tracking (MOT) algorithms based on tracking by detection framework are the state-of-the-art trackers in recent years. Association optimization and associ...Show More

Abstract:

The multi-object tracking (MOT) algorithms based on tracking by detection framework are the state-of-the-art trackers in recent years. Association optimization and association affinity model are two key parts in MOT, which have attracted attention to build effective association model to overcome ambiguous detection responses. In this paper, we have proposed an online multi-pedestrian tracking algorithm that uses a two-step data association with the help of improved sparse based appearance affinity model and rank based motion affinity model. The association framework is constructed by fusing the trajectory dynamic information and confidence based two-step data associations. The missing frames of a tracklet are counted based on the sparse reconstruction error of a target. An incremental SVD and downdate SVD decomposition is devised to estimate the rank of the Hankel matrix in rank based motion model. The estimated result is fed back to compute the tracklets confidence during association optimization. Both those strategies are beneficial to eliminate ambiguous detection responses during association. By this association optimization strategy, the fragmented tracklets in online tracking are reduced in some extent. We evaluate our method on four public available challenging datasets. The experimental results, both qualitative and quantitative, demonstrate that the proposed tracking algorithm has a good tracking performance compared with several state-of-the-art multi-object trackers.
Multi-objects tracking is an important problem in computer vision. It has wide applications in visual surveillance, traffic safety and robotics and so on. To improve the ...
Published in: IEEE Access ( Volume: 7)
Page(s): 100780 - 100794
Date of Publication: 16 July 2019
Electronic ISSN: 2169-3536

Funding Agency:

Citations are not available for this document.

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Multi-objects tracking (MOT) is an important problem in computer vision, which has wide applications in visual surveillance, traffic safety and robotics and so on. It aims to locate multiple objects, maintain their identities and yield their individual trajectories throughout an image sequence [1]. While the tracked objects can be pedestrian, vehicle or animal, we mainly focus on pedestrians in our work since it is a more challenging problem in crowed scenes. The crowded scenes always have similar appearance with occlusion, intersecting trajectories, missing data and camera motion.

Due to the development of pedestrian detectors [2]–​[4], the tracking-by-detection (TBD) MOT methods are the state-of-the-art tracking in recent years. Data association is a key issue in TBD methods. With the detection responses, the MOT task is formulated as an association optimization problem, in where short tracklets or detection responses are linked to yield long trajectories by learning the appearance, motion, location and size of the targets through the video sequences. There are two main components in data association based MOT system. The first one is association affinity model, which is used to measure the similarity or likelihood between tracklets in consecutive frames. The second one is association optimization model, which determines the tracklets linking between short tracklets or detection responses according to their association affinity model.

Several association optimization methods have been proposed to improve association problem in MOT [1]. Most existing association optimization methods either devise for batch tracking or online tracking. The global association method, considering past and future information, is proposed to particularly solve association optimization in batch tracking. While the local association method, only considering up to time information, mainly focuses on solving association optimization in online tracking.

To improve MOT performance, in this paper, an online multi-pedestrian tracking method is proposed, which formulates the association optimization problem into two step association by relying on an improved association affinity model. The framework of the proposed method is shown in Figure 1. The improved association affinity model is calculated by sparse-based appearance affinity model and rank-based motion affinity model. In our work, the motion affinity model is computed by a rank based motion model relying on the history position of a trajectory. Different from the general rank calculation method, an incremental SVD and downdate SVD decomposition method is devised to calculate the rank of the Hankel matrix in rank based motion model. This is specially devised for online tracking since the data explosion with time from stream data and it is too large to be stored or even buffered. In addition, we pay attention on sparse-based appearance model, use the sparse reconstruction error to analysis the missing frames in tracklet confidence calculation. To sum up, the main contributions of our work are in three folds:

  1. Association optimization is implemented by two step data association considering the tracklet with high confidence and low confidence, which is guided by sparse-based appearance model and rank-based motion model of the target.

  2. We construct the appearance affinity model based on sparse representation and rely on the reconstruction error to analysis the occlusion or detection failure. The analyzed result is fed back to compute the tracklet confidence during association optimization.

  3. The rank based motion affinity model is learned online and updated by adaptively deciding the order of history trajectory of a tracked target related with the new frame motion model. We devise an incremental SVD and downdate SVD decomposition method to online estimate the rank of the Hankel matrix, the estimated result is fed back to compute the tracklets confidence during association optimization.

FIGURE 1. - The tracking framework of the proposed method.
FIGURE 1.

The tracking framework of the proposed method.

SECTION II.

Related Works

In recent years, many existing MOT works focus on devising a robust appearance model to improve the tracking performance since the appearance features are very important factors influenced the MOT performance. thus, several features, such as histogram-based features, sparse features and CNN-based features, are used to build appearance model of the tracked target by integrating different affinity measures [5]. Kuo et al. [6] devised an online learned discriminative appearance model called OLDAMs to improve MOT performance. Bae et al. [7] design a deep appearance model based on the Siamese network and introduce online transfer learning into deep appearance model for discriminating different object. In [8], an instance-aware tracker is proposed by integrating single object tracking algorithm to MOT with CNN-based dynamically refreshing model to eliminate noise. In [9], Sadeghian et al. propose a MOT method by integrating deep CNN based appearance model and LSTM based motion model. In addition, significant improvement have achieved in sparse-based tracking. In [10], Zhang et al. proposed a robust structural sparse representation (RSST) method to handle occlusion and illumination changes. Fagot-Bouquet et al. [11] introduce sparse representations into multi-frame data association to gain the MOT performance.

To further improve MOT performance, there are also many works pay their attention on fusing motion affinity and appearance affinity of the tracked target. For example, Yang et al. [12] integrate the non-linear motion model with appearance model to construct a robust association affinity model under the framework of multiple instance learning algorithm. Bae and Yoon [13] exploit trajectory confidence and incremental linear discriminate appearance to assist their two-step data association. Collins et al. [14] showed that higher-order motion constraint has a major effect on improving the quality of data association in MOT, especially when multi-object with similar appearance. In [15], both individual and mutual relation models are introduced in MOT to build graph model. In [16], the pairwise relative motion model is introduced as an additional term to construct CRF energy function. Most recently, the notion of relative motion network is proposed in [17], which is designed to improve the data association performance by utilizing the relative spatial constraint between objects.

However, these methods either devise a discriminative appearance affinity model by exactly extracting the targets features or build pairwise motion constraint to distinguish the target in neighborhood of an image. It will cost huge time to achieve the discriminative appearance feature of the targets in MOT. While the motion constraint often depend on the position of the target estimated by Kalman or particle filter. This kind of motion affinity model has a good performance in where continuous detections without occlusion and non-stationary camera in short period time. However, the MOT tracking system is with different kinds of noise, e.g., missing detections, occlusions and unstable camera motion.

The improve MOT performance, in our work, we formulate the association affinity model by sparse-based appearance affinity model and rank-based motion affinity model as follow:\begin{equation*} S_{link} (T\vert z)=S_{app} (T\vert z)S_{m} (T\vert z)\tag{1}\end{equation*} View SourceRight-click on figure for MathML and additional features. where the association affinity model $S_{link} $ is comprised by two affinity terms from appearance $S_{app} $ and motion $S_{m} $ . $T$ and $z$ represent tracklets and detections, respectively. In our work, sparse representation is not only introduced to build discriminative appearance model of the tracked target, but also used to analysis the occlusion or detection failure. In addition, an improved rank based motion affinity model is proposed to compute the affinity score between the associated tracklets and detections. It considers noise in the trajectory, and takes the new arriving data and ‘too old’ data in online tracking into consideration.

SECTION III.

Online Tracking With Improved Two-Step Data Association

A. Problem Formulation

The task of online MOT is to implement tracklets association between the existing trajectories and a set of detections. By estimating the object state from the current detection responses, it generates long trajectories for the tracked targets. Let ${\mathbb {X}}_{t} =\{x_{t}^{1},\cdots,x_{t}^{M} \}$ and ${\mathbb {Z}}_{t} =\{z_{t}^{1},\cdots,z_{t}^{N} \}$ be the set of object states and detection responses at frame $t$ , respectively. For online MOT, $T_{1:t}^{i} =\{x_{k}^{i} \vert 1\le t_{s}^{i} \le k\le t_{e}^{i} \le t\}$ denotes the trajectory for object $i$ up to frame $t$ , where $T_{1:t}^{i} $ is start from $t_{s}^{i} $ and end at frame $t_{e}^{i} $ . In addition, the set of object trajectories up to frame $t$ is represented as ${\mathbb { T}}_{1:t} $ . Then, the online multi-object tracking is regarded as the optimal tracklet association ${\mathbb { T}}_{1:t} $ by maximizing the posterior probability of ${\mathbb {T}}_{1:t} $ given all detections ${\mathbb {Z}}_{1:t} $ up to frame $t$ :\begin{equation*} {\mathbb {T}}_{1:t}^{\ast } =\mathop {\arg \max }\limits _{{\mathbb {T}}_{1:t}} p({\mathbb {T}}_{1:t},\vert {\mathbb {Z}}_{1:t})\tag{2}\end{equation*} View SourceRight-click on figure for MathML and additional features.

In Eq. (2), the optimal tracklet association is solved relied on the association affinity model and association optimization strategy. In our work, association affinity model is constructed by sparse appearance model and rank based adaptively dynamic motion affinity model. The association optimization strategy is performed by two step data association, high confidence tracklets association in two consecutive frames and low confidence tracklets association by considering the history tracklet information. Furthermore, we introduce the sparse reconstruction error for occlusion analysis and an incremental SVD and downdate SVD method is devised to compute the rank for Hankel matrix in the rank based motion model. The rank also determines the order of history object states used to perform the association between tracklets and detections. This is quite different with the most similar work in [13] (more details works are described in section III.C and section IV).

B. Improved Two-Step Data Association

To perform the two-step data association, we have bipartition the tracklets into high confidence tracklets and low confidence tracklets based on the tracklet confidence. As defined in [13], the tracklet confidence is mainly determined by three factors: the length of a tracklet, occlusion and affinity score between the tracklet and the associated detection. In our work, we define the tracklet confidence relied above factors as follows:\begin{equation*} conf(T^{i})=\left({\frac {1}{L_{i}}\sum \limits _{t\in [t_{s}^{i},t_{e}^{i}]} {S_{link} (T^{i},z^{j})} }\right)\times \exp \left({-\beta \cdot \frac {W}{L_{i} }}\right)\tag{3}\end{equation*} View SourceRight-click on figure for MathML and additional features.

The major difference between our work and the most similar work [13] in tracklet confidence calculation is the way to calculate the association affinity model $S_{link} $ , the length of tracklet $L_{i} $ and the missing frames $W$ of the object $i$ . In our work, $S_{link} $ is calculated by sparse appearance model and rank based dynamic motion affinity model.$L_{i} $ is adaptively determined by the rank based dynamic motion affinity model in section IV rather than simple takes the whole history tracklet as in [13], which is not considered the error caused by false detections or false associations between the tracklets and the detections. The missing frames $W$ is often caused by occlusion or detection failure, which is analyzed by the sparse reconstruction error in our work.$\beta $ is determined by the performance of object detector. Since the trajectory confidence ranges from 0 to 1, $conf(T)>0.5$ denotes that the trajectory $T$ is a high-confidence trajectory.

1) The First Step Association for High Confidence Tracklets in Two Consecutive Frames

By the tracklet confidence measurement in Eq. (3), $k$ high confidence tracklets ${\mathrm{ T}}^{(hi)} $ are associated with the detections ${\mathbb {Z}}_{t} $ in frame $t$ , ${\mathbb {Z}}_{t} =\{z_{t}^{1},\cdots,z_{t}^{n} \}$ . Then, the score affinity matrix $X_{k\times n}^{l} $ is defined as \begin{equation*} X_{k\times n}^{l} =[x_{ij}]_{k\times n},x_{ij} =-\log (S_{link} (T_{L_{i} }^{i(h_{i})},z^{j}))\tag{4}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $S_{link} (T_{L_{i}}^{i(h_{i})},z^{j})$ represents the affinity similarity defined in Eq. (1), $z^{j} \in {\mathbb {Z}}_{t}$ . The Hungarian algorithm [18] is used to determine the pairwise association for $z^{j} $ and $T^{(hi)} $ by minimum the cost of $X_{k\times n}^{l} =[x_{ij}]$ . After associating the $z^{j} $ to the tracklet $T^{(hi)} $ , the position and confidence of the trajectory in Eq. (3) are updated by the associated $z^{j}$ .

2) The Second Step Association for Low Confidence Tracklets by Considering the History Tracklet Information

The low confidence tracklets ${\mathrm{ T}}^{(lo)} $ are more likely to be fragmented, it may be associated with detections or high confidence tracklets, or terminated. Let $q$ low confidence tracklets, $h$ un-matched high confidence tracklets from the first step association, ${\mathbb {C}}_{t} =\{c_{t}^{j} \}_{j=1}^{q'} \subseteq {\mathbb {Z}}_{t} $ is the remained detections not assigned with $T^{(hi)} $ in first step association. The second step association is performed among ${\mathrm{ T}}^{(lo)} $ , $T^{(hi)}$ and $c_{t}^{j} $ , let $T^{i(lo)} $ is associated with the un-matched high confidence tracklets $T^{i(hi)} $ as $A$ ,$T^{i(lo)} $ is associated with the retained detections $c_{t}^{j} $ as $B$ , $T^{i(lo)} $ is terminated as $D$ .

The cost matrix in the second step association is defined as follows:\begin{equation*} X_{(q+{q}')\times (h+q)}^{g} =\begin{bmatrix} {A_{q\times h}} & {D_{q\times q}} \\ {-\log (\tau)_{q'\times h}} & {B_{q'\times q}} \\ \end{bmatrix}\tag{5}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $A=[a_{ij}], a_{ij} =-\log (S_{link} (T_{L_{i}}^{i(l_{o})},T_{L_{j}}^{j(h_{i})}))$ and $B=[b_{ij}]$ , $b_{ij} =-\log (S_{link} (T_{L_{i}}^{i(l_{o})},c^{j(h_{i})})). L_{i} $ and $L_{j} $ denote the length of history trajectory. $D=diag[d_{1},\cdots,d_{q}], d_{i} =-\log (1-conf(T_{L_{i}}^{i(lo)}))$ is the cost to terminate for $T_{L_{i}}^{i(lo)}. \tau $ is a threshold. $X^{g}$ is solved by the Hungarian algorithm, the tracklets and its confidence are updated with the associated results.

C. Association Affinity Model With Sparse Representation and Rank Based Motion Estimation

Follow Eq. (1), the association affinity model in our work is consisted by two affinity terms. The first term in Eq. (1) is the appearance affinity score between the tracklet and the associated detection, which is calculated based on sparse representation. The second term is the motion affinity computed by rank based dynamics motion model in section IV. For a detection $z$ in frame $t$ , the sparse representation for $z$ can be represented as follows:\begin{equation*} \alpha =\mathop {\arg \min }\limits _{\boldsymbol {\alpha }} \frac {1}{2}\left \|{ {z-F\alpha } }\right \|_{2}^{2} +\lambda \left \|{ \alpha }\right \|_{2,1}\tag{6}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $F$ is a dictionary that includes all feature vectors belonged to each trajectory, $F=\{z_{1},z_{2},\cdots,z_{n} \}$ is consisted by the detections $z^{j} $ assigned to trajectory $T^{i} $ . The main idea behind our sparse-based appearance affinity model is that if $z^{j}$ is a part of $T^{i} $ , $z^{j}$ should be best represented by the detections of its own trajectory $T^{i} $ rather than using detections from other trajectories. Therefore, the reconstruction error $R_{e} (z)$ between $z^{j}$ and $T^{i} $ can reflect the similarity between them, which is defined as follows:\begin{equation*} R_{e} (z)=\left \|{ {z-F\alpha } }\right \|_{2}^{2}\tag{7}\end{equation*} View SourceRight-click on figure for MathML and additional features.

1) The Appearance Affinity Score for a Pairwise Tracklet-Detection Association in High Confidence Tracklets Association

As described in section III.B, the association optimization is performed by two-step. Since the high confidence tracklets association is only implemented between the detection $z^{j} $ and high confidence trajectory $T^{(hi)} $ in two consecutive frames, the appearance affinity score between them is defined as follows:\begin{equation*} S_{app} (T^{(hi)},z^{j})=\frac {1}{\xi }\exp (-R_{e} (z^{j}))\tag{8}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $\xi $ is a normalization factor. $z^{j} $ is considered to be the most likely candidate for $T_{i}^{H}$ if they achieved the smallest reconstruction error for all detections with respect to the assigned tracklets $T_{i}^{H}$ .

2) The Appearance Affinity Score for the Low Confidence Tracklets Association

Since the appearance affinity score in low confidence tracklets association involved two kind associations, the pairwise tracklet-tracklet association between the low confidence tracklets $T^{(lo)} $ and the un-matched high confidence tracklets $T^{(hi)} $ , and the tracklet-detection pairs association for the low confidence tracklets $T^{(lo)} $ and the retained detections $c_{t}^{j} $ , which is same in the high confidence tracklets association stage. We define the appearance affinity score $S_{app} (T_{L_{i}}^{(l_{o})},T_{L_{j}}^{(h_{i})})$ for tracklet-tracklet association by averaging the value of all reconstruction errors as in Eq. (9), which describe the likelihood that whether they belong to the same trajectory or not.\begin{equation*} S_{app} (T_{L_{i}}^{(l_{o})},T_{L_{j}}^{(h_{i})})=\frac {1}{\xi }\exp \left ({{-\frac {\sum \limits _{j=1}^{N_{s}} {R_{e} (z^{j})}}{N_{s} }} }\right)\tag{9}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $z^{j}\in T_{L_{i}}^{(l_{o})} $ , $j\in N_{s} $ is the number of detections in trajectory $T_{L_{i}}^{i(l_{o})} $ . $R_{e} (z^{j})$ is computed as in Eq. (7).

In terms of tracklet-detection pairs association between $T^{(lo)}$ and $c_{t}^{j} $ , the appearance affinity score $S_{app} (T_{L_{i} }^{(l_{o})},c_{t}^{j})$ for them is calculated as Eq. (8).

3) Missing Frames Analysis Based on Sparse Reconstruction Error

Due to the main concept behind our sparse-based appearance affinity model is that a detection response should be best represented by the detections of its own trajectory rather than detections from other trajectories [19]. Therefore, the sparse reconstruction error between a detection and a tracklet can reflect the similarity between them, then, it is naturally to use the sparse reconstruction error analyzed the missing frames in Eq. (2), the missing frames often caused by occlusion or detection failure, in our work, if the smallest reconstruction error $R(z)$ computed in Eq. (7) make the $S_{app} $ in Eq. (8) and (9) is smaller than a pre-defined threshold (set 0.6 in our work), it will no detection assigned to the tracklets, the number of missing frames $W$ for tracklet $T_{i} $ is increased by 1.

The motion affinity $S_{m} $ in Eq. (1) between tracklets or between a tracklet and a detection is computed by rank based dynamics motion affinity model in section IV.

SECTION IV.

Improved Rank Based Motion Affinity Model

The rank based motion affinity model proposed in [20] is defined as follows:\begin{equation*} y_{t} =\sum \limits _{i=1}^{m} {a_{i} y_{t-i}},\quad m\le \mathbf {L},~t\ge s+m\tag{10}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $y_{t} $ is the position of a trajectory in any frame $t$ , $a_{i} $ is the regression coefficient, $\mathbf {L}$ is the length of a trajectory. $m$ is the order of regression model.

As defined in [20], for an ordered tracklet sequence $\{y_{t} \}$ generated from Eq. (10), $m=rank(H_{T^{i}}) $ equals the rank of the corresponding Hankel matrix. Specially, the Hankel matrix can be used to approximate the high-order linear state-space models of the dynamic system [21]. The Hankel matrix $H_{T^{i}} $ is constructed by the position of a tracklet with $\mathbf {L}\ge m$ columns as follows:\begin{equation*} H_{T^{i}} =\begin{bmatrix} {y_{s}} & {y_{s+1}} & \cdots & {y_{s+\mathbf {L}_{i} /2}} \\ {y_{s+1}} & {y_{s+2}} & \cdots & {y_{s+\mathbf {L}_{i} /2+1}} \\ \vdots & \vdots & \vdots & \vdots \\ {y_{s+\mathbf {L}_{i} /2}} & {y_{s+\mathbf {L}_{i} /2+1}} & \cdots & {y_{s+\mathbf {L}_{i}}} \\ \end{bmatrix}\tag{11}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $\mathbf {L}_{i} =t-s+1$ is the length of tracklet $T_{i} $ , starting from frame $s$ .

One major challenging for motion affinity in [20] is how to effectively estimate the rank of a Hankel matrix for a tracklet dynamics model in Eq. (10). The other challenging is that the Hankel matrix constructed by the positions of a trajectory is corrupted by noise, which is caused by false associations between tracklets or missing detections. In addition, most existing methods use the fully singular value decomposition (SVD) to compute the rank of Hankel, which is exponentially growth as the length $\mathbf {L}$ of the trajectory increases. Therefore, the existed methods such as [20]–​[22] may not be appropriate for online MOT since they perform a full SVD decomposition with additional constraints and the length of a trajectory increases with time, and this situation is even more worse if the data are corrupted by noise.

Aim to solve above issues, in our work, we follow the idea of [20] and define the improved rank based motion affinity model as Eq. (12), which introduce the incremental SVD and downdate SVD decomposition for real-time streaming data to estimate the rank for Hankel matrix.\begin{equation*} S_{m} (T^{i},T^{j})=\frac {rank(\mathrm {H}(T^{i})+\mathrm {rank}(\mathrm {H}(T^{i})))}{rank(H(T^{ij}))}-1\tag{12}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $S_{m} (T^{i},T^{j})$ is the motion affinity between two tracklets $T^{i} $ and $T^{j} $ in frame $t$ , $T^{ij} $ is the joint tracklet between $T^{i} $ and $T^{j} $ .

In our work, the motion affinity $S_{m} (T_{L_{i}}^{i(l_{o})},T_{L_{j}}^{j(h_{i})})$ between the low confidence tracklet $T_{L_{i} }^{i(l_{o})} $ and the high confidence tracklet $T_{L_{j}}^{j(h_{i})} $ is computed by the $L_{i} $ and $L_{j} $ length of history trajectory information, respectively. $L_{i} $ and $L_{j} $ are equal to the rank of Hankel matrix in Eq. (11). While the motion affinity $S_{m} (T_{L_{i}}^{i},z^{j})$ between a tracklet and a detection is calculated by using history trajectory of $T_{L_{i}} $ and $T^{z^{j} }$ , where $T_{L_{i}} $ can be either high confidence tracklet or low confidence tracklet. $H(T_{L_{i}})$ is represented as $H(T_{L_{i}})=[y_{t-L_{i} +1},\cdots,y_{t}]$ , where $y_{t} $ is the predicted position for target $i$ computed in Eq. (10).

1) Online Rank Estimation Uses Incremental SVD Decomposition With New Data Arrival

In online tracking, we devise a scheme for rank estimation of the Hankel matrix by an incremental SVD decomposition of real-time streaming data [23], it incorporate the new arrive data into the SVD as it arrives. The incremental SVD decomposition of a $p\times q$ matrix can be computed in linear time,$O(p\cdot q\cdot n)$ , where $n$ is the rank for a given matrix.

In order to consideration the noise in trajectory, let $\mathbf {y}_{t} =[y_{s},\cdots,y_{e}]$ denotes a trajectory of length $\mathbf {L}$ start frame $s$ to $e$ with noise caused by false associations or missing detections, $e\le t$ at frame $t$ . Then, the Hankel matrix is decomposed into two parts, the noiseless position ${\hat {\boldsymbol {y}}}_{t} =[\hat {y}_{s},\cdots,\hat {y}_{e}]$ and noise $\boldsymbol {\eta }_{t} =[\hat {\eta }_{s},\cdots,\hat {\eta }_{e}]$ as follows:\begin{equation*} \mathbf {y}_{t} ={\hat {\boldsymbol {y}}}_{t} +\boldsymbol {\eta }_{t}\tag{13}\end{equation*} View SourceRight-click on figure for MathML and additional features. From Eq. (10) and (11), we can rewrite Eq. (13) as follows:\begin{equation*} H=Y+E\tag{14}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $H\in S_{H},S_{H} $ is the set of Hankel matrix for a tracklets. $Y$ and $E$ are the corresponding Hankel matrix consist by $\hat {\boldsymbol {y}}_{t} $ and $\boldsymbol {\eta }_{t} $ , respectively.

Eq. (14) is rewritten via the incremental PCP algorithm as follows:\begin{equation*} \mathop {argmin}\limits _{A,E} \left \|{ Y }\right \|_{\ast } +\lambda \left \|{ E }\right \|_{1},\quad s.t.~ H=Y+E\tag{15}\end{equation*} View SourceRight-click on figure for MathML and additional features.

By solving Eq. (15), the noiseless trajectory of a target is achieved. The convex relation form of Eq. (15) is as follows:\begin{equation*} \mathop {argmin}\limits _{\mathbf {a},\eta }~ rank(Y)+\lambda \left \|{ E }\right \|_{0},\quad s.t.~ H=Y+E\tag{16}\end{equation*} View SourceRight-click on figure for MathML and additional features.

For frame $t$ , the low-rank $Y$ and sparse $E$ satisfy:\begin{equation*} \mathbf {y}_{t} =\hat {\boldsymbol {y}}_{t}^{n} +\boldsymbol {\eta }_{t}^{n}\tag{17}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $\mathbf {y}_{t} =[y_{t-n+1},\cdots,y_{t}]$ , $y_{t} $ is estimated by Eq. (10). $n$ denotes that $y_{t} $ is achieved by the past $n$ frames’ position information. The low rank components, $H^{T_{i}} $ and $H^{T_{j}} $ , for two positions $y_{t}^{i} $ and $y_{t}^{j} $ , will be different if $y_{t}^{i} $ and $y_{t}^{j} $ belong to different trajectories.

Given $H_{t} \in \Re ^{(L-n+1)\times n} $ , $Y_{t} +E_{t} =H_{t}$ , $t$ is the frame index. The SVD decomposition for $H_{t-1} $ is as $H_{t-1} =U\sum V^{T} $ , where $\sum \in R^{r\times r}$ . In online tracking, with a new arriving data $y_{t} $ at the $t\_{}th$ frame, we need to repeatedly update $H$ by appending a row or a column. After each update, we compute the SVD of the resulting matrix. Then, by appending the new coming data $y_{t} $ at the last column, the incremental SVD can be rewritten as $H_{t} ={U}'{\sum }'{V}'$ , where ${\sum }'\in R^{r\times r}$ or ${\sum }'\in R^{(r+1)\times (r+1)}$ .\begin{equation*} H_{t} =[H_{t-1},\mathbf {y}]=[H_{t-1} \;\mathbf {0}]+\mathbf {ye}^{T}\tag{18}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $H_{t} $ is the updated Hankel matrix, $\mathbf {y}$ is the appending last column in Hankel matrix with the new arriving position at frame $t$ . $\mathbf {e}$ is a unit vector, 0 is a zero column vector.

We compute the SVD of $H_{t} $ by taking advantage of known the SVD of $H_{t-1} $ . Then, \begin{equation*} H_{t} =\begin{bmatrix} U & {\mathbf {y}} \\ \end{bmatrix}\begin{bmatrix} \sum & {\mathbf {0}} \\ {\mathbf {0}^{T}} & 1 \\ \end{bmatrix}\begin{bmatrix} {V^{T}} & {\mathbf {0}} \\ {\mathbf {0}^{T}} & 1 \\ \end{bmatrix}\tag{19}\end{equation*} View SourceRight-click on figure for MathML and additional features.

By Gram-Schmidt orthonormalization of $\mathbf {y}$ , Eq. (19) can be rewritten as:\begin{equation*} H_{t} =\begin{bmatrix} U & {\mathbf {p}} \\ \end{bmatrix}\begin{bmatrix} \sum & {U^{T}\mathbf {y}} \\ {\mathbf {0}^{T}} & \rho \\ \end{bmatrix}\begin{bmatrix} {V^{T}} & {\mathbf {0}} \\ {\mathbf {0}^{T}} & 1 \\ \end{bmatrix}\tag{20}\end{equation*} View SourceRight-click on figure for MathML and additional features. where\begin{equation*} \rho =\left \|{ {\mathbf {y}-UU^{T}\mathbf {y}} }\right \|_{2},\mathbf {p}=\frac {\mathbf {y}-UU^{T}\mathbf {y}}{\rho }.\end{equation*} View SourceRight-click on figure for MathML and additional features.

Let the middle matrix of the right-hand-side of Eq. (20) is \begin{equation*} {M}'=\begin{bmatrix} \sum & {U^{T}\mathbf {y}} \\ {\mathbf {0}^{T}} & \rho \\ \end{bmatrix}\in \Re ^{(n+1)\times (n+1)}.\end{equation*} View SourceRight-click on figure for MathML and additional features. Computing the full SVD of ${M}'$ by:\begin{equation*} SVD({M}')=G\hat {\sum }Q^{T}\tag{21}\end{equation*} View SourceRight-click on figure for MathML and additional features.

Then the SVD of $H_{t} $ can be rewritten as follows:\begin{equation*} H_{t} =\begin{bmatrix} U & {\mathbf {p}} \\ \end{bmatrix}\left({G\hat {\sum }Q^{T}}\right)\begin{bmatrix} {V^{T}} & {\mathbf {0}} \\ {\mathbf {0}^{T}} & 1 \\ \end{bmatrix}\tag{22}\end{equation*} View SourceRight-click on figure for MathML and additional features.

From Eq. (21) and (22), we can see that ${M}'$ is not related to $U$ , the singular values and the singular vectors of $H_{t} $ can be updated without it, where \begin{equation*} {U}'=\begin{bmatrix} U & {\mathbf {p}} \\ \end{bmatrix}G,\quad {\sum }'=\hat {\sum },~{V}'=\begin{bmatrix} {V^{T}} & {\mathbf {0}} \\ {\mathbf {0}^{T}} & 1 \\ \end{bmatrix}Q.\end{equation*} View SourceRight-click on figure for MathML and additional features.

2) Online Rank Estimation Uses Downdate SVD Decomposition With Deleting ‘Too Old’ Data

Due to the data are too large to be stored or even buffered with repeatedly update $H_{t} $ in online tracking, we need to ‘forget’ some frames that are ‘too old’ and delete a row or a column of $H_{t} $ . Given $H_{t} =[\mathbf {y}_{t-r+1},H_{t-r\;:t}]=U\sum V$ , we compute downdate $SVD(H_{t-r:t})$ with $r$ singular values.\begin{equation*} [\mathbf {0},H_{t-r\;:t}]=[\mathbf {y}_{t-r+1},H_{t-r\;:t}]+(-\mathbf {y}_{t-r+1})\mathbf {e}^{T}\tag{23}\end{equation*} View SourceRight-click on figure for MathML and additional features. where $\mathbf {y}_{t-r+1} $ is one column vector of the Hankel matrix in Eq. (11). We compute the SVD of $H_{t-r\;:t} $ by using $H_{t}$ , then, \begin{align*} [\mathbf {0},H_{t-r\;:t}]=&[\mathbf {y}_{t-r+1},H_{t-r\;:t}]+(-\mathbf {y}_{t-r+1})\mathbf {e}^{T} \\=&\begin{bmatrix} U & {\mathbf {-y}_{t-r+1}} \\ \end{bmatrix}\begin{bmatrix} \sum & {\mathbf {0}} \\ {\mathbf {0}^{T}} & 1 \\ \end{bmatrix}\begin{bmatrix} {V^{T}} \\ {\mathbf {e}^{T}} \\ \end{bmatrix}\tag{24}\end{align*} View SourceRight-click on figure for MathML and additional features.

By Gram-Schmidt orthonormalization of $\mathbf {y}_{t-r+1} $ and $\mathbf {e}$ , Eq. (24) can be rewritten as:\begin{align*} [\mathbf {0},H_{t-r\;:t}]=&[\mathbf {y}_{t-r+1},H_{t-r\;:t}]+(-\mathbf {y}_{t-r+1})\mathbf {e}^{T} \\=&\begin{bmatrix} U & {\mathbf {0}} \\ \end{bmatrix}\begin{bmatrix} {\sum -\sum \mathbf {ww}^{T}} & {-\rho _{\mathbf {w}} \cdot \sum \cdot \mathbf {w}} \\ {\mathbf {0}^{T}} & {\mathbf {0}} \\ \end{bmatrix} \\&\times \,\begin{bmatrix} {V^{T}} \\ {\mathbf {q}^{T}} \\ \end{bmatrix}\tag{25}\end{align*} View SourceRight-click on figure for MathML and additional features. where \begin{align*} \mathbf {w}=&V^{T}\mathbf {e}=V(t-r+1,:)^{T},\rho _{\mathbf {w}} =\sqrt {1-\mathbf {w}^{T}\mathbf {w}},\\ \mathbf {q}=&\frac {\mathbf {e}-V\mathbf {w}}{\rho _{\mathbf {w}} },\mathbf {y}_{t-r+1} =U\sum \mathbf {w}.\end{align*} View SourceRight-click on figure for MathML and additional features.

Let the middle matrix of the right-hand-side of Eq. (25) is\begin{equation*} M=\begin{bmatrix} {\sum -\sum \mathbf {ww}^{T}} & {-\rho _{\mathbf {w}} \cdot \sum \cdot \mathbf {w}} \\ {\mathbf {0}^{T}} & {\mathbf {0}} \\ \end{bmatrix}.\end{equation*} View SourceRight-click on figure for MathML and additional features. Computing full SVD of $M$ by:\begin{equation*} SVD(M)=G\hat {\sum }Q^{T}\tag{26}\end{equation*} View SourceRight-click on figure for MathML and additional features.

Then the downdate SVD of $H_{t-r\;:t} $ is rewritten as:\begin{equation*} [\mathbf {0}, H_{t-r\;:t}]=\begin{bmatrix} U & {\mathbf {0}} \\ \end{bmatrix}\cdot \left({G\hat {\sum }Q^{T}}\right)\cdot \begin{bmatrix} {V^{T}} \\ {\mathbf {q}^{T}} \\ \end{bmatrix}\tag{27}\end{equation*} View SourceRight-click on figure for MathML and additional features.

From Eq. (26) and (27), we see that $M$ is not related to $U$ , the singular values and the singular vector of $H_{t-r\;:t} $ can be updated without it. In downdate SVD of $H_{t-r\;:t} $ , \begin{equation*} {U}'=\begin{bmatrix} U & {\mathbf {0}} \\ \end{bmatrix}G,\quad {\sum }'=\hat {\sum },~{V}'=\begin{bmatrix} {V^{T}} \\ {\mathbf {q}^{T}} \\ \end{bmatrix}\cdot Q.\end{equation*} View SourceRight-click on figure for MathML and additional features.

From the above analysis, we can find the optimal rank of the Hankel matrix in Eq. (11) and achieve the cleaned trajectory at any frame $t$ . Although the cleaned trajectory $\hat {\boldsymbol {y}}_{t} =[\hat {y}_{s},\cdots,\hat {y}_{e}]$ cannot be used to correct the past tracking results in online tracking, it is useful for future tracking since the correct trajectory is used to predict the future object state, which is beneficial to form a reliable motion affinity model for the tracked objects. In addition, by the rank-based motion affinity estimation method, we can accurately achieve the order history trajectory information that is related with the current object state. In this way, the limited but more accurate history information is considered to implement the association between the tracklets and the detections, instead of taking the whole history trajectory of the object states into consideration as in [13].

Finally, after the association optimization between the tracklets and the detections, the non-associated detections are remained to initialize new potential trajectories. A new trajectory is found when it grows over five consecutive frames. The non-associated trajectories are terminated if they are unassociated in five consecutive frames. The main steps of the proposed online multi-object tracking method are summarized in Algorithm 1.

SECTION Algorithm 1

The Overall Algorithm for the Proposed Online Multi-Object Tracking

Input:

object state set ${\mathbb {X}}_{t} $ and the set of detection responses ${\mathbb {Z}}_{t} $ in frame $t$

Output:

updated trajectory set Tt

Step1:

construct association affinity model

1:

sparse-based appearance affinity model:

2:

calculate the appearance affinity score for pairwise tracklet-detection association and tracklet-tracklet association as in Eq. (8) and Eq. (9), respectively;

3:

compute the rank based motion affinity as Eq. (12);

Step2:

data association optimization

1:

calculate the tracklet confidence according to Eq. (3), bipartition tracklets into high and low confidence tracklets;

2:

missing frames analysis based on sparse reconstruction error as in section III.C;

3:

perform the first step data association for high confidence tracklets in two consecutive frames;

4:

perform the second step data association for low confidence tracklets by considering the history tracklet information as in section III.B;

Step3:

construct association matrix $X_{k\times n}^{l} $ and $X_{(q+{q}')\times (h+q)}^{g} $ for two step associations. Data association is solved by the Hungarian algorithm;

Step4:

update the trajectory confidence and the tracklets after assigning the associated results.

SECTION V.

Experiments

In this section, we validate the effectiveness of the proposed method and make a comparison with several state-of-the-art multi-object tracking methods.

A. Datasets

To evaluate the performance of the proposed method, experiments are conducted on three typical public dataset: the PETS 2009 dataset, the TUD dataset,and MOT 2015 dataset. The PETS 2009 dataset contains pedestrian sequences with different crowded density, and we use the S2 dataset, which is devised for pedestrian tracking (S1 focuses on person counting and density estimation while S3 is devised for flow analysis and event recognition). The TUD dataset contain sequences captured on streets at a very low camera angle in close range. Due to the low viewpoint of camera, there are often full occlusions among pedestrians. The crossing, campus and Stadtmitte sequences from TUD dataset are used for evaluation. The MOT2015 dataset include 11 challenging sequences cover various scenes, such as occlusions, clutter background, scale and shape changes, moving camera and stationary camera.

B. Evaluation Metrics

To comprehensively evaluate the performance for multi-object tracking algorithm, the CLEAR MOT metrics [24] as well as the metrics defined in [25] are used for quantitative evaluation, including MOTP($\uparrow$ ), MOTA($\uparrow$ ), FP($\downarrow$ ), FN($\downarrow$ ), IDS($\downarrow$ ), GT, MT($\uparrow$ ), ML($\downarrow$ ), PT and FM($\downarrow$ ).For those metrics, $\uparrow $ means that the higher score is the better the performance is. $\downarrow $ means that the lower score is the better results.

C. Parameter Setting and Baseline Methods

The proposed online MOT algorithm is performed on MATLAB with an Intel Core i7 8GHz PC. In the experimental, we set the pre-defined threshold $\tau =0.4$ for two step data association, $\beta =1.2$ in Eq. (3).

We compare our work with 21 state-of-the-art MOT methods, including online trackers and batch trackers. Ten of them are online methods (RMOT [17], SCEA [26], CMOT [13], TSML [20], HDAOH [27], MHMHT [28], PRIMPT [29], PHD_GSDL [30], TENSOR [31], SAS_MOT15 [32], RNN_LSTM [33]), the rest 11 are batch methods (DCT [34], SMOT [35], CEM [36], KSP [37], [20], DTLE [38], H2T [39], MTTCEM [40], DP_NMS [41], JPDA_100 [42], SiameseCNN [43], CNNTCM [44]). Because some of the state-of-the-art methods do not provide publicly available source codes, we just use the results reported in their published papers.

D. Results and Analyses

1) Evaluation on the Pets Dataset

Some exemplars of the tracking results from the PETS 2009 S2.L1, S2.L2 and S2.L3 sequences are shown in Fig. 2. The quantitative evaluation results are shown in Table 1. This dataset contains different levels of crowded density (low to high crowded), frequent interactions between pedestrians and illumination changes.

TABLE 1 Performance Comparison on PETS 2009 Dataset
Table 1- 
Performance Comparison on PETS 2009 Dataset
FIGURE 2. - The tracking results of the proposed method on PETS 2009 Dataset sequences. At each frame, objects are attached with bounding boxes and ID labels with different colors.
FIGURE 2.

The tracking results of the proposed method on PETS 2009 Dataset sequences. At each frame, objects are attached with bounding boxes and ID labels with different colors.

a: Low-Density Sequence

The S2.L1 sequence is one of the widely used low-density but long-frame multi-pedestrian tracking sequences with frequently occlusion, closely moved pedestrians with similar appearance and complicated motion. Table 1 shows that the proposed method outperforms the state-of-the-art trackers with higher scores in MOTA and MT, lower scores in IDs and FM. Compared with the CMOT tracker [13], the most related work, our method has improved performance, the MOTA and MT scores are improved 9.06% and 26.3%, respectively. The number of ID switches and FM are only 3 and 6, respectively. The significant low IDS and FM indicates that our association affinity model is superior to the method by [13]. This is owing to the rank based motion affinity model, which is used to build the affinity association model, can effectively handle missing detections and complicated motion.

b: Medium-Density Sequence

The S2.L2 sequence contains 436 frames with large number of pedestrians, who moved in different directions with significantly appearance changes due to occlusions and illumination changes. The frequently occlusions include pedestrian occluding each other and static occlusion with traffic sign. As expected, the two-step association optimization with sparse-based appearance model and rank-based motion model can increase the tracking performance, this is because the sparse based appearance reconstruction is used to count the missing frames, and the rank based motion model use useful tracklet information to calculate the tracklet confidence during association optimization. The MT and MOTA scores of the proposed method are 79% and 71.9%, respectively. Some of the online trackers, such as RMOT, SCEA, CMOT and TSML, have poor tracking performance with MT 9.5%, 0, 71.62% and 31%, respectively. The MOTA scores are 37.2%, 21.1%, 70.12% and 59.7%, respectively. Meanwhile, the offline trackers, such as CEM, KSP, DTLE, DP_NMS, JPDA_100, H2T and MTTCEM also not have satisfied tracking performance. The MT scores for those offline trackers are all less than 40%, the MOTA scores are smaller than 65%. The good tracking performance of the proposed method shows that the rank based motion affinity model in our work, which takes the high order object states to guide association optimization during tracking, is different from the traditional motion models like Kalman or Markov model. It is beneficial to deal with the ambiguities caused by occlusions or detection failures.

c: High-Density Sequence

The S2.L3 sequence is a high crowded pedestrian sequence contains occlusions, shadows and illumination changes. The proposed method achieves superior performance comparing with other competing methods with higher scores in MT, MOTA and lower scores in ML, IDs and FM. Our low IDs indicates that our method can effectively overcome the tracking fragments, proves the effectiveness and robustness of the improved association affinity model. The improved association affinity, consisted by the sparse appearance representation and the rank based motion affinity, aims to depress the ambiguous detection responses in online tracking. The sparse reconstruction error plays an occlusion indicator in our work. In addition, we devise the incremental SVD and downdate SVD decomposition method to calculate the rank of the Hankel matrix in rank based motion model. It considers the noise in the trajectory, and achieves the noiseless trajectory information of a target in the rank estimation process. The cleaned trajectory information is used to predict object state for next frame, this is useful to eliminate ambiguous caused by similar appearance of the targets and occlusions. These strategies are helpful for reducing the link switches of the association between tracklets, hence, our method has relatively low IDs.

2) Evaluation on the TUD Dataset

Some exemplars of the tracking results for the TUD Stadtmitte, crossing and campus sequences in the TUD datasets are shown in Fig. 3. The quantitative evaluation results are shown in Table 2. From the results, we can see that our method has a superior performance. Compared with the state-of-the-art trackers, the proposed method has higher scores in MOTA and MT, lower scores in FN, FP, IDs and FM. The good performance for our work is owned to the proposed affinity association model, which is calculated by fusing the sparse-based appearance affinity model and a discriminative dynamic motion affinity model. Moreover, the proposed rank-based dynamic motion affinity model, which is different from the general rank calculation method, is based on the incremental SVD and downdate SVD decomposition method. This is specially devised for online tracking because of the data explosion with time from stream data or it is too large to be stored or even buffered. Then the rank-based motion affinity model is used to distinguish targets with similar appearance and deal with noise. By using the noiseless high order trajectory information to build dynamic motion model of the target, it is beneficial to recover fragmented tracklets and maintain target identities.

TABLE 2 Performance Comparison on TUD Dataset
Table 2- 
Performance Comparison on TUD Dataset
FIGURE 3. - The tracking results of the proposed method on TUD Dataset sequences. At each frame, objects are attached with bounding boxes and ID labels with different colors.
FIGURE 3.

The tracking results of the proposed method on TUD Dataset sequences. At each frame, objects are attached with bounding boxes and ID labels with different colors.

3) Evaluation on the PET2005 Dataset

We compare our algorithm with 15 state-of-the-art MOT trackers in 2D MOT 2015 dataset, including 8 online trackers and 7 offline tracking. In addition, 4 deep learning methods are included to compare with our MOT tracker. Some exemplars of the tracking results for MOT2015 dataset are shown in Fig. 4. The quantitative evaluation results are shown in Table 3. As shown in Table 3, our tracker achieves the best performance in MT and ML and the second best MOTA and IDs with 32.14% and 438, respectively. The good tracking performance of the proposed method indicates that the association optimization model in our method is more effective than other trackers. Our two-step association optimization method pays more attention on how to accurately compute the tracklet confidence in order to bipartite tracklets into high confidence and low confidence tracklets. The tracklet confidence computation is guided by rank based motion estimation to analysis the useful history trajectory information and relied on the sparse reconstruction to analysis the missing frames caused by occlusion or detection failure. This is helpful for eliminating ambiguous association, recovering fragment tracklets being tracked and preventing objects identities from switches. Meanwhile, the order of history trajectory is online determined based on the rank estimation method, which is different from the state-of-the-art methods. In our work, the incremental SVD and downdate SVD decomposition method is specially devised for online tracking by considering new data and forgetting ‘too old’ data. Both those strategies are helpful for eliminating ambiguous association and preventing objects identities from switches.

TABLE 3 Performance Comparison on 2D MOT 2015 Dataset
Table 3- 
Performance Comparison on 2D MOT 2015 Dataset
FIGURE 4. - The tracking results of the proposed method on MOT2015 Dataset sequences. At each frame, objects are attached with bounding boxes and ID labels with different colors.
FIGURE 4.

The tracking results of the proposed method on MOT2015 Dataset sequences. At each frame, objects are attached with bounding boxes and ID labels with different colors.

4) Ablation Analysis

For better exploration the contribution of each part in our proposed method, we have conducted ablation studies to validate the effectiveness of the sparse based appearance affinity model and the rank-based motion affinity model in 2D MOT 2015 training set. The 2D MOT 2015 training set is split into a training set with 5 sequences and a validation set with 6 sequences. In terms of the rank-based motion affinity model, we divide it into the higher order motion model based fully SVD decomposition and the improved rank based motion model based incremental SVD and downdate SVD decomposition.

First, we use color histogram to calculate the appearance affinity in Eq. (1) and denotes it as P1 tracker. Then, to validate how the motion affinity in our work influenced the tracking performance, the motion affinity model in Eq. (10) is firstly replaced by the Markov model, donates it as P2 tracker. Then, the fully SVD decomposition is used to estimate the rank of Hankel matrix in Eq. (12) instead of the incremental SVD and downdate SVD decomposition in our work, we denote it as P3 tracker.

Table 4 shows the evaluation results of P1-P3 trackers and the proposed method. As expected, the proposed method has an improved performance than P1 and P2 trackers and has a comparable tracking result with P3. The poor performance for P1 tracker is caused by the appearance representation. Since the proposed method not only use the sparse representation to construct appearance affinity in Eq.(1), but also rely on the sparse reconstruction to analysis the occlusion, the appearance affinity model build by sparse representation is more robust than the color histogram as in P1 tracker. Hence, an apparent decline of MOTA and MT are witnessed in P1 tracker. Similar effect is happened to P2 tracker, in addition, the number of IDs is increased in P2 tracker since the Markov motion model is used to replace the high order motion in our work. The reason for this phenomenon is that the limit history states information is used to estimate the target states. When the occlusion or miss detection occurs, seldom reliable information is available to predict object state, especially frequently occlusions or ambiguities occur in long period of time. As a result, the association affinity model calculated based on the confused object state, has no ability to distinguish object from similar appearance or occlusion. Hence, it easily cause identity switches. Contrast, the higher order motion model in our work, different from the Markov motion model, it can use the history reliable information to guide the tracker prediction object states, especially in situation where no detection is available due to missing detections or occlusions. The proposed motion affinity model allows the tracking system correctly maintaining identities of the tracked targets and recovers trackets from occlusion, which guarantees our algorithm with low IDs. Table 4 supports this analysis. Furthermore, by comparing our method with P3 tracker, as shown in Table 4, the performance of our approximation algorithm, using incremental SVD and downdate SVD decomposition to compute the motion affinity, is not fade the tracking performance. The MOTA, MT and IDs for the proposed method are slightly poorer than P3 tracker, which imply that the tracking performance in our work is seldom influenced by the approximation algorithm.

TABLE 4 Ablation Analysis on MOT2015 Validation Set
Table 4- 
Ablation Analysis on MOT2015 Validation Set

5) Runtime Performance

We implement the proposed online MOT algorithm on MATLAB with an Intel Core i7 8GHz PC without any optimization and parallel programming. In the experiment, given the detection responses, we present the average runtime performance for trackers by averaging 5 times for all the evaluated sequences. The results are shown in Table 5 and Fig.5. The velocity of the trackers is measured by frame-per-second (FPS). From Table 5 and Fig. 5, we can see that the proposed method performs well in running time, which has a comparative result with the state-of-the-art methods in MOT. Our average velocity reaches 17.02 FPS, which is able to satisfy many tracking applications.

TABLE 5 Runtime Performance (FPS)
Table 5- 
Runtime Performance (FPS)
FIGURE 5. - The performances compare between the proposed method and several competing trackers. Each marker denotes a tracker accuracy and speed measured in FPS, the higher and more right is better.
FIGURE 5.

The performances compare between the proposed method and several competing trackers. Each marker denotes a tracker accuracy and speed measured in FPS, the higher and more right is better.

SECTION VI.

Conclusion

In this paper, we propose an online multi-object tracking algorithm that aims to gain the tracking performance by building an optimized data association framework and association affinity model. The data association task is performed by two step association based on the tracklet confidence, which relies on the improved association affinity model. The association affinity model, consisted by the sparse appearance representation and the rank based motion affinity, aims to depress the ambiguous detection responses in online tracking. The sparse reconstruction error plays an occlusion indicator in our work. We have devised the incremental SVD and downdate SVD decomposition method to calculate the rank of the Hankel matrix in rank based motion model. It considers the noise in the trajectory, and achieves the noiseless trajectory information of a target in the rank estimation process. Experimental results on three challenging datasets show that the proposed method has a good tracking performance.

Cites in Papers - |

Cites in Papers - IEEE (2)

Select All
1.
Francisco Javier Alvarez-Prieto, Mario I. Chacon-Murguia, Juan A. Ramirez-Quintana, "Analysis of CNN Models to develop a New Appearance Model for Multiple-Object-Tracking", 2020 17th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), pp.1-6, 2020.
2.
Kwangjin Yoon, Jeonghwan Gwak, Young-Min Song, Young-Chul Yoon, Moon-Gu Jeon, "OneShotDA: Online Multi-Object Tracker With One-Shot-Learning-Based Data Association", IEEE Access, vol.8, pp.38060-38072, 2020.

Cites in Papers - Other Publishers (1)

1.
Shaojian Song, Yuanchao Li, Qingbao Huang, Gang Li, "A New Real-Time Detection and Tracking Method in Videos for Small Target Traffic Signs", Applied Sciences, vol.11, no.7, pp.3061, 2021.

References

References is not available for this document.