
Moving Object Detection in Satellite Videos via Spatial–Temporal Tensor Model and Weighted Schatten p-Norm Minimization

Abstract:

Low-rank matrix decomposition approaches have achieved significant progress in small and dim object detection in satellite videos. However, it is still challenging to achieve robust performance and fast processing under complex and highly heterogeneous backgrounds since satellite video data can neither adequately fit the foreground structure nor the background model in the existing matrix decomposition models. In this letter, we propose a novel object detection method based on a spatial–temporal tensor data structure. First, we construct a tensor data structure to exploit the inner spatial and temporal correlation within a satellite video. Second, we extend the decomposition formulation with bounded noise to achieve robust performance under complex backgrounds. This formulation integrates low-rank background, structured sparse foreground, and their noises into a tensor decomposition problem. For background separation, a weighted Schatten $p$-norm is incorporated to provide adaptive thresholding of the singular values of the background tensor. Finally, the proposed model is solved using the alternating direction method of multipliers (ADMM) scheme. Experimental results on various real scenes demonstrate the superiority of the proposed method against the compared approaches.

Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 19)

Article Sequence Number: 8022405

Date of Publication: 01 October 2021

DOI: 10.1109/LGRS.2021.3117054

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

As a new Earth observation technology, satellite video is able to provide a period of continuous observation over an area, providing rich dynamic information of an object, such as the moving trajectory, speed, and directions. Satellite video is important for numerous applications, such as space-based surveillance [1], traffic monitoring, and disaster rescue.

As an important task based on satellite videos, small and dim moving object detection (MOD) has attracted increasing attention in recent years. However, this task is highly challenging for several reasons.

Low Spatial Resolution: Due to the long distance between a target and the imaging platform, the object is extremely small. Besides, the appearance of objects changes significantly between consecutive frames.
Large Field of View: Each frame in a satellite video is typically on the order of several to hundreds of megapixels, resulting in a large search space and illumination variation.
Heterogeneous Backgrounds and Complex Noise: Objects are usually immersed and densely packed in highly heterogeneous and complex backgrounds.

Most state-of-the-art MOD methods for satellite videos follow a motion-based paradigm, that is, the background subtraction (BS) technique is used to separate a frame into foreground and background components. Typical BS methods include statistical models [2], [3] and robust principal component analysis (RPCA)-based models [4]–[6]. The RPCA-based methods can be categorized into batch-based methods [4], [5] and online methods [6].

Statistical methods (e.g., the median (mean) model and the statistical model VIBE [2]) usually compare each video frame with an adaptive background model (which is free of moving objects). Ahmadi et al. [3] employed a median background model to detect objects and used the nearest neighbor algorithm to produce trajectories. However, these statistical methods do not consider the structural knowledge of a video (e.g., temporal similarity of the background and spatial contiguity of the foreground). Consequently, their detection performance is limited, especially in complex and dynamic backgrounds.

To address this limitation, RPCA [4], [5], [7] was introduced to encode the temporal similarity of video backgrounds together with useful foreground priors (e.g., sparsity and spatial continuity). Pflugfelder et al. [6] and Zhang et al. [8] proposed several methods based on the low-rank and structured sparse decomposition (LSD) framework [5] to achieve MOD in satellite videos. However, these matrix RPCA-based methods must reshape a video, which has a natural 3-D structure, into 2-D data, which can destroy the structural information and reduce the detection performance. In addition, these methods cannot achieve robust performance and fast processing speed in complex and highly heterogeneous backgrounds.

Motivated by the work on exploiting spatial–temporal and structural information in [9], [10], we incorporate a spatial–temporal tensor with RPCA (tensor RPCA) and employ the weighted Schatten $p$-norm minimization (WSNM) [11] to obtain the optimal results. The contributions of this letter can be summarized as follows.

We introduce a tensor representation to preserve the spatial–temporal information of pixels within a satellite video.
We propose a tensor RPCA analysis framework with bounded noise and a generalized WSNM to separate objects from the background by estimating the low-rank components. In addition, we adopt tensor singular value decomposition (t-SVD) for efficient inference.
We employ the alternating direction method of multipliers (ADMM) to solve the low-rank component recovery problem in our tensor RPCA analysis framework. Extensive experiments have demonstrated the superiority of our WSNM-STTN over the state-of-the-art methods.

SECTION II.

Proposed Model

A. Matrix Decomposition Model for MOD

The extended matrix decomposition model (E-LSD) [8] treats foreground detection as a decomposition and optimization problem. The decomposition is defined as

$\begin{equation*} \mathbf{D} = \mathbf{B} + \mathbf{S} + \mathbf{E}. \tag{1}\end{equation*}$

Here, $\mathbf{D} \in \mathbb{R}^{s \times n}$ is an observed video, where $s$ and $n$ represent the number of pixels in a frame and the number of frames in a sequence, respectively. $\mathbf{B} \in \mathbb{R}^{s \times n}$, $\mathbf{S} \in \mathbb{R}^{s \times n}$, and $\mathbf{E} \in \mathbb{R}^{s \times n}$ are the estimated background, foreground, and residuals, respectively.

In E-LSD, the optimization problem is defined as

$\begin{align*} (\mathbf{B}^{*}, \mathbf{S}^{*}, \mathbf{E}^{*}) =&\mathop{\arg\min}_{\mathbf{B},\mathbf{S},\mathbf{E}} \|\mathbf{B}\|_{*} + \lambda_{1} \|\mathbf{S}\|_{\ell_{1}/\ell_{\infty}} + \lambda_{2} \|\mathbf{E}\|_{F}^{2} \\&\text{s.t.}~\mathbf{D} = \mathbf{B} + \mathbf{S} + \mathbf{E} \tag{2}\end{align*}$

where $\lambda_{1} > 0$ and $\lambda_{2} > 0$ are the weights of the sparsity term $\|\mathbf{S}\|_{\ell_{1}/\ell_{\infty}}$ and the residual term $\|\mathbf{E}\|_{F}^{2}$, respectively. $\|\mathbf{B}\|_{*}$ denotes the nuclear norm of matrix $\mathbf{B}$, i.e., the sum of its singular values, $\|\cdot\|_{\ell_{1}/\ell_{\infty}}$ is a norm that induces structured sparsity, and $\|\cdot\|_{F}$ represents the Frobenius norm.

However, the matrix decomposition model cannot preserve the structural information of the input video. It also cannot make good use of the spatiotemporal correlation prior of the background and the spatiotemporal continuity of the foreground. In addition, E-LSD adopts convex nuclear norm minimization (NNM) to characterize the low-rank background, while NNM treats all singular values equally. As a result, the accuracy of the estimated low-rank component is reduced in highly noisy scenarios [11], [12], and the low-rank component shrinks too much, which is called the over-contraction problem [11].
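The equal treatment of singular values by NNM can be illustrated with a minimal NumPy sketch of singular value thresholding, the standard proximal step for the nuclear norm. The toy matrix sizes, the noise level, and the threshold `tau` are illustrative assumptions and are not taken from the letter.

```python
import numpy as np

def svt(matrix, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # every singular value loses the same tau
    return (u * s_shrunk) @ vt, s, s_shrunk

rng = np.random.default_rng(0)
# Hypothetical rank-2 "background" with 50 pixels and 20 frames, plus small noise.
low_rank = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 20))
observed = low_rank + 0.05 * rng.standard_normal((50, 20))

_, s_before, s_after = svt(observed, tau=1.0)
print("largest singular values before:", np.round(s_before[:3], 3))
print("largest singular values after: ", np.round(s_after[:3], 3))
```

The two dominant singular values, which carry the background, are reduced by exactly the same amount as the small noise-related ones. A weighted scheme can instead penalize large singular values less.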

B. Spatial–Temporal Tensor Model for MOD

Since a satellite video has a 3-D structure, extending matrix RPCA to tensor RPCA can address the aforementioned problems. We therefore propose a tensor RPCA analysis framework with bounded noise that preserves the structural information of a satellite video and exploits its interframe correlations. The problem of MOD in satellite videos can be formulated as

$\begin{equation*} \mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N} \tag{3}\end{equation*}$

where $\mathcal{D}, \mathcal{B}, \mathcal{T}, \mathcal{N} \in \mathbb{R}^{n_{1} \times n_{2} \times n_{3}}$ represent the original patch-tensor, background tensor, target tensor, and noise tensor, respectively.
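The difference between the matrix input of (1) and the tensor input of (3) can be sketched as follows. This is a minimal illustration that simply stacks whole frames along a third axis; the letter's actual patch-tensor construction may differ, and the helper names and toy frame size are hypothetical.

```python
import numpy as np

def frames_to_matrix(frames):
    """Flatten each n1 x n2 frame into a column, giving the s x n matrix D of (1)."""
    return np.stack([frame.reshape(-1) for frame in frames], axis=1)

def frames_to_tensor(frames):
    """Stack frames along a third axis, giving an n1 x n2 x n3 tensor as in (3)."""
    return np.stack(frames, axis=2)

# Hypothetical toy sequence: 8 frames of 4 x 5 pixels.
rng = np.random.default_rng(0)
frames = [rng.random((4, 5)) for _ in range(8)]

D_matrix = frames_to_matrix(frames)   # spatial neighborhood is lost in each column
D_tensor = frames_to_tensor(frames)   # rows, columns, and time stay separate axes
print(D_matrix.shape, D_tensor.shape)
```

Both arrays hold the same pixel values. Only the tensor keeps the row and column adjacency that the matrix form discards.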

To recover the low-rank component more accurately and separate objects from the background more cleanly, we incorporate WSNM [11] into the low-rank tensor approximation model. WSNM assigns different weights to the singular values inside an $\ell_{p}$ norm, and the power $p$ can be adjusted to obtain a more suitable value for recovering the background. The WSNM for a matrix is defined as

$\begin{equation*} \left\| \mathcal{X} \right\|_{w,S_{p}} = \left( \sum_{i=1}^{\min\{n,m\}} w_{i} \sigma_{i}^{p} \right)^{\frac{1}{p}} \tag{4}\end{equation*}$

where $\mathcal{X} \in \mathbb{R}^{m \times n}$ represents the input matrix and $w = [w_{1}, \ldots, w_{\min\{n,m\}}]$ represents the weight values, which are nonnegative and arranged in nondescending order. $\sigma_{i}$ represents the $i$th singular value of $\mathcal{X}$, and the power $p$ satisfies $0 < p \leq 1$. Both convex NNM and weighted NNM (WNNM) are special cases of WSNM with $p = 1$, obtained when $w = [1, \ldots, 1]$ and $w = [w_{1}, \ldots, w_{\min\{n,m\}}]$, respectively.
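The definition in (4) and its NNM special case can be checked numerically. The following is a minimal NumPy sketch; the function name and the random test matrix are illustrative assumptions.

```python
import numpy as np

def weighted_schatten_p_norm(matrix, weights, p):
    """Evaluate (sum_i w_i * sigma_i**p) ** (1/p) as in (4), for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    sigma = np.linalg.svd(matrix, compute_uv=False)  # returned in nonascending order
    weights = np.asarray(weights, dtype=float)
    if weights.shape != sigma.shape:
        raise ValueError("need one weight per singular value")
    return float(np.sum(weights * sigma**p) ** (1.0 / p))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
ones = np.ones(min(X.shape))

# With unit weights and p = 1, the value reduces to the nuclear norm (NNM case).
wsnm_value = weighted_schatten_p_norm(X, ones, p=1.0)
nuclear = np.linalg.norm(X, ord="nuc")
print(wsnm_value, nuclear)
```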

In our model, we generalize the definition of WSNM to a tensor $\mathcal{B} \in \mathbb{R}^{n_{1} \times n_{2} \times n_{3}}$, that is

$\begin{align*} \left\| \mathcal{B} \right\|_{\mathcal{W},S_{p}}^{p} =&\frac{1}{L} \sum_{i=1}^{r} \sum_{j=1}^{n_{3}} \left( \mathcal{W}(i,i,j) \left( \bar{\mathcal{S}}(i,i,j) \right)^{p} \right)^{\frac{1}{p}} \tag{5}\\ \mathcal{W}(i,i,j) =&\frac{C\sqrt{mn}}{\bar{\mathcal{S}}(i,i,j) + \varepsilon} \tag{6}\end{align*}$

where $r = \mathrm{rank}_{t}(\mathcal{B})$ denotes the tensor tubal rank and $\mathcal{S}(i,i,1)$ are the entries on the diagonal of the first slice of $\mathcal{S}$ ($\mathcal{S} \in \mathbb{R}^{n_{1} \times n_{2} \times n_{3}}$ is a diagonal tensor). The discrete Fourier transformation (DFT) of $\mathcal{S}$ is denoted as $\bar{\mathcal{S}} \in \mathbb{R}^{n_{1} \times n_{2} \times n_{3}}$. The entries on the diagonal of $\bar{\mathcal{S}}(:,:,j)$ are the singular values of $\bar{\mathcal{B}}(:,:,j)$. $C$ is a tuning parameter, $\varepsilon$ is a positive constant, and $\mathcal{W}$ denotes a weight tensor.
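The quantities entering (5) and (6) can be sketched in NumPy under the usual t-SVD convention, assumed here, that the DFT is taken along the third axis and each frontal slice in the Fourier domain is decomposed by a matrix SVD. The function names and the values of `C` and `eps` are hypothetical choices for illustration.

```python
import numpy as np

def fourier_singular_values(tensor):
    """Singular values of every frontal slice after an FFT along the third axis."""
    tensor_hat = np.fft.fft(tensor, axis=2)
    # Move the slice axis first so np.linalg.svd treats it as a batch dimension.
    slices = np.moveaxis(tensor_hat, 2, 0)
    return np.linalg.svd(slices, compute_uv=False)  # shape (n3, min(n1, n2))

def reweight(singular_values, C, m, n, eps=1e-6):
    """Weights in the form of (6): large singular values receive small weights."""
    return C * np.sqrt(m * n) / (singular_values + eps)

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 5, 4))          # toy n1 x n2 x n3 tensor
S_bar = fourier_singular_values(B)
W = reweight(S_bar, C=1.0, m=6, n=5)
print(S_bar.shape, W.shape)
```

Because the singular values of each slice are returned in nonascending order, the weights from (6) are automatically in nondescending order, which matches the ordering requirement stated for WSNM.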

Then, the overall framework can be formulated as

$\begin{equation*} \min_{\mathcal{B},\mathcal{T},\mathcal{N}} \left\| \mathcal{B} \right\|_{\mathcal{W},\mathcal{S}_{p}}^{p} + \lambda \left\| \mathcal{T} \right\|_{1} + \beta \left\| \mathcal{N} \right\|_{F}^{2} \quad \text{s.t.}~\mathcal{D} = \mathcal{B} + \mathcal{T} + \mathcal{N} \tag{7}\end{equation*}$

where $\lambda$ and $\beta$ represent the positive regularization parameters for the target and noise components, respectively.

C. Solution of the Proposed Model

To solve the proposed model, we adopt ADMM [13] and the inexact augmented Lagrangian multiplier (IALM) [14]. The problem in (7) can be rewritten by IALM as

$\begin{align*} L(\mathcal{B},\mathcal{T},\mathcal{N},y,\mu) =&\left\| \mathcal{B} \right\|_{\mathcal{W},\mathcal{S}_{p}}^{p} + \lambda \left\| \mathcal{T} \right\|_{1} + \beta \left\| \mathcal{N} \right\|_{F}^{2} \\&+ \left\langle y, \mathcal{D}-\mathcal{B}-\mathcal{T}-\mathcal{N} \right\rangle \\&+ \frac{\mu}{2} \left\| \mathcal{D}-\mathcal{B}-\mathcal{T}-\mathcal{N} \right\|_{F}^{2} \tag{8}\end{align*}$

where $y \in \mathbb{R}^{n_{1} \times n_{2} \times n_{3}}$ denotes the Lagrangian multiplier tensor, $\mu$ represents a penalty factor, and $\langle \cdot, \cdot \rangle$ denotes the inner product operation. ADMM decomposes the problem in (8) into three optimization subproblems, one each for $\mathcal{B}$, $\mathcal{T}$, and $\mathcal{N}$. Since it is hard to optimize all three variables simultaneously, we approximately solve this optimization problem by alternately minimizing over one variable with the others fixed. The detailed process is given in the following.

Updating $\mathcal{B}$ with the other variables fixed, the subproblem derived from (8) is
$\begin{equation*} \mathcal{B}^{k+1} = \mathop{\arg\min}_{\mathcal{B}} \left\| \mathcal{B} \right\|_{\mathcal{W},\mathcal{S}_{p}}^{p} + \frac{\mu^{k}}{2} \left\| \mathcal{D} - \mathcal{B} - \mathcal{T}^{k} - \mathcal{N}^{k} + \frac{y^{k}}{\mu^{k}} \right\|_{F}^{2}. \tag{9}\end{equation*}$
To solve the problem in (9), we incorporate the generalized soft-thresholding (GST) method [11] into tensor singular value thresholding (t-SVT) [15], [16]. Consequently, the solution of (9) can be written as
$\begin{equation*} \mathcal{B}^{k+1} = \mathcal{D}_{\mathcal{W},\mathcal{S}_{p}(\mu^{k})^{-1}}\left( \mathcal{D} - \mathcal{T}^{k} - \mathcal{N}^{k} + \frac{y^{k}}{\mu^{k}} \right) \tag{10}\end{equation*}$
where $\mathcal{D}_{\mathcal{W},\mathcal{S}_{p}(\mu^{k})^{-1}}(\cdot)$ denotes the t-SVT operator with GST applied to the singular values. Note that the weights $w = [w_{1}, \ldots, w_{r}]$ are in nondescending order, while the singular values are in nonascending order: $\sigma_{1} \ge \sigma_{2} \ge \cdots \ge \sigma_{r}$.
Updating $\mathcal{T}$ with the other variables fixed, the subproblem is
$\begin{equation*} \mathcal{T}^{k+1} = \mathop{\arg\min}_{\mathcal{T}} \lambda \left\| \mathcal{T} \right\|_{1} + \frac{\mu^{k}}{2} \left\| \mathcal{D} - \mathcal{B}^{k+1} - \mathcal{T} - \mathcal{N}^{k} + \frac{y^{k}}{\mu^{k}} \right\|_{F}^{2}. \tag{11}\end{equation*}$
The problem in (11) is a typical $\ell_{1}$-regularized minimization problem. Therefore, we can obtain its globally optimal solution through an elementwise shrinkage operation [17]
$\begin{equation*} \mathcal{T}^{k+1} = \mathcal{F}_{\lambda/\mu^{k}}\left( \mathcal{D} - \mathcal{B}^{k+1} - \mathcal{N}^{k} + \frac{y^{k}}{\mu^{k}} \right) \tag{12}\end{equation*}$
where $\mathcal{F}_{\lambda/\mu^{k}}(\cdot)$ represents the elementwise shrinkage operator.
Updating $\mathcal{N}$ with the other variables fixed, the subproblem is
$\begin{equation*} \mathcal{N}^{k+1} = \mathop{\arg\min}_{\mathcal{N}} \beta \left\| \mathcal{N} \right\|_{F}^{2} + \frac{\mu^{k}}{2} \left\| \mathcal{D} - \mathcal{B}^{k+1} - \mathcal{T}^{k+1} - \mathcal{N} + \frac{y^{k}}{\mu^{k}} \right\|_{F}^{2}. \tag{13}\end{equation*}$
Setting the gradient of (13) with respect to $\mathcal{N}$ to zero gives the closed-form solution
$\begin{equation*} \mathcal{N}^{k+1} = \frac{\mu^{k}\left( \mathcal{D} - \mathcal{B}^{k+1} - \mathcal{T}^{k+1} \right) + y^{k}}{2\beta + \mu^{k}}. \tag{14}\end{equation*}$
Updating the multipliers $y$ with the other variables fixed:
$\begin{equation*} y^{k+1} = y^{k} + \mu^{k}\left( \mathcal{D} - \mathcal{B}^{k+1} - \mathcal{T}^{k+1} - \mathcal{N}^{k+1} \right). \tag{15}\end{equation*}$
Updating $\mu^{k+1}$ by the following equation:
$\begin{equation*} \mu^{k+1} = \min\left( \rho\mu^{k}, \mu_{\max} \right). \tag{16}\end{equation*}$
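The update steps in (10), (12), and (14)–(16) can be sketched in NumPy as below. This is a rough illustration under several assumptions, not the authors' implementation: the GST routine follows the commonly used threshold and fixed-point iteration for $\min_x \frac{1}{2}(x-y)^2 + \lambda |x|^p$; the weights of (6) are computed from the singular values of the tensor being thresholded; the $1/L$ normalization of (5) is ignored; and the function names, `C`, the iteration counts, and the toy data are hypothetical.

```python
import numpy as np

def gst(y, lam, p, iters=10):
    """Generalized soft-thresholding for min_x 0.5*(x-y)**2 + lam*x**p (y >= 0, lam > 0)."""
    base = 2.0 * lam * (1.0 - p)
    tau = base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))
    keep = y > tau                        # entries at or below the threshold become 0
    x = np.where(keep, y, 0.0)
    for _ in range(iters):                # fixed-point iterations on the kept entries
        x_safe = np.where(keep, np.maximum(x, 1e-12), 1.0)
        x = np.where(keep, y - lam * p * x_safe ** (p - 1.0), 0.0)
    return np.maximum(x, 0.0)

def wsnm_tsvt(Y, mu, p, C, eps=1e-6):
    """Sketch of (10): threshold the singular values of each Fourier-domain slice."""
    n1, n2, _ = Y.shape
    Y_hat = np.moveaxis(np.fft.fft(Y, axis=2), 2, 0)       # shape (n3, n1, n2)
    U, s, Vh = np.linalg.svd(Y_hat, full_matrices=False)   # batched SVD
    w = C * np.sqrt(n1 * n2) / (s + eps)                   # weights in the form of (6)
    s_new = gst(s, w / mu, p)
    B_hat = (U * s_new[:, None, :]) @ Vh
    return np.real(np.fft.ifft(np.moveaxis(B_hat, 0, 2), axis=2))

def soft_threshold(X, tau):
    """Elementwise shrinkage operator used in (12)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def wsnm_sttn_sketch(D, lam, beta=100.0, p=0.9, C=1.0, mu=1e-2,
                     mu_max=1e7, rho=1.5, zeta=1e-6, max_iter=200):
    B, T, N, y = (np.zeros_like(D) for _ in range(4))
    d_norm_sq = np.linalg.norm(D) ** 2
    for _ in range(max_iter):
        B = wsnm_tsvt(D - T - N + y / mu, mu, p, C)            # (10)
        T = soft_threshold(D - B - N + y / mu, lam / mu)       # (12)
        N = (mu * (D - B - T) + y) / (2.0 * beta + mu)         # (14)
        residual = D - B - T - N
        y = y + mu * residual                                  # (15)
        mu = min(rho * mu, mu_max)                             # (16)
        if np.linalg.norm(residual) ** 2 / d_norm_sq <= zeta:  # stopping rule
            break
    return B, T, N

# Hypothetical toy data: static background, one moving bright pixel, mild noise.
rng = np.random.default_rng(0)
n1, n2, n3 = 8, 8, 6
D = np.repeat(rng.random((n1, n2, 1)), n3, axis=2)
for j in range(n3):
    D[j + 1, j + 1, j] += 1.0
D += 0.01 * rng.standard_normal((n1, n2, n3))

B, T, N = wsnm_sttn_sketch(D, lam=4.0 / np.sqrt(max(n1, n2) * n3))
rel_residual = np.linalg.norm(D - B - T - N) / np.linalg.norm(D)
print(rel_residual)
```

With $p = 1$ the GST step reduces to ordinary soft-thresholding with threshold `lam`, so the same routine also covers the WNNM special case.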

Finally, the proposed method is summarized in Algorithm 1.

Algorithm 1 Process of WSNM-STTN

$\mathbf{Input}$: The image sequence $d_{1},\ldots,d_{P} \in \mathbb{R}^{n_{1} \times n_{2}}$,

number of frames $L$, tuning parameter $H$, parameters

$\lambda, \beta$, $p$, $\mu > 0$

$\mathbf{Initialize}$: Transform the image sequence $d_{1},\ldots,d_{P} \in \mathbb{R}^{n_{1} \times n_{2}}$

into the tensor $\mathcal{D}$, $\mathcal{B}^{0} = \mathcal{T}^{0} = \mathcal{N}^{0} = 0 \in \mathbb{R}^{n_{1} \times n_{2} \times n_{3}}$,

$y^{0} = 0$, $\mu^{0} = 10^{-2}$, $\mu_{\max} = 10^{7}$, $k = 0$,

$\rho = 1.5$, $\zeta = 10^{-6}$, $\beta = 100$.

$\mathbf{While}$ not converged $\mathbf{do}$

Update $\mathcal{B}^{k+1}$ according to Eq. 10.

Update $\mathcal{T}^{k+1}$ according to Eq. 12.

Update $\mathcal{N}^{k+1}$ according to Eq. 14.

Update multipliers $y$ according to Eq. 15.

Update $\mu^{k+1}$ according to Eq. 16.

Check the convergence condition

$\frac{\left\| \mathcal{D} - \mathcal{B}^{k+1} - \mathcal{T}^{k+1} - \mathcal{N}^{k+1} \right\|_{F}^{2}}{\left\| \mathcal{D} \right\|_{F}^{2}} \le \zeta$.

Update $k = k + 1$.

$\mathbf{end\ While}$

$\mathbf{Output}$: $\mathcal{B}^{k+1}$, $\mathcal{T}^{k+1}$, $\mathcal{N}^{k+1}$.

SECTION III.

Experimental Results and Analysis

A. Dataset and Metrics

We evaluated the proposed WSNM-STTN on nine satellite video datasets (as listed in Table I). The first two videos (i.e., Videos 001 and 002) were captured by SkySat. Their spatial resolution is 1.0 m, and their frame rate is 30 frames per second (FPS). Videos 003–009 are provided by Chang Guang Satellite Technology Company Ltd. Their spatial resolution is 1.0 m, and their frame rate is 10 FPS. All these datasets mainly cover traffic scenarios in urban areas. MOD in Videos 003–009 is challenging because of the complex backgrounds. In contrast, the backgrounds of Videos 001 and 002 captured by SkySat are mainly composed of roads, which makes good detection performance relatively easy to achieve. In our experiments, moving cars are selected as the targets of interest.

TABLE I Details of Nine Satellite Video Datasets

We use three evaluation metrics, including precision, recall, and $F_{1}$ score [18], to evaluate the performance of our WSNM-STTN algorithm.
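For reference, the three metrics follow the standard definitions in terms of true positives, false positives, and false negatives. The sketch below uses hypothetical counts and does not reproduce the matching rule between detections and ground truth used in [18].

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard precision, recall, and F1 computed from detection counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts, not taken from the letter's tables.
precision, recall, f1 = precision_recall_f1(90, 10, 30)
print(precision, recall, f1)
```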

B. Parameter Setting

In the proposed WSNM-STTN algorithm, the parameters are set as follows. The regularization parameter $\lambda$ in (8) controls the influence of the target tensor. $\lambda$ is set to $H/\sqrt{\max(m,n) \times L}$, where $m$ and $n$ are the width and the height of the input image, respectively, and $L$ represents the number of input frames used to exploit the interframe information in the model. We use the tuning parameter $H$ to control $\lambda$. Fig. 1(a)–(c) shows the recall, precision, and $F_{1}$ curves with respect to the power $p$, the number of frames $L$, and the tuning parameter $H$ on the test datasets, respectively. Based on the tuning results, we set $p = 0.9$, $L = 8$, and $H = 4$ in the following experiments to obtain the optimal detection performance.

Fig. 1. Recall, precision, and $F_{1}$ results achieved by our model with different values of (a) $p$, (b) $L$, and (c) $H$.
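The setting of $\lambda$ described in this subsection can be written as a one-line helper. The values $H = 4$ and $L = 8$ are those chosen in the letter, whereas the 500 × 400 frame size and the function name are hypothetical.

```python
import math

def regularization_weight(H, m, n, L):
    """lambda = H / sqrt(max(m, n) * L), following the parameter setting above."""
    return H / math.sqrt(max(m, n) * L)

# H and L follow the letter; the frame size is an illustrative assumption.
lam = regularization_weight(H=4, m=500, n=400, L=8)
print(lam)
```

Larger frames or longer frame windows lower $\lambda$, so the sparsity penalty on the target tensor weakens as the tensor grows.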

C. Comparison With the State-of-the-Art Methods

We conduct extensive experiments to demonstrate the robustness of our method to various scenarios in real applications:

1) SkySat Dataset:

To test the effectiveness of our WSNM-STTN on SkySat satellite videos, following [6] and [8], we compare our method with five batch-based state-of-the-art approaches (i.e., RPCA [19], GoDec [4], DECOLOR [7], LSD [5], and E-LSD [8]) and one state-of-the-art online approach (i.e., O-LSD [6]). As shown in Tables II and III, WSNM-STTN achieves the highest overall performance among these batch and online methods, with an average $F_{1}$ (Avg-$F_{1}$) of 0.86. This is because the tensor RPCA in our model can exploit interframe information in consecutive frames to boost the detection performance. In addition, compared with the state-of-the-art online approach O-LSD, WSNM-STTN achieves comparable detection performance with significantly reduced processing time: the per-frame processing time of WSNM-STTN is 30 times shorter than that of O-LSD. This is because the t-SVD operation in our method can speed up the inference process.

TABLE II Detection Performance Achieved by Our Model and Batch-Based Algorithms on SkySat Satellite Videos. The Best Results Are Shown in Red and the Second Best Results Are Shown in Blue (Re: Recall, Pre: Precision)

TABLE III Detection Performance Achieved by Our Model and the Other Methods (i.e., E-LSD and O-LSD) on SkySat Satellite Videos

2) Jilin-1 Dataset:

To test the effectiveness of our WSNM-STTN method on Jilin-1 satellite videos, we compare our method with three batch RPCA-based state-of-the-art approaches (i.e., GoDec [4], DECOLOR [7], and E-LSD [8]) and three statistical modeling-based methods (i.e., MDTT [3], VIBE [2], and D&T [18]). As shown in Table IV, WSNM-STTN achieves the highest overall performance among all methods, with an average precision of 0.90 and an average $F_{1}$ of 0.83. Compared with the matrix decomposition method E-LSD, the proposed method improves the average precision and the average $F_{1}$ by 0.12 and 0.13, respectively.

TABLE IV Quantitative Results Achieved by Different Methods on Jilin-1 Satellite Videos. The Best Results Are Shown in Red and the Second Best Results are Shown in Blue (Re: Recall and Pre: Precision)

In summary, the proposed WSNM-STTN model can achieve robust performance and fast processing in complex and highly heterogeneous backgrounds.

D. Ablation Study

We have demonstrated the effectiveness of introducing the bounded noise term $\mathcal{N}$ in (8). In this section, we also conduct ablation experiments and visualize the results of WSNM-STTN with noise (i.e., WSNM-STTN w/ noise) and without noise (i.e., WSNM-STTN w/o noise) in Fig. 2. It can be observed that our WSNM-STTN method achieves a high detection rate and a low false alarm rate by introducing bounded noise.

Fig. 2. Demonstration of the importance of $\mathcal{N}$ in WSNM-STTN.

SECTION IV.

Conclusion

In this letter, we propose a WSNM-STTN model to detect dim and small moving objects in satellite videos. With the STTN model, the proposed method can exploit temporal information within a sequence. Besides, we propose an extended tensor RPCA with bounded noise and incorporate WSNM to solve the overshrinkage problem in low-rank estimation, which is superior to noiseless modeling methods. Then, we optimize our model by ADMM to detect objects. Extensive experiments show that WSNM-STTN can achieve a high detection rate and a low false alarm rate under complex backgrounds with heavy noise. In addition, WSNM-STTN converges faster than the matrix decomposition approach by a large margin.
