Introduction
Infrared small target detection remains a challenging problem despite the rapid development of infrared guidance systems. Small targets are often submerged in nonconstant complex backgrounds with low signal-to-noise ratios and low contrast. Moreover, because of the long imaging distance through the atmosphere, infrared small targets typically have unremarkable features, uncertain brightness, and weak intensity [1], [2]. Researchers have made considerable efforts over the past decade, but infrared small target detection is still a challenging task worth exploring [3]–[6].
In general, infrared small target detection methods can be classified into two categories: single-frame detection and sequential detection. Sequential detection methods, such as the interframe difference method [7], optical flow method [8], [9], three-dimensional directional filtering [10], and Bayesian theory [11], perform well when prior knowledge of the target's shape and position in adjacent frames is available. However, obtaining such prior knowledge in practical military applications is extremely difficult. Given the fast detection speed and short initialization time of single-frame methods [12], researchers often focus on single-frame detection.
Typical single-frame detection methods, such as the maximum mean and maximum median filters [13], [14], two-dimensional minimum mean square filter [15], background regression estimation method [16], morphological method [17], and bilateral filter [18], can effectively detect targets in simple backgrounds. However, when small targets are submerged in infrared scenes with highly heterogeneous backgrounds, these algorithms fail to obtain satisfactory detection results. Recently, matrix decomposition and saliency detection methods have shown substantial advantages in single-frame infrared image detection. Among the typical matrix decomposition methods, the robust principal component analysis (RPCA) method [19], which is based on convex optimization, is used to separate the foreground target matrix and the background matrix accurately from the original infrared image. The infrared patch-image (IPI) model [20] generalizes the traditional image model to a new patch-image model based on local patch construction. The core idea of the IPI model is to split an infrared image into the patch-image model, separate the foreground target matrix from the background matrix by stable principal component analysis, and finally reconstruct the image. However, some residual background edges remain when the target is submerged in heavy noise due to the defect of the
In recent years, methods based on the contrast mechanism of saliency detection have been proposed in the open literature for infrared small target detection. These methods exploit the noticeable differences between targets and background regions [23]–[26]. Meanwhile, numerous saliency detection methods have been proposed in the field of small target detection. Spectral residual (SR) [27] is a typical saliency detection algorithm that extracts the salient target region by utilizing the spectral residual information in the spectral domain. However, the SR algorithm focuses on target extraction and lacks background suppression ability, especially on infrared images with heavy noise and highly heterogeneous backgrounds. The phase spectrum of quaternion Fourier transform (PQFT) [28] is proposed to calculate spatiotemporal saliency maps, which validates the necessity of phase spectra in saliency detection. A bottom-up paradigm named the spectrum scale space (SSS) algorithm [29], which is based on scale space analysis, is proposed for saliency detection. SSS analyzes the infrared image at multiple scales and obtains scale maps with different levels of detail. However, for small target detection in complex backgrounds, SSS often detects not only target regions but also background edges. To address this, a dual multiscale filter [30] with SSS and Gabor wavelets (GW) is proposed for efficient infrared small target detection. SSS is used as the preprocessing procedure to obtain the multiscale saliency maps, GW is utilized to suppress the remaining high-frequency noise, and a non-negative matrix factorization method then fuses all the GW maps into one final detection image. Some saliency detection methods based on local contrast have also been proposed in the open literature. Typically, the local contrast measure (LCM) [31] utilizes the characteristics of the human visual system (HVS) and a derived kernel model to highlight target regions well.
However, the LCM algorithm needs to conduct numerous pixel-by-pixel calculations; thus, its efficiency is relatively low. A multiscale algorithm utilizing the relative local contrast measure (RLCM) [32] is proposed for infrared small target detection. The multiscale RLCM of the raw infrared image is computed first, and then an adaptive threshold is applied to extract the target area. Recently, some LCM-based methods have shown considerable potential in target detection. For example, a fast target detection method guided by visual saliency (TDGS) [33] is proposed, which uses fast SSS as the coarse-detection stage and adaptive LCM as the fine-detection stage. TDGS performs well for small and dim target detection in infrared search and track (IRST) systems and practical applications. A coarse-to-fine (CF) [34] framework combining the matrix decomposition method with the multiscale modified LCM (MLCM) gradually separates small targets from structured edges, unstructured clutter, and noise. Inspired by the multiscale gray difference and the local entropy operator, a small target detection method based on the novel weighted image entropy (NWIE) [35] is proposed, which performs well on small target images with low SNR.
This study proposes an improved SSS (ISSS) algorithm via a precise feature matching and scale selection strategy for infrared small target detection. The inexact augmented Lagrange multiplier (IALM) method, which exploits the nonlocal self-similarity prior, is set as the preprocessing step to extract the sparse image matrix from the original infrared image matrix. After the candidate target region is extracted in this first step, most backgrounds with the self-correlation property are suppressed. Then, the ISSS algorithm is applied as the postprocessing step to further enhance the target contrast and eliminate the highly heterogeneous backgrounds via the proposed elaborate scale division strategy and optimal scale selection mechanism. The ISSS algorithm generates multiscale saliency maps with the proposed scale division strategy, which caters to the features of infrared small targets. Finally, the optimal scale map, in which the small target is most highlighted, needs to be screened out. Combined with the idea of saliency measurement [36], [37], the gray value difference and the maximum value of local information entropy are defined and used as the judgment criteria for optimal scale selection. Extensive experiments demonstrate that the proposed algorithm not only performs satisfactorily in visual and quantitative evaluations but also outperforms the other comparison algorithms, especially on infrared images with thick clouds and high-brightness buildings. Real experimental data also indicate that IALM and ISSS are essential steps of the proposed method and that their execution order must be preserved to achieve high detection performance. Given these improvements, the proposed algorithm is robust and efficient against occlusion and complex noise for infrared small target detection.
This paper is organized as follows. Section 2 reviews the related work. Section 3 introduces the proposed algorithm, including the IALM algorithm and the ISSS algorithm. Section 4 shows the experimental results. Lastly, Section 5 presents the conclusion.
Related Work
In this section, we review the basic concepts of matrix decomposition and saliency detection for infrared small target detection.
A. Matrix Decomposition
Inspired by the non-local self-correlation configuration of the background [20], the matrix decomposition method has proven to be an effective and accurate way to separate the small target from the original image. The Society of Photo-Optical Instrumentation Engineers (SPIE) defines a target with the pixel points of no more than \begin{equation*} \left\| I_{T} \right\|_{0} < k, \tag{1}\end{equation*}
According to a general low-rank assumption previously proposed, all background patches come from a mix of low-rank subspace clusters [38], and infrared background \begin{equation*} \mathrm{rank}(I_{B}) \le r, \tag{2}\end{equation*}
\begin{equation*} I =I_{T}+I_{B}+I_{n}.\tag{3}\end{equation*}
\begin{equation*} \mathop{\mathrm{min}}\limits_{L,E} \left\| L \right\|_{F} \quad \mathrm{s.t.}~\mathrm{rank}(L) \le r,~D = L + E \tag{4}\end{equation*}
\begin{equation*} \left( L_{k+1}, E_{k+1} \right) = \arg\mathop{\mathrm{min}}\limits_{L,E} \mathrm{L}(L, E, Y_{k}, \mu_{k}), \tag{5}\end{equation*}
Nevertheless, solving this sub-problem exactly has been shown to be time-consuming and unnecessary. The inexact ALM (IALM) algorithm [43] was therefore proposed as an improvement on the exact ALM (EALM) algorithm. The IALM algorithm converges as fast as EALM but requires fewer partial SVDs. Consequently, the IALM algorithm is widely used to solve constrained convex optimization problems.
B. Saliency Detection
Human eyes can quickly capture the prominent area of an image through the contrast mechanism, which helps them focus selectively on the salient area of interest. This attention selection mechanism, also called visual saliency, can select the region of interest from all the information in the visual range. Many saliency detection algorithms draw on the idea of visual saliency and effectively extract the salient region from the whole image. Spectral residual (SR) [27] is a typical saliency detection method that extracts the salient region by using the spectral residual information.
Given an original image \begin{equation*} L(u,v) = \mathrm {log}(\vert fft(I) \vert),\tag{6}\end{equation*}
The spectral residual is defined as the difference between the original log amplitude spectrum and the smoothed log amplitude spectrum:\begin{equation*} R(u,v) = L(u,v) - h \ast L(u,v), \tag{7}\end{equation*}
\begin{equation*} S(x,y) = ifft\{ \mathrm{exp}( R(u,v) + i \cdot P(u,v) ) \}. \tag{8}\end{equation*}
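As a rough illustration, (6)–(8) can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions, not the implementation of [27]: the smoothing filter h is taken to be a 3×3 mean filter with wrap-around borders, a small constant `eps` guards the logarithm, and the squared magnitude of the inverse transform is returned as the saliency map. The function names `box_filter_wrap` and `spectral_residual` are hypothetical.

```python
import numpy as np


def box_filter_wrap(a, size=3):
    """Mean filter with wrap-around borders (an assumed stand-in for h)."""
    out = np.zeros_like(a, dtype=float)
    r = size // 2
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(a, (dy, dx), axis=(0, 1))
    return out / (size * size)


def spectral_residual(image, size=3, eps=1e-12):
    """Saliency map following Eqs. (6)-(8); eps and |.|^2 are assumptions."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    log_amp = np.log(np.abs(spectrum) + eps)              # Eq. (6)
    phase = np.angle(spectrum)                            # P(u, v)
    residual = log_amp - box_filter_wrap(log_amp, size)   # Eq. (7)
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))   # Eq. (8)
    return np.abs(recon) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((32, 32))
    frame[16, 16] += 5.0  # a synthetic bright point
    saliency = spectral_residual(frame)
    print(saliency.shape, float(saliency.max()))
```

The wrap-around filter is chosen here only because the discrete spectrum is periodic; any other small smoothing kernel could be substituted.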
Proposed Method
The flowchart with a pipeline structure of the proposed method, including the background suppression stage, feature matching stage and optimal scale selection stage, is shown in Figure 1. In the background suppression stage, IALM algorithm is used to decompose the original image matrix
Flowchart of the proposed method. The red boxes represent the real target areas, the yellow and green boxes represent the different highlighted residual noise areas in some saliency maps.
A. Background Suppression Stage
The matrix decomposition method is set as the preprocessing step to separate the target foreground matrix, which has sparse characteristics, from the infrared image matrix. RPCA, a typical matrix decomposition method, can recover the essential low-rank data from observation data polluted by large, sparse noise. This problem can be formulated as the following optimization problem:\begin{equation*} \mathop{\mathrm{min}}\limits_{L,E} \mathrm{rank}(L) + \lambda \left\| E \right\|_{0} \quad \mathrm{s.t.}~D = L + E, \tag{9}\end{equation*}
\begin{equation*} \mathop {{\mathrm {min}}}\limits _{L,E} {\left \|{ L }\right \|_{\mathrm {\ast }}}+\lambda \left \|{ E }\right \|_{1}\quad s.t.~D =L +E,\tag{10}\end{equation*}
Considering the efficiency and robustness of the abovementioned IALM algorithm, we use it in the preprocessing step to decompose the original image matrix into a low-rank image matrix and a sparse image matrix, which correspond to the background region and the target region, respectively. The Lagrange multiplier method is often used to solve constrained convex optimization problems. This technique folds the objective function and the constraint conditions into a single unconstrained function. For (10), the objective and the constraint are identified as follows:\begin{align*} X=&(L,E), \\ f(X)=&\left\| L \right\|_{\ast} + \lambda \left\| E \right\|_{1}, \\ h(X)=&D - L - E. \tag{11}\end{align*}
Then the augmented Lagrange function is as follows:\begin{equation*} \mathrm{L}(L, E, Y, \mu) = \|L\|_{\ast} + \lambda \|E\|_{1} + \langle Y, D-L-E \rangle + \frac{\mu}{2} \|D-L-E\|_{\mathrm{F}}^{2}, \tag{12}\end{equation*}
Algorithm 1 (Matrix Decomposition by IALM Algorithm)
Observation matrix
while not converged do
// Fix the others and update
// Fix the others and update
end while
(
The initial parameters need to be set in the algorithm:
The following formula updates the parameter \begin{equation*} \mu_{k+1} = \begin{cases} \rho \mu_{k}, & \mathrm{if}~\dfrac{\mu_{k} \left\| E_{k+1} - E_{k} \right\|_{F}}{\left\| D \right\|_{F}} < \varepsilon, \\ \mu_{k}, & \mathrm{otherwise}. \end{cases} \tag{13}\end{equation*}
With the iteration of the algorithm and the continuous updating of
Figure 2 shows the experimental results of three typical infrared small target images after the implementation of the IALM algorithm. The experimental results in Figure 2 are all sparse image matrices extracted from the original image. The results show that the IALM algorithm can effectively eliminate most backgrounds with self-correlation property and enhance the target contrast. In Figure 2, the target areas are labeled with red circles and the noise areas are labeled with the yellow circles.
In Figure 2, although most backgrounds are suppressed, the detection results still show a high false alarm rate and poor location precision. Moreover, the target intensity in the three-dimensional maps is considerably weak. The ISSS algorithm is therefore applied as follow-up processing to further suppress the residual background edges and enhance the target contrast.
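The background suppression stage can be sketched as follows. This is a minimal NumPy sketch of an inexact-ALM iteration for (10) and (12), not the exact procedure of Algorithm 1: the low-rank update uses singular value thresholding, the sparse update uses elementwise soft thresholding, and, as a simplification of (13), the penalty is increased by the factor `rho` at every iteration. The default `lam`, the initial values of `Y` and `mu`, and the cap on `mu` are assumptions taken from common practice, and the names `soft_threshold` and `ialm_rpca` are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def ialm_rpca(D, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split D into low-rank L and sparse E by minimising Eq. (12)."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # assumed default weight
    norm_d = np.linalg.norm(D, "fro")
    spectral = np.linalg.norm(D, 2)           # largest singular value
    Y = D / max(spectral, np.abs(D).max() / lam)
    mu = 1.25 / spectral                      # assumed initial penalty
    mu_max = mu * 1e7
    L = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # Fix E and Y, update L by singular value thresholding.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Fix L and Y, update E by soft thresholding.
        E = soft_threshold(D - L + Y / mu, lam / mu)
        # Update the multiplier and the penalty (simplified Eq. (13)).
        residual = D - L - E
        Y = Y + mu * residual
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(residual, "fro") / norm_d < tol:
            break
    return L, E


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
    target = np.zeros((40, 40))
    target[5, 7] = 10.0
    L, E = ialm_rpca(background + target)
    print(np.linalg.norm(background + target - L - E))
```

In this sketch the sparse component `E` plays the role of the target foreground matrix that is passed on to the next stage.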
B. Feature Matching Stage
Studying different properties of an object often requires different specific scales. For instance, when we study the flight trajectory of an eagle, the eagle is only a small point in our visual range, whereas when we explore its shape, we can even observe its feathers. The scale must therefore be introduced as a free parameter into the image processing to analyze the areas of interest. The so-called scale space is used to obtain the optimal scale of the target through multiple scales without knowing the image size. On the basis of the SSS algorithm and the contrast mechanism, the target foreground image extracted in the background suppression stage is further processed by the ISSS algorithm. According to [29], convolving the image amplitude spectrum with a low-pass Gaussian kernel of an appropriate scale is equivalent to image saliency detection; thus, we use Gaussian kernel functions with different scales to obtain saliency maps at various scales. To select the optimal scale factor of the Gaussian kernel function, an optimal-scale selection mechanism is proposed in accordance with the property of local information entropy. This mechanism caters to the characteristics of infrared small targets [45], [46] and limits the calculation of information entropy to the small region of interest instead of traversing every complete scale map. Gaussian kernels are the only kernels that can produce multiscale spaces [47]. Taking the linear scale space representation as a reference, we generate a one-parameter family of smoothed spectra whose parameter is the scale of the Gaussian kernel.
The specific process of the ISSS algorithm is as follows. Given target foreground matrix \begin{align*} I_{A}(u,v)=&\mathrm{log} \vert fft(I(x,y)) \vert, \tag{14}\\ I_{P}(u,v)=&angle(fft(I(x,y))). \tag{15}\end{align*}
The scale space \begin{equation*} \Phi(u,v;k) = g(u,v;\sigma) \ast I_{A}(u,v), \tag{16}\end{equation*}
\begin{equation*} \sigma = \begin{cases} 2k, & 0 < k \le 5, \\ k(k-4), & 5 < k \le 11, \\ 25 + 2^{k-6}, & 11 < k \le 16. \end{cases} \tag{17}\end{equation*}
The step value of the scale parameter
The obtained smooth logarithmic amplitude spectrum \begin{equation*} S_{k}(x,y) = ifft\{ \mathrm{exp}( \Phi(u,v;k) + i \cdot I_{P}(u,v) ) \}. \tag{18}\end{equation*}
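The feature matching stage in (14)–(18) can be sketched as below. The sketch rests on several assumptions that are not stated in the text: the Gaussian smoothing in (16) is carried out as a circular convolution through its frequency-domain transfer function, a small constant `eps` guards the logarithm, and the magnitude of the inverse transform is kept as the saliency map. The names `sigma_schedule`, `gaussian_smooth_wrap`, and `scale_space_saliency` are hypothetical.

```python
import numpy as np


def sigma_schedule(k):
    """Scale parameter sigma for scale index k, following Eq. (17)."""
    if 0 < k <= 5:
        return 2.0 * k
    if 5 < k <= 11:
        return float(k * (k - 4))
    if 11 < k <= 16:
        return 25.0 + 2.0 ** (k - 6)
    raise ValueError("k must lie in (0, 16]")


def gaussian_smooth_wrap(a, sigma):
    """Circular Gaussian smoothing applied through its transfer function."""
    h, w = a.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(a) * transfer))


def scale_space_saliency(image, scales=range(1, 17), eps=1e-12):
    """Return {k: S_k} for each scale index, following Eqs. (14)-(18)."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    log_amp = np.log(np.abs(spectrum) + eps)                     # Eq. (14)
    phase = np.angle(spectrum)                                   # Eq. (15)
    maps = {}
    for k in scales:
        phi = gaussian_smooth_wrap(log_amp, sigma_schedule(k))   # Eq. (16)
        recon = np.fft.ifft2(np.exp(phi + 1j * phase))           # Eq. (18)
        maps[k] = np.abs(recon)
    return maps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = scale_space_saliency(rng.random((32, 32)))
    print(sorted(maps), [sigma_schedule(k) for k in (5, 6, 11, 12, 16)])
```

Each entry of the returned dictionary is one candidate saliency map for the scale selection stage.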
C. Optimal Scale Selection Stage
Information entropy is often used as a quantitative indicator of system information content [48]. Thus, it can be further utilized as a criterion for the optimization of system equations or parameter selection. In a relatively simple background, the highlighted target can change the information entropy of the whole image. By contrast, a small and dim infrared target contributes little to the information entropy of the whole image. In the proper scale map, the areas of interest are highlighted while the other parts are suppressed to the greatest extent. For the saliency detection of large targets, the minimum image information entropy may perform well in selecting the optimal saliency map, but it is not suitable for infrared targets with extremely small sizes.
Small targets can considerably influence the value of information entropy in the local saliency region [49]. Information entropy is a local concept, and for one pixel point in the image, information entropy \begin{equation*} H(x,y) = H\left[ \Lambda(x,y) \right] = -\sum\nolimits_{b=1}^{K} p_{b}(x,y) \lg p_{b}(x,y), \tag{19}\end{equation*}
In the optimal scale map, the target saliency is better than the background clutters, and at the same time, the background often shows a certain spatial similarity. Hence, when selecting the salient region, we first traverse the largest pixel point \begin{equation*} L_{k} = \max\left( S_{k}(x,y) \right), \quad k = 1, \ldots, K, \tag{20}\end{equation*}
\begin{equation*} m_{k}=\frac {1}{8}\sum \limits _{n=1}^{8} {I_{n}\left ({i,j }\right)} \quad I_{n}\left ({i,j }\right)\in B_{k}\tag{21}\end{equation*}
Figure 3 shows the values of maximum pixel points and their neighborhoods at two scales.
The similarity of the pixel point is determined by its adjacent region [50]; thus, the pixel values of the eight neighborhoods near the strong background interference are similar to one another or tend to approach the largest pixel value, whereas only a few neighborhood pixel values in the target region tend to approach the central pixel value. Extensive experiments indicate that the mean value of the eight neighborhoods’ pixel values in the target area is usually smaller than that in the highlighted background clutters. We set the discriminant standard \begin{equation*} k_{out} = \mathop{\mathrm{arg\,max}}\limits_{k} \{ H(V_{k}) \}, \quad V_{k} = L_{k} + B_{k}. \tag{22}\end{equation*}
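One possible reading of (19)–(22) is sketched below. It is an illustration under explicit assumptions rather than the exact selection rule: each scale map is normalized to [0, 1], V_k is taken to be the 3×3 region formed by the largest pixel L_k and its eight neighbors B_k, the entropy in (19) is estimated from a 16-bin gray-level histogram of that region, and the additional discriminant on the neighborhood mean m_k is left out. The names `patch_entropy`, `peak_region`, and `select_scale` are hypothetical.

```python
import numpy as np


def patch_entropy(patch, bins=16):
    """Histogram entropy of a patch with values in [0, 1], as in Eq. (19)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log10(p)).sum())


def peak_region(sal_map):
    """Return L_k (Eq. (20)), m_k (Eq. (21)) and the 3x3 region V_k."""
    sal_map = np.asarray(sal_map, dtype=float)
    i, j = np.unravel_index(np.argmax(sal_map), sal_map.shape)
    padded = np.pad(sal_map, 1, mode="edge")   # replicate borders
    region = padded[i:i + 3, j:j + 3]          # centred on pixel (i, j)
    peak = region[1, 1]
    neighbour_mean = (region.sum() - peak) / 8.0
    return peak, neighbour_mean, region


def select_scale(maps, bins=16):
    """Pick the scale whose peak region has maximum entropy (Eq. (22))."""
    best_k, best_h = None, -np.inf
    for k, s in maps.items():
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        _, _, region = peak_region(norm)
        h = patch_entropy(region, bins)
        if h > best_h:
            best_k, best_h = k, h
    return best_k


if __name__ == "__main__":
    flat_peak = np.zeros((7, 7))
    flat_peak[3, 3] = 1.0
    graded_peak = np.zeros((7, 7))
    graded_peak[2:5, 2:5] = [[0.1, 0.2, 0.3], [0.4, 1.0, 0.5], [0.6, 0.7, 0.8]]
    print(select_scale({1: flat_peak, 2: graded_peak}))
```

Restricting the entropy to the region around the largest pixel is what keeps the cost low compared with computing a global entropy for every scale map.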
Figure 4(a2, b2) and Figure 4(a3, b3) respectively show the optimal scale saliency map screened by the minimum global information entropy in accordance with the original SSS algorithm and by the proposed local information entropy. As shown in Figure 4(a), the saliency map obtained by the global information entropy focuses on the car at close range, whereas the proposed selective mechanism focuses on the driver, which is the small target in the original image. Similarly, from Figure 4(b), the saliency map obtained by the global information entropy method focuses on the high-brightness buildings at close distance, and the local information entropy method can effectively detect the small infrared targets at far distances.
(a1), (b1) are the original images; (a2), (b2) are the saliency maps obtained by global entropy; (a3), (b3) are the saliency maps obtained by local entropy.
Experimental Results and Analysis
In this section, we randomly select twelve infrared small target images from seven infrared image sequences to verify the effectiveness and robustness of the proposed algorithm. These sequences contain images of different sizes in various typical scenarios. Detailed information on the seven sequences is given in Table 1. Figure 5 shows the experimental results of the eight state-of-the-art comparison methods and the proposed method for infrared images in each scene. Table 2 and Figure 6 show the SCRg values, BSF values, and ROC curves of the eleven comparison methods. The eleven comparison algorithms are TDLMS [51], Top-hat [52], RPCA [19], SSS [29], WNNM-MC [22], TDGS [33], NWIE [35], CF [34], RLCM [32], DM filter [30], and IPI [20]. According to the detection results in Figure 5, SSS achieves satisfactory results in relatively simple backgrounds but misses targets with low SNR, such as those in Image e and Image f. RLCM and CF fail to fully detect the multiple small targets in Images g, k, and l and instead produce many false alarms. TDGS does not detect the targets correctly in Images f, i, and j because it fails to enhance low-SNR targets in complicated backgrounds with heavy noise. IPI fails to completely suppress the heavy cloud edges and highlighted buildings in Images b, c, and g. The DM filter obtains ideal detection results in Images a, d, i, and k, where all the targets are correctly output and most background clutter is discarded. NWIE achieves relatively ideal detection results compared with the other comparison algorithms: complicated background clutter is almost entirely suppressed and most small targets are highlighted. However, many false alarms still appear in the NWIE results for Image c and Image e, owing to the bright clouds and buildings. According to the 2D and 3D results in Figure 5, the proposed method both eliminates the complicated background and enhances the target region of interest in all seven sequences.
The representative images of the seven real image sequences and the corresponding processed results of different methods.
To verify the detection performance of the proposed method more objectively, we adopt four evaluation metrics: the signal-to-clutter ratio gain (SCRg), the background suppression factor (BSF), the receiver operating characteristic (ROC) curve, and the running time. The SCRg and BSF values for the different methods are shown in Table 2. For each image, the highest and second-highest values of SCRg and BSF among the detection results of the different algorithms are marked in red and blue, respectively.\begin{align*} SCR=&\frac{\vert \mu_{t} - \mu \vert}{\sigma}, \tag{23}\\ SCRg=&20 \lg \frac{SCR_{out}}{SCR_{in}}, \tag{24}\\ BSF=&20 \lg \frac{\sigma_{in}}{\sigma_{out}}. \tag{25}\end{align*}
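For reference, (23)–(25) can be evaluated as in the following NumPy sketch. It assumes that \mu_{t} is the mean gray value of the target pixels and that \mu and \sigma are the mean and standard deviation of the background pixels, with both regions supplied as boolean masks; these region definitions and the function names `scr`, `scr_gain`, and `bsf` are assumptions made for illustration.

```python
import numpy as np


def scr(image, target_mask, background_mask):
    """Signal-to-clutter ratio, Eq. (23)."""
    mu_t = image[target_mask].mean()
    mu_b = image[background_mask].mean()
    sigma_b = image[background_mask].std()
    return abs(mu_t - mu_b) / sigma_b


def scr_gain(img_in, img_out, target_mask, background_mask):
    """SCR gain in decibels, Eq. (24)."""
    scr_in = scr(img_in, target_mask, background_mask)
    scr_out = scr(img_out, target_mask, background_mask)
    return 20.0 * np.log10(scr_out / scr_in)


def bsf(img_in, img_out, background_mask):
    """Background suppression factor in decibels, Eq. (25)."""
    sigma_in = img_in[background_mask].std()
    sigma_out = img_out[background_mask].std()
    return 20.0 * np.log10(sigma_in / sigma_out)


if __name__ == "__main__":
    # Toy example: four background pixels followed by one target pixel.
    raw = np.array([[0.0, 2.0, 0.0, 2.0, 3.0]])
    processed = np.array([[0.0, 0.2, 0.0, 0.2, 2.1]])
    target = np.array([[False, False, False, False, True]])
    background = ~target
    print(scr_gain(raw, processed, target, background))
    print(bsf(raw, processed, background))
```

Both quantities grow as the processed background becomes flatter, so a method that suppresses clutter completely drives \sigma_{out} toward zero and the values toward infinity; this should be kept in mind when reading very large table entries.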
In addition, we conduct a large number of experiments to demonstrate the necessity of introducing the IALM method as the preprocessing step. A few representative results are given below. Figure 7 shows the detection results for Image b, Image c, and Image e when the execution order of the two algorithms is swapped. The detection results show a high false alarm rate and poor location precision.
Experimental results before and after IALM and ISSS algorithm execution order changes.
In the above experiments, where ISSS is executed first, the ISSS algorithm not only enhances the target intensity but also sharpens the background edges, resulting in a large amount of residual background clutter. By contrast, when ISSS is applied as the subsequent processing stage, it enhances the target strength and further eliminates the residual background clutter. In conclusion, the IALM and ISSS algorithms are both irreplaceable steps of the proposed method, and their execution order cannot be exchanged.
Conclusion
This paper proposes an improved SSS (ISSS) algorithm via a precise feature matching and scale selection strategy for efficient infrared small target detection. We select twelve infrared images with complex backgrounds, such as heavy and bright clouds, clouds and buildings, bright buildings with rich details, and heavy noise with bright backgrounds, as application scenarios, and eleven state-of-the-art methods as comparison algorithms to evaluate the performance of the proposed algorithm. Extensive experiments illustrate that the proposed method outperforms the comparison algorithms not only in visual quality but also in quantitative evaluation criteria such as SCRg, BSF scores, ROC curves, and running times, especially in scenarios with thick clouds and high-brightness buildings. Therefore, the proposed algorithm is an effective and robust method for infrared small target detection, and it has great potential in the field of infrared small target detection and tracking. In future work, we will further explore the structural information of different multiscale saliency maps, such as extracting the salient areas in each scale map by a fusion method.