Introduction
Hyperspectral (HS) imaging has become an important research topic in recent years. HS images contain dozens or hundreds of narrow bands within a certain wavelength range, and their spectral resolution can reach the nanometer level, which contributes to various applications, such as earth remote sensing [1]–[6] and computer vision applications, including object segmentation, tracking, and recognition [7]–[9]. However, while HS images benefit from excellent spectroscopic properties, their spatial resolution is relatively insufficient compared with multispectral and panchromatic (PAN) images due to the inevitable tradeoff between spectral and spatial sensitivities. As a consequence, the image fusion scheme, which combines a low-resolution (LR) HS image with a high-resolution (HR) PAN image, has become an effective and popular approach for improving the spatial resolution of HS images. To acquire HR-HS images, HR-PAN or multispectral images are generally used as reference images. Compared to multispectral images, PAN images usually have higher spatial resolution but less prior knowledge in the spectral domain, which leads to considerable spectral distortion and makes it more challenging to reconstruct high-quality HR-HS images.
In this article, we propose an HS image superresolution method that recovers an HR-HS image from an LR-HS source image and an HR-PAN reference image. The method primarily exploits prior information: specifically, the fact that the foreground and background intensities of each band in an HS image tend to be spatially smooth, provided that the alpha channel for image matting contains most of the image's edge information in a local window. Therefore, we reconstruct HR-HS images by designing a regularization term based on the image matting model, which extracts the spectral information from the foreground and background and the spatial information from the alpha channel. Specifically, two alpha channels are generated for the image matting procedure. The first alpha channel is calculated using a weighted strategy based on the structure tensors of the iteratively generated HR-HS estimate and the PAN image, which ensures the smoothness of the foreground and background in the spatial domain. The second alpha channel is obtained from the PAN image after contrast compression, which introduces edge information into the reconstructed HS image. In this manner, the spatial details of the HS image are enhanced by the PAN image while the spectral accuracy is preserved. Experimental results demonstrate that the proposed method is superior to state-of-the-art superresolution methods in terms of the quality of the fused HS images.
The rest of this article is organized as follows. In Section II, the representative literature on HS image fusion is briefly reviewed. The proposed method is presented in Section III. Section IV lists the experimental results and the comparative analysis of the different fusion methods. Finally, Section V concludes this article.
Related Works
A. Fusion-Based Image Superresolution Approaches
Various HS image superresolution methods have been developed in recent years, and the existing approaches can generally be categorized into four classes: component substitution [10]–[17], matrix factorization [18]–[26], tensor factorization [27]–[30], and other approaches [31]–[34].
Component-substitution-based approaches decompose the HS image's spatial and spectral components by transforming the image into another domain. Then, the spatial component is substituted with the multispectral or PAN image, and finally the HR-HS image is reconstructed with inverse transformation. Most existing component substitution methods are based on the Gram–Schmidt method [10], [11], intensity hue saturation [12], [13], or principal component analysis transformation [14]–[17]. These methods can be efficiently implemented and usually achieve good spatial performance. However, they also lead to serious spectral distortion when the HS image's spatial component is directly substituted with the PAN image.
Matrix-factorization-based superresolution approaches assume that each pixel in an HS or PAN image can be represented by a linear combination of several spectral atoms of the HR-HS image to be obtained. Kawakami et al. [18] fused HS images with RGB images obtained from cameras, with a prior that assumes the coefficients are sparse. Yokoya et al. [19] proposed coupled nonnegative matrix factorization, which estimates HR-HS images from a pair of multispectral and HS images. Grohnfeldt et al. [20] fused HS and multispectral images by constructing LR and HR dictionary pairs based on joint sparse representations. Zhou et al. [21] and Veganzones et al. [22] learned the spectral basis for local patches and solved the problem in a patch-by-patch manner, assuming that the HS image is locally low-rank. Simoes et al. [23] used a subspace representation and enforced spatial smoothness through total variation regularization. Wei et al. [24] formulated the HS fusion procedure as an ill-posed inverse problem, exploiting the sparsity of HS images via subspace learning in the spectral dimension and sparse coding in the spatial dimensions. Dong et al. [25] proposed a nonnegative structured sparse representation (NSSR) approach to promote the nonlocal self-similarities in HR-HS images. Dian and Li [26] utilized subspace-based low tensor multirank regularization (LTMR) for fusion, which exploits the spectral correlations and nonlocal similarities in HR-HS images. These methods achieve good spectral and spatial accuracy but at the cost of high computational complexity, which makes them unsuitable for real-time applications.
Tensor-factorization-based approaches are also utilized in HS superresolution methods. Dian et al. [27] proposed a nonlocal sparse tensor factorization method that generates HR-HS images using dictionaries containing several modes and core tensors. Li et al. [28] conducted sparse tensor factorization for HS and multispectral images simultaneously to solve the fusion problem. Chang et al. [29] designed different sparsity regularization parameters for core tensor values in a low-rank tensor recovery procedure. Zhang et al. [30] proposed a graph-regularized low-rank Tucker decomposition approach, which combines spectral smoothness from HS images and spatial consistency from multispectral images. These methods convert conventional images to 4-D or higher order tensors without loss of information and reconstruct HS images based on prior knowledge, such as sparsity and nonlocal similarity. However, tensor-factorization-based methods have limited representation ability, which can lead to a sharp deterioration at higher downsampling rates.
There are other fusion methods that estimate HS images with appropriate priors. Wei et al. [31] proposed an efficient Bayesian fusion framework that solves an underlying Sylvester equation associated with a Gaussian prior, which has the advantage of decreasing the computational complexity of HS image fusion. Qu et al. [32] designed an unsupervised deep convolutional neural network (CNN) that obtains representations in a sparse Dirichlet distribution. Dian et al. [33] used deep priors learned by residual-learning-based CNNs and reconstructed HR-HS images by solving optimization problems. Xie et al. [35] reconstructed HR-HS images by obtaining linearly transformed HR multispectral images and residual images within a deep learning framework. Wang et al. [36] proposed a blind fusion model that can improve the reconstruction quality without knowing the priors of spectral and spatial degradation. Zhu et al. [34] proposed a progressive CNN that learns high-frequency spatial details from an HR zero-centric residual image. These deep-learning-based approaches are data adaptive and can boost reconstruction performance, but their computational burdens are high and additional hardware support is needed for implementation.
B. Image Matting Model
The image matting model [37] was originally proposed to extract the foreground and background from an input image, which can be expressed as follows:
\begin{equation*}
I_m=\alpha _m F_m +(1-\alpha _m) B_m \tag{1}
\end{equation*}
where $I_m$ denotes the observed intensity at pixel $m$, $F_m$ and $B_m$ denote the corresponding foreground and background intensities, and $\alpha _m \in [0,1]$ is the alpha channel value that blends them.
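As a toy numerical illustration of (1) (our own example in NumPy, not from [37]): when the foreground and background are smooth, all edge information in the composite comes from the alpha channel.
\begin{verbatim}
import numpy as np

# Toy illustration of the matting model (1): smooth F and B, with the
# edge carried entirely by the alpha channel (values are arbitrary).
h, w = 64, 64
F = np.full((h, w), 0.8)           # smooth foreground intensities
B = np.full((h, w), 0.2)           # smooth background intensities
alpha = np.zeros((h, w))
alpha[:, w // 2:] = 1.0            # a single vertical step edge in alpha

I = alpha * F + (1 - alpha) * B    # composite image, Eq. (1)
assert abs(I[0, 0] - 0.2) < 1e-12 and abs(I[0, -1] - 0.8) < 1e-12
\end{verbatim}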
In [38], Kang et al. proposed a multispectral pansharpening framework based on the local linear assumption of the matting model. The alpha channel is generated from the HR-PAN image using contrast compression. Then, the LR foreground and background in each band of the LR multispectral image are estimated with a downsampled alpha channel. The smoothed HR foreground and background are acquired by interpolating the LR foreground and background, respectively. Finally, the HR multispectral image is obtained by combining the HR foreground and background with the alpha channel. This method is simple and effective, but spectral distortion is introduced during the interpolation procedure. Dong et al. [39] proposed a matting-model-based fusion scheme, in which the alpha channel is generated from the HR-PAN image and the interpolated HS image. However, spectral distortion still occurs during interpolation, which significantly affects the accuracy of the obtained HR-HS image. Consequently, matting-model-based component substitution is an effective approach to HS image fusion, but it is necessary to reduce the spectral distortion from the interpolation during fusion procedures.
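The pipeline of [38], as summarized above, can be sketched as follows. This is a schematic reimplementation based on our reading: the local averaging used to estimate the LR foreground and background, the value of rho, and the bicubic upsampling via scipy.ndimage.zoom are illustrative stand-ins, not the authors' exact procedure.
\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def compress_contrast(img, rho=0.05):
    # Contrast compression: map intensities linearly into [rho, 1 - rho].
    return (1 - 2 * rho) * img / img.max() + rho

def matting_pansharpen(ms_lr, pan_hr, scale, rho=0.05, sigma=2.0):
    """ms_lr: (h, w, C) LR multispectral; pan_hr: (h*scale, w*scale) PAN."""
    alpha_hr = compress_contrast(pan_hr, rho)   # HR alpha from the PAN image
    alpha_lr = alpha_hr[::scale, ::scale]       # downsampled alpha channel
    Hh, Wh = pan_hr.shape
    out = np.empty((Hh, Wh, ms_lr.shape[2]))
    for c in range(ms_lr.shape[2]):
        z = ms_lr[:, :, c]
        # Crude local estimates of the LR foreground/background; the
        # framework solves a regularized least-squares problem instead.
        f_lr = gaussian_filter(z * alpha_lr, sigma) \
               / gaussian_filter(alpha_lr, sigma)
        b_lr = gaussian_filter(z * (1 - alpha_lr), sigma) \
               / gaussian_filter(1 - alpha_lr, sigma)
        # F and B are smooth, so interpolating them to HR is benign.
        f_hr = zoom(f_lr, scale, order=3)
        b_hr = zoom(b_lr, scale, order=3)
        out[:, :, c] = alpha_hr * f_hr + (1 - alpha_hr) * b_hr
    return out
\end{verbatim}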
Proposed Method
In this section, the proposed superresolution method is presented. It consists of three components: the observation model, the regularization term based on image matting, and the optimization problem. Each component is described in detail below.
A. Observation Model
An HR-HS image can be represented as a matrix ${\bf Z}\in \mathbb {R}^{L\times N}$, where $L$ is the number of spectral bands and $N$ is the number of pixels in the spatial domain. The observed LR-HS image ${\bf X}$ is then modeled as a spatially degraded version of ${\bf Z}$
\begin{equation*}
{\bf X}={\bf Z} {\bf H} \tag{2}
\end{equation*}
where ${\bf H}$ is the spatial degradation matrix representing blurring and downsampling. Similarly, the observed HR-PAN image ${\bf Y}$ is modeled as a spectrally degraded version of ${\bf Z}$
\begin{equation*}
{\bf Y}={\bf P} {\bf Z} \tag{3}
\end{equation*}
where ${\bf P}$ is the spectral response matrix of the PAN sensor. The sparsity prior assumes that each pixel ${\boldsymbol z}_i$ of the target HR-HS image ${\bf Z}$ can be sparsely represented over a spectral dictionary ${\bf D}=[{\boldsymbol d}_1,\ldots,{\boldsymbol d}_K]$, that is
\begin{equation*}
\boldsymbol z_i={\bf D} {\boldsymbol a}_i \qquad i=1,2,\ldots,N \tag{4}
\end{equation*}
where ${\boldsymbol a}_i$ is the sparse coefficient vector of the $i$th pixel. Collecting the coefficients into ${\bf A}=[{\boldsymbol a}_1,\ldots,{\boldsymbol a}_N]$ and substituting (4) into (2) yields
\begin{equation*}
{\bf X}={\bf Z} {\bf H}={\bf D} {\bf A} {\bf H}={\bf D} {\bf B} \tag{5}
\end{equation*}
where ${\bf B}={\bf A}{\bf H}$. Similarly, substituting (4) into (3) yields
\begin{equation*}
{\bf Y}={\bf P} {\bf Z}={\bf P} {\bf D} {\bf A}={\bf \Psi } {\bf A} \tag{6}
\end{equation*}
where ${\bf \Psi }={\bf P}{\bf D}$. The dictionary ${\bf D}$ and the coefficient matrix ${\bf B}$ can be learned from the observed LR-HS image ${\bf X}$ by solving the nonnegative sparse coding problem
\begin{align*}
({\bf D},{\bf B})=&\underset{{\bf D},{\bf B}}{\arg \min } \frac{1}{2} \Vert {\bf X - \bf D \bf B}\Vert _{F} ^{2} + \lambda \Vert {\bf B}\Vert _{1} \\
&{\text {s.t.}}\; \qquad {\boldsymbol b_i} \geq {0}, \boldsymbol d_k \geq {0} \tag{7}
\end{align*}
where ${\boldsymbol b}_i$ and ${\boldsymbol d}_k$ denote the columns of ${\bf B}$ and ${\bf D}$, respectively, and $\lambda$ balances data fidelity against the sparsity of the coefficients.
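For concreteness, a minimal sketch of the sparse coding half of (7) with the dictionary held fixed, solved by projected ISTA, is given below; the solver, step size, and problem sizes are our illustrative choices, and the paper does not prescribe this particular algorithm.
\begin{verbatim}
import numpy as np

def nonneg_sparse_code(X, D, lam=0.01, n_iter=200):
    """Sparse coding half of Eq. (7) with D fixed: solve
    min_B 0.5*||X - D B||_F^2 + lam*||B||_1  s.t.  B >= 0
    by projected ISTA."""
    B = np.zeros((D.shape[1], X.shape[1]))
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant
    for _ in range(n_iter):
        B = B - step * (D.T @ (D @ B - X))      # gradient step
        B = np.maximum(B - step * lam, 0.0)     # soft-threshold, project B >= 0
    return B

# Toy usage with synthetic nonnegative data (illustrative sizes only).
rng = np.random.default_rng(0)
D = np.abs(rng.standard_normal((31, 60)))       # L = 31 bands, K = 60 atoms
X = D @ np.abs(rng.standard_normal((60, 100)))  # N = 100 pixels
B = nonneg_sparse_code(X, D)
\end{verbatim}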
B. Regularization Based on Image Matting
According to the image matting model presented in (1), an HR-HS image can be separated into three parts: the HS foreground, the HS background, and the alpha channel. If the alpha channel contains most of the edge information in a local window, the foreground and background will be spatially smooth. However, the spatial distribution of the edges in the HR-HS image estimated from the sparse representation in (5) and (6) is not always consistent with that of the PAN reference image, which degrades the spatial quality of the reconstruction.
In this article, we design a regularization term based on image matting to overcome this problem. Two alpha channels are generated to reduce the inconsistency between the estimated HR-HS image and the PAN reference image. The first alpha channel is used to extract the smooth HS foreground and background from the estimated HR-HS image, whereas the second alpha channel is combined with the extracted foreground and background to introduce edge information from the PAN image into the HR-HS image.
The first alpha channel is computed based on the structure tensors of the estimated HR-HS image and the observed PAN image so that a spatially smooth HS foreground and background can be obtained. The underlying weighted image can be described as
\begin{equation*}
{\bf I}_{\text {w}}=\boldsymbol w_1\cdot {\bf I}_{\text {syn}}+\boldsymbol w_2\cdot {\bf Y} \tag{8}
\end{equation*}
where $\boldsymbol w_1$ and $\boldsymbol w_2$ are pixelwise weights, $\cdot$ denotes elementwise multiplication, and ${\bf I}_{\text {syn}}$ is the synthetic PAN image generated from the current estimate of the HR-HS image
\begin{equation*}
{\bf I}_{\text {syn}}={\bf P}{\bf Z}. \tag{9}
\end{equation*}
Clearly, ${\bf I}_{\text {syn}}$ carries the spatial structures of the current HR-HS estimate, whereas ${\bf Y}$ carries those of the observed PAN image. For each pixel $i$, the initial structure tensor of ${\bf I}_{\text {syn}}$ is constructed from its horizontal and vertical derivatives ${\bf I}_{Dx}$ and ${\bf I}_{Dy}$ as
\begin{equation*}
\hat{\bf T}_{\text {syn},i}= \begin{bmatrix}I_{Dx,i}^2 & I_{Dx,i} I_{Dy,i} \\
I_{Dx,i} I_{Dy,i} & I_{Dy,i}^2 \end{bmatrix} \tag{10}
\end{equation*}
\begin{align*}
{\bf G}_{xx}=&{\boldsymbol g}_\sigma \times ({\bf I}_{Dx}\cdot {\bf I}_{Dx}) \\
{\bf G}_{xy}=&{\boldsymbol g}_\sigma \times ({\bf I}_{Dx}\cdot {\bf I}_{Dy}) \\
{\bf G}_{yy}=&{\boldsymbol g}_\sigma \times ({\bf I}_{Dy}\cdot {\bf I}_{Dy}) \tag{11}
\end{align*}
where ${\boldsymbol g}_\sigma$ is a Gaussian kernel with standard deviation $\sigma$, $\times$ denotes convolution, and $\cdot$ denotes elementwise multiplication. The smoothed structure tensor at pixel $i$ is then
\begin{equation*}
{\bf T}_{\text {syn},i}= \begin{bmatrix}G_{xx,i} & G_{xy,i} \\
G_{xy,i} & G_{yy,i} \end{bmatrix}. \tag{12}
\end{equation*}
The matrix ${\bf T}_{\text {syn},i}$ is symmetric and positive semidefinite; therefore, it can be factorized with the eigenvalue decomposition
\begin{equation*}
{\bf T}_{\text {syn},i}= \begin{bmatrix}\boldsymbol v_{1,i} & \boldsymbol v_{2,i} \end{bmatrix} \begin{bmatrix}\lambda _{\text {syn}1,i} & 0 \\
0 & \lambda _{\text {syn}2,i} \end{bmatrix} \begin{bmatrix}\boldsymbol v_{1,i}^{\mathrm{T}}\\
\boldsymbol v_{2,i}^{\mathrm{T}} \end{bmatrix} \tag{13}
\end{equation*}
where $\boldsymbol v_{1,i}$ and $\boldsymbol v_{2,i}$ are the eigenvectors, and the corresponding eigenvalues are
\begin{align*}
\lambda _{\text {syn}1,i}=\frac{1}{2}\left[G_{xx,i}+G_{yy,i}+\sqrt{(G_{xx,i}-G_{yy,i})^2+4G_{xy,i}^2}\right] \\
\lambda _{\text {syn}2,i}=\frac{1}{2}\left[G_{xx,i}+G_{yy,i}-\sqrt{(G_{xx,i}-G_{yy,i})^2+4G_{xy,i}^2}\right]. \tag{14}
\end{align*}
Similarly, for the observed PAN image ${\bf Y}$, the smoothed structure tensor ${\bf T}_{{\text {Y}},i}$ and its larger eigenvalue $\lambda _{\text {Y}1,i}$ are computed at each pixel. Since the larger eigenvalue reflects the local gradient strength, the weights in (8) are set to
\begin{align*}
w_{1,i}=\frac{\lambda _{\text {syn}1,i}}{\lambda _{\text {syn}1,i}+\lambda _{\text {Y}1,i}} \\
w_{2,i}=\frac{\lambda _{\text {Y}1,i}}{\lambda _{\text {syn}1,i}+\lambda _{\text {Y}1,i}}. \tag{15}
\end{align*}
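The computation of (10)–(15) can be sketched in a few lines; the derivative operator (np.gradient), the Gaussian width sigma, and the eps safeguard against flat regions are our implementation choices.
\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter

def tensor_eigval(img, sigma=1.0):
    """Larger eigenvalue of the smoothed structure tensor, Eqs. (10)-(14)."""
    Iy, Ix = np.gradient(img)                    # I_Dy, I_Dx
    Gxx = gaussian_filter(Ix * Ix, sigma)        # Eq. (11)
    Gxy = gaussian_filter(Ix * Iy, sigma)
    Gyy = gaussian_filter(Iy * Iy, sigma)
    disc = np.sqrt((Gxx - Gyy) ** 2 + 4.0 * Gxy ** 2)
    return 0.5 * (Gxx + Gyy + disc)              # lambda_1 of Eq. (14)

def fusion_weights(I_syn, Y, sigma=1.0, eps=1e-12):
    """Pixelwise weights of Eq. (15); eps guards flat regions (our choice)."""
    l_syn = tensor_eigval(I_syn, sigma)
    l_y = tensor_eigval(Y, sigma)
    w1 = l_syn / (l_syn + l_y + eps)
    return w1, 1.0 - w1

# The weighted image of Eq. (8) is then I_w = w1 * I_syn + w2 * Y.
\end{verbatim}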
In this manner, the first alpha channel $\boldsymbol \alpha _1$ adaptively combines the structural information of the current estimate and the PAN image. It is generated from the weighted image ${\bf I}_{\text {w}}$ by contrast compression
\begin{equation*}
\alpha _{1,i}=\frac{(1-2\rho)I_{{\text {w}},i}}{I_{\text {w,max}}}+\rho \tag{16}
\end{equation*}
where $\rho \in (0,0.5)$ controls the degree of contrast compression and $I_{\text {w,max}}$ is the maximum value of ${\bf I}_{\text {w}}$. Similarly, the second alpha channel $\boldsymbol \alpha _2$ is generated directly from the observed PAN image ${\bf Y}$ after contrast compression
\begin{equation*}
\alpha _{2,i}=\frac{(1-2\rho)Y_i}{Y_{\text {max}}}+\rho \tag{17}
\end{equation*}
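Both alpha channels reduce to the same contrast-compression mapping applied to different inputs, which can be written compactly as follows (rho = 0.05 is an illustrative value, not the paper's setting).
\begin{verbatim}
import numpy as np

def contrast_compress(img, rho=0.05):
    """Alpha channel by contrast compression, Eqs. (16)-(17): map the
    intensities linearly into [rho, 1 - rho]."""
    return (1.0 - 2.0 * rho) * img / img.max() + rho

# alpha1 = contrast_compress(I_w, rho); alpha2 = contrast_compress(Y, rho)
\end{verbatim}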
For the $c$th band of the HR-HS image ${\bf Z}$, the matting model in (1) can be rewritten as
\begin{equation*}
Z^{(i,c)}=\alpha _{1,i}F^{(i,c)}+(1-\alpha _{1,i})B^{(i,c)} \tag{18}
\end{equation*}
where $F^{(i,c)}$ and $B^{(i,c)}$ denote the HS foreground and background intensities of the $i$th pixel in the $c$th band. The spatially smooth foreground and background are extracted with the first alpha channel by solving
\begin{align*}
(F^{(i,c)},B^{(i,c)})=&\underset{F^{(i,c)},B^{(i,c)}}{\arg \min } \sum _{i=1}^N \sum _{c=1}^L [\alpha _{1,i}F^{(i,c)} \\
&{+}\;(1-\alpha _{1,i})B^{(i,c)}-Z^{(i,c)}]^2 \\
&{+}\;|\alpha _{1,ix}|[(F_x^{(i,c)})^2+(B_x^{(i,c)})^2] \\
&{+}\;|\alpha _{1,iy}|[(F_y^{(i,c)})^2+(B_y^{(i,c)})^2] \tag{19}
\end{align*}
where the subscripts $x$ and $y$ denote the horizontal and vertical derivatives, respectively. The reference image ${\bf U}$, which carries the edge information of the PAN image, is then synthesized with the second alpha channel as
\begin{equation*}
U^{(i,c)}=\alpha _{2,i}F^{(i,c)}+(1-\alpha _{2,i})B^{(i,c)} \tag{20}
\end{equation*}
Finally, the image-matting-based regularization term is defined as
\begin{equation*}
\phi ({\bf A}) = \Vert {\bf D A} - {\bf U}\Vert _{F}^{2}. \tag{21}
\end{equation*}
The algorithm for generating the regularization term using structure-tensor-based image matting is summarized in Algorithm 1.
Algorithm 1: Regularization Term Generation.
Input: current HR-HS estimate ${\bf Z}$, PAN image ${\bf Y}$, spectral response matrix ${\bf P}$, and compression parameter $\rho$.
Compute ${\bf I}_{\text {syn}}$ by (9).
for each pixel $i$ do
Compute ${\bf T}_{\text {syn},i}$ and ${\bf T}_{{\text {Y}},i}$ by (10)–(12).
Compute $w_{1,i}$ and $w_{2,i}$ by (14) and (15), and $I_{{\text {w}},i}$ by (8).
Compute $\alpha _{1,i}$ by (16) and $\alpha _{2,i}$ by (17).
end for
for each band $c$ do
Compute $F^{(i,c)}$ and $B^{(i,c)}$ by (19).
Compute $U^{(i,c)}$ by (20).
end for
Compute $\phi ({\bf A})$ by (21).
Output: reference image ${\bf U}$ and regularization term $\phi ({\bf A})$.
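The extraction step (19) used in Algorithm 1 is, for each band, a linear least-squares problem in $F$ and $B$. A sparse-matrix sketch for one band follows; the forward-difference operators and the small Tikhonov term eps, which keeps the normal equations nonsingular in flat regions, are our implementation choices.
\begin{verbatim}
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def extract_fb(z, alpha, eps=1e-6):
    """Smooth foreground/background of one band by solving Eq. (19).
    z, alpha: (h, w) arrays; eps is a numerical stabilizer (our choice)."""
    h, w = z.shape
    n = h * w
    d1 = lambda m: sp.diags([-1, 1], [0, 1], shape=(m - 1, m))
    Dx = sp.kron(sp.eye(h), d1(w))                 # horizontal differences
    Dy = sp.kron(d1(h), sp.eye(w))                 # vertical differences
    a = alpha.ravel()
    Wx = sp.diags(np.abs(Dx @ a))                  # |alpha_x| weights
    Wy = sp.diags(np.abs(Dy @ a))                  # |alpha_y| weights
    L = Dx.T @ Wx @ Dx + Dy.T @ Wy @ Dy            # smoothness on F and B
    M = sp.hstack([sp.diags(a), sp.diags(1 - a)])  # data term of Eq. (18)
    Q = (M.T @ M + sp.block_diag([L, L]) + eps * sp.eye(2 * n)).tocsc()
    u = spsolve(Q, M.T @ z.ravel())
    return u[:n].reshape(h, w), u[n:].reshape(h, w)  # F, B
\end{verbatim}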
C. Optimization Problem
According to (2) and (3), the problem of reconstructing the HR-HS image ${\bf Z}$ from the observed LR-HS image ${\bf X}$ and the PAN image ${\bf Y}$ can be formulated as
\begin{equation*}
{\bf Z}=\underset{{\bf Z}}{\arg \min } \Vert {\bf Y} - {\bf P} {\bf Z}\Vert _{F} ^{2} + \eta \Vert {\bf X}- {\bf Z}{\bf H}\Vert _{F} ^{2}. \tag{22}
\end{equation*}
Since (22) is an ill-posed problem, other constraints should be introduced to arrive at a stable solution. Equation (4) demonstrates that each pixel of ${\bf Z}$ can be sparsely represented over the dictionary ${\bf D}$; therefore, the sparsity constraint on the coefficient matrix ${\bf A}$ is added, and the problem becomes
\begin{align*}
{\bf Z}=&\underset{{\bf Z}}{\arg \min } \Vert {\bf Y} - {\bf P D A}\Vert _{F} ^{2} + \eta _1 \Vert {\bf X}- {\bf D A H}\Vert _{F} ^{2} \\
&{+}\;\eta _2 \Vert {\bf A}\Vert _1 \qquad \text {s.t.} \qquad {\boldsymbol a_i} \geq {0}. \tag{23}
\end{align*}
Furthermore, image-matting-based regularization is also utilized to reconstruct the edge information of the HR-HS image in the spatial domain, described as
\begin{align*}
{\bf Z}=&\underset{{\bf Z}}{\arg \min } \Vert {\bf Y} - {\bf P D A}\Vert _{F} ^{2} + \eta _1 \Vert {\bf X}- {\bf D A H}\Vert _{F} ^{2} \\
&{+}\;\eta _2 \Vert {\bf A}\Vert _1 + \eta _3 \phi ({\bf A}) \\
=&\underset{{\bf Z}}{\arg \min } \Vert {\bf Y} - {\bf P D A}\Vert _{F} ^{2} + \eta _1 \Vert {\bf X}- {\bf D A H}\Vert _{F} ^{2} \\
&{+}\;\eta _2 \Vert {\bf A}\Vert _1 + \eta _3 \Vert {\bf D A} - {\bf U}\Vert _{F}^{2} \qquad \text {s.t.} \qquad {\boldsymbol a_i} \geq {0}. \tag{24}
\end{align*}
Equation (24) is convex and can be solved using the alternating direction method of multipliers (ADMM). By introducing the auxiliary variable ${\bf S}$, subject to ${\bf S}={\bf A}$ and ${\bf D}{\bf S}={\bf Z}$, together with the Lagrangian multipliers ${\bf V}_1$ and ${\bf V}_2$, the augmented Lagrangian function of (24) can be written as
\begin{align*}
&L_{\mu }({\bf A},{\bf Z},{\bf S},{\bf V}_1,{\bf V}_2) \\
&=\Vert {\bf Y} - {\bf \Psi S}\Vert _{F} ^{2} + \eta _1 \Vert {\bf X}- {\bf Z H}\Vert _{F} ^{2} \\
&{+}\;\eta _2 \Vert {\bf A}\Vert _1 + \eta _3 \Vert {\bf D S -U}\Vert _{F}^{2} \\
&{+}\;\mu \left\Vert {\bf S-A}+\frac{{\bf V}_1}{2\mu }\right\Vert _{F}^{2} + \mu \left\Vert {\bf D S-Z}+\frac{{\bf V}_2}{2\mu }\right\Vert _{F}^{2} \\
&{\text {s.t.}}\; \qquad {\boldsymbol a_i} \geq {0} \tag{25}
\end{align*}
where $\mu$ is the penalty parameter. Minimizing (25) with respect to ${\bf A}$, ${\bf Z}$, and ${\bf S}$ in turn yields the updates
\begin{align*}
{\bf A}^{(t+1)}=&\left[{\text {Soft}}\left({\bf S}^{(t)}+\frac{{\bf V}_1^{(t)}}{2\mu },\frac{\eta _2}{2\mu }\right)\right]_{+} \\
{\bf Z}^{(t+1)}\!=\!&\left[{\bf X H}^{\mathrm{T}} \!+\!\frac{\mu }{\eta _1}\left({\bf D S}^{(t)} \!+\! \frac{{\bf V}_2^{(t)}}{2\mu }\right)\right] \left({\bf H H}^{\mathrm{T}} \!+\! \frac{\mu }{\eta _1} {\bf I}\right)^{-1} \\
{\bf S}^{(t+1)}=&\left[{\bf \Psi }^{\mathrm{T}}{\bf \Psi }+\mu {\bf I}+(\eta _3+\mu){\bf D}^{\mathrm{T}}{\bf D}\right]^{-1}\Bigg[{\bf \Psi }^{\mathrm{T}}{\bf Y} \\
&{+}\;\eta _3{\bf D}^{\mathrm{T}}{\bf U} + \mu \left({\bf A}^{(t)}-\frac{{\bf V}_1^{(t)}}{2\mu }\right) \\
&{+}\;\mu {\bf D}^{\mathrm{T}}\left({\bf Z}^{(t)}-\frac{{\bf V}_2^{(t)}}{2\mu }\right)\Bigg]. \tag{26}
\end{align*}
where ${\text {Soft}}(\cdot,\tau)$ denotes the elementwise soft-thresholding operator with threshold $\tau$ and $[\cdot]_{+}$ denotes projection onto the nonnegative orthant. The Lagrangian multipliers are updated by
\begin{align*}
{\bf V}_1^{(t+1)}=&{\bf V}_1^{(t)} + \mu ({\bf S}^{(t+1)} - {\bf A}^{(t+1)}) \\
{\bf V}_2^{(t+1)}=&{\bf V}_2^{(t)} + \mu ({\bf D S}^{(t+1)} - {\bf Z}^{(t+1)}). \tag{27}
\end{align*}
In practice, the HR-HS image ${\bf Z}$ is reconstructed by alternately performing the updates in (26) and (27) until convergence or until a maximum number of iterations $T$ is reached. The complete procedure is summarized in Algorithm 2.
Algorithm 2: HR-HS Image Reconstruction.
Input: LR-HS image ${\bf X}$, PAN image ${\bf Y}$, degradation matrices ${\bf P}$ and ${\bf H}$, and parameters $\eta _1$, $\eta _2$, $\eta _3$, $\mu$, and $\rho$.
Initialize ${\bf D}$ by solving (7), and initialize ${\bf A}$, ${\bf S}$, ${\bf Z}$, ${\bf V}_1$, and ${\bf V}_2$.
for $t=0,1,\ldots,T-1$ do
Compute ${\bf U}$ using Algorithm 1.
Update ${\bf A}^{(t+1)}$ by (26).
Update ${\bf Z}^{(t+1)}$ and ${\bf S}^{(t+1)}$ by (26).
Update ${\bf V}_1^{(t+1)}$ and ${\bf V}_2^{(t+1)}$ by (27).
end for
Output: reconstructed HR-HS image ${\bf Z}$.
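A compact NumPy sketch of the ADMM loop in (26) and (27) follows. The dense inverses are precomputed for clarity, which is only feasible at toy sizes; the zero initializations and array shapes are illustrative assumptions.
\begin{verbatim}
import numpy as np

def soft(x, tau):
    # Elementwise soft-thresholding used in the A-update of (26).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_fusion(X, Y, D, P, Hm, U, eta1, eta2, eta3, mu, n_iter=50):
    """ADMM loop of Eqs. (25)-(27). Shapes (toy sizes): X (L, n) LR-HS,
    Y (p, N) PAN, D (L, K) dictionary, P (p, L) spectral response,
    Hm (N, n) spatial degradation, U (L, N) matting reference image."""
    L, K = D.shape
    N = Hm.shape[0]
    Psi = P @ D
    A = np.zeros((K, N)); S = np.zeros((K, N)); Z = np.zeros((L, N))
    V1 = np.zeros((K, N)); V2 = np.zeros((L, N))
    # Dense system inverses for the Z- and S-updates of (26).
    Mz = np.linalg.inv(Hm @ Hm.T + (mu / eta1) * np.eye(N))
    Ms = np.linalg.inv(Psi.T @ Psi + mu * np.eye(K)
                       + (eta3 + mu) * D.T @ D)
    for _ in range(n_iter):
        A = np.maximum(soft(S + V1 / (2 * mu), eta2 / (2 * mu)), 0.0)
        Z = (X @ Hm.T + (mu / eta1) * (D @ S + V2 / (2 * mu))) @ Mz
        # (26) as printed uses A^(t) and Z^(t) here; we use the freshest
        # iterates, a common Gauss-Seidel-style variant.
        S = Ms @ (Psi.T @ Y + eta3 * (D.T @ U)
                  + mu * (A - V1 / (2 * mu))
                  + mu * (D.T @ (Z - V2 / (2 * mu))))
        V1 = V1 + mu * (S - A)                      # Eq. (27)
        V2 = V2 + mu * (D @ S - Z)
    return Z
\end{verbatim}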
Experimental Results
In this section, the proposed method's reconstruction performance is illustrated using simulated datasets. To verify the superiority of the proposed method, five recent state-of-the-art methods for HS image superresolution are used for comparison, including the subspace-based method (HySure) [23], NSSR [25], fast fusion based on solving the Sylvester equation (R-FUSE) [44], LTMR [26], and the deep-learning-based HS image sharpening method (DHSIS) [33].
To objectively compare the performance of these superresolution methods, the following six quantitative metrics are utilized to measure the quality of the fusion results.
Correlation Coefficient (CC): The CC [45] indicates the correlation degree between two images, which is defined as follows:
\begin{align*}
{\text {CC}}=\frac{1}{L} \sum _{c=1}^{L} \frac{\sum _{i=1}^{N} [Z_{\text {ref}}^{(i,c)} - \overline{Z}_{\text {ref}}^{c}] [Z^{(i,c)} - \overline{Z}^{c}]}{\sqrt{\sum _{i=1}^{N} [Z_{\text {ref}}^{(i,c)} - \overline{Z}_{\text {ref}}^{c}]^2 \sum _{i=1}^{N} [Z^{(i,c)} - \overline{Z}^{c}]^2 }} \tag{28}
\end{align*}
where $L$ is the number of bands and $N$ is the number of pixels in the HR-HS image's spatial domain. $Z^{(i,c)}$ and $Z_{\text {ref}}^{(i,c)}$ refer to the $i$th pixel value in the $c$th band of the fused HR-HS image and the reference ground-truth HR-HS image, respectively. $\overline{Z}^{c}$ and $\overline{Z}_{\text {ref}}^{c}$ refer to the mean value of the $c$th band in the fused HS image and the reference HS image, respectively.
Spectral Angle Mapper (SAM): The SAM [46] denotes the absolute value of the spectral angle between two spectral vectors, which is defined as follows:
\begin{equation*}
{\text {SAM}}=\frac{1}{N} \sum _{i=1}^{N} \arccos \left(\frac{\langle {\boldsymbol z}_{\text {ref}}^{i},{\boldsymbol z}^{i} \rangle }{\Vert {\boldsymbol z}_{\text {ref}}^{i}\Vert _{2} \cdot \Vert {\boldsymbol z}^{i}\Vert _{2}}\right) \tag{29}
\end{equation*}
where ${\boldsymbol z}^{i}$ and ${\boldsymbol z}_{\text {ref}}^{i}$ denote the spectral vectors of the $i$th pixel in the fused HS image and the reference HS image, respectively. This index reflects the spectral distortion between the two images in terms of absolute angles.
Root-Mean-Squared Error (RMSE): The RMSE index measures the standard difference between two images as follows:
\begin{equation*}
{\text {RMSE}}=\frac{1}{NL} \sum _{c=1}^{L} \sqrt{\sum _{i=1}^{N} (Z_{\text {ref}}^{(i,c)} - Z^{(i,c)})^{2}}. \tag{30}
\end{equation*}
Clearly, a fused image that is closer to the reference HS image leads to a smaller RMSE value.
Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS): ERGAS [47] measures the global quality of the fused image, which is defined as follows:
\begin{equation*}
{\text {ERGAS}}=100 \sqrt{\frac{n}{N}} \sqrt{\frac{1}{L} \sum _{c=1}^{L}\left(\frac{{\text {RMSE}}^{c}}{\overline{Z}_{\text {ref}}^{c}}\right)^{2}} \tag{31}
\end{equation*}
where $n$ is the number of pixels in the LR-HS image's spatial domain, and ${\text {RMSE}}^{c}$ refers to the RMSE value of the $c$th band between the fused HR-HS image and the reference HR-HS image.
Peak Signal-to-Noise Ratio (PSNR): The PSNR for an HR-HS image is defined as follows:
\begin{equation*}
{\text {PSNR}}=-\frac{10}{L} \sum _{c=1}^{L} \log ({\text {RMSE}}^{c}). \tag{32}
\end{equation*}
Clearly, the PSNR definition for HR-HS images is an average of the PSNR values for the 2-D images for all bands.
Universal Image Quality Index (UIQI): The UIQI [48] is calculated on sliding windows of size $32 \times 32$ and averaged across all windows. The UIQI for two windows $\boldsymbol a$ and $\boldsymbol b$ is given by
\begin{equation*}
Q({\boldsymbol a},{\boldsymbol b})=\frac{4\sigma _{\boldsymbol{ab}}}{\sigma _{\boldsymbol a}^{2}+\sigma _{\boldsymbol b}^{2}}\, \frac{\mu _{\boldsymbol a} \mu _{\boldsymbol b}}{\mu _{\boldsymbol a}^{2}+\mu _{\boldsymbol b}^{2}} \tag{33}
\end{equation*}
where $\sigma _{\boldsymbol{ab}}$ is the sample covariance between $\boldsymbol a$ and $\boldsymbol b$, and $\sigma _{\boldsymbol a}$ and $\mu _{\boldsymbol a}$ denote the standard deviation and the mean value of $\boldsymbol a$, respectively.
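The five scalar metrics (28)–(32) can be computed as follows. This is a sketch under common conventions, which the paper does not spell out: SAM is reported in degrees, pixel values are normalized to $[0,1]$ so that PSNR uses MAX = 1, and the per-band RMSE values are averaged.
\begin{verbatim}
import numpy as np

def fusion_metrics(Z, Z_ref, d):
    """CC, SAM, RMSE, ERGAS, and PSNR of Eqs. (28)-(32).
    Z, Z_ref: (N, L) arrays (pixels x bands) scaled to [0, 1];
    d: downsampling rate, so that n/N = 1/d**2 in Eq. (31)."""
    zc = Z - Z.mean(axis=0)
    rc = Z_ref - Z_ref.mean(axis=0)
    cc = np.mean((rc * zc).sum(0)
                 / np.sqrt((rc ** 2).sum(0) * (zc ** 2).sum(0)))
    # SAM in degrees (a common convention; Eq. (29) leaves units open).
    cosang = (Z * Z_ref).sum(1) / (np.linalg.norm(Z, axis=1)
                                   * np.linalg.norm(Z_ref, axis=1))
    sam = np.degrees(np.mean(np.arccos(np.clip(cosang, -1.0, 1.0))))
    rmse_c = np.sqrt(((Z_ref - Z) ** 2).mean(0))   # per-band RMSE
    rmse = rmse_c.mean()                           # averaged over bands
    ergas = (100.0 / d) * np.sqrt(np.mean((rmse_c / Z_ref.mean(0)) ** 2))
    # Per-band PSNR with MAX = 1, averaged over bands (cf. Eq. (32)).
    psnr = np.mean(-20.0 * np.log10(rmse_c))
    return cc, sam, rmse, ergas, psnr
\end{verbatim}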
A. Datasets
To test the effectiveness of the proposed method, three sets of data, i.e., CAVE, Pavia University, and Eagle, are used for the experiments. The spectral response of a Canon 60D camera [49] is used to generate the PAN images from the original HS datasets, whereas the LR-HS images are simulated by applying a Gaussian blur and then downsampling in two spatial dimensions on the original HR-HS images.
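This simulation protocol can be sketched as follows; the Gaussian width and decimation by slicing are illustrative choices, and srf stands for the camera spectral response curve.
\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_observations(Z_hr, srf, d, sigma=2.0):
    """Simulate the LR-HS and PAN inputs from a ground-truth HR-HS cube.
    Z_hr: (h, w, L) HR-HS image; srf: (L,) spectral response curve
    (e.g., derived from the Canon 60D response [49]); d: downsampling rate."""
    # LR-HS image: band-wise Gaussian blur, then downsampling (cf. (2)).
    blurred = gaussian_filter(Z_hr, sigma=(sigma, sigma, 0))
    X_lr = blurred[::d, ::d, :]
    # PAN image: spectral-response-weighted sum of the bands (cf. (3)).
    Y_pan = Z_hr @ (srf / srf.sum())
    return X_lr, Y_pan
\end{verbatim}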
The CAVE dataset [50] has 32 indoor HS images; each image has $512 \times 512$ pixels and 31 spectral bands covering 400–700 nm at 10-nm intervals.
Illustration of the original HR-HS test images from CAVE, Pavia University, and Eagle. The first row: images from the CAVE dataset. The second row: images from the Pavia University and Eagle dataset. (a) Beads. (b) Toy. (c) Cloth. (d) Food. (e) Peppers. (f) Hairs. (g) Jelly beans. (h) Oilpainting. (i) Pavia University. (j) Eagle.
B. Parameter Discussion
In our method, the key parameter that most impacts the accuracy is
Curves of CC, SAM, and UIQI values at different downsampling rates for Pavia University. (a)
Curves of CC, SAM, and UIQI values with different
There are additional parameters that need to be specified. We have found experimentally that these parameters do not influence the results as much as
The input PAN images are generated from the original HS images with camera spectral sensitivity (CamSpec) [49]. The size of the Gaussian blur kernel used to synthesize the LR-HS images is set to
C. Experimental Results
The CC, SAM, RMSE, ERGAS, and UIQI results for different downsampling rates on the CAVE dataset are reported in Table II. The proposed method outperforms other competing methods and achieves the highest spatial and spectral precision. Fig. 4 shows the reconstructed CAVE-Beads images and the error images produced by different methods. All the test methods can effectively reconstruct the spatial structures of the HS image, but the proposed method performs best in recovering the details of the original image. Specifically, for the downsampling rate
Reconstructed images of CAVE-Beads. The false color images are synthesized by band 5, 10, and 29. The first three rows show the reconstructed images with the downsampling rate
The results of the methods for the Pavia University dataset are presented in Table III and Fig. 5. From Table III, we can see that the proposed method performs significantly better than the other test methods on most of the quality metrics. HySure and R-FUSE achieve good performance with a small downsampling rate.
Reconstructed images of Pavia University. The false color images are synthesized by band 13, 25, and 61. The first three rows show the reconstructed images with the downsampling rate
The quality metrics for the Eagle dataset are shown in Table IV. R-FUSE achieves the highest SAM and UIQI values, whereas the proposed method performs best on the CC, RMSE, and ERGAS metrics. Because the observed area in the Eagle dataset contains less detailed information than that in Pavia University, the performance of the test methods decreases less as the downsampling rate increases. However, the reconstruction quality of LTMR is seriously affected when the downsampling rate becomes large.
Reconstructed images of Eagle. The false color images are synthesized by band 13, 25, and 61. The first three rows show the reconstructed images with the downsampling rate
Fig. 7 shows the CC, RMSE, and UIQI curves as functions of the spectral band over CAVE-Beads for the test methods, which indicates the similarity of spectral reflectance between the fusion results and the reference images. All the test methods can effectively reconstruct the HS image at smaller downsampling rates. The proposed method achieves better results on most of the bands and has a significant advantage in reconstruction performance for the bands corresponding to wavelengths above 610 nm, whose pixel values contain more spatial details.
Curves of (a) CC, (b) RMSE, and (c) UIQI as functions of different spectral bands over CAVE-Beads. From top to bottom: curves with downsampling rates
D. Computational Time
The execution times of the test methods are presented in Table V. All experiments are implemented in MATLAB R2020b on an Intel Xeon W-10885M@2.4-GHz CPU. The MATLAB parallel toolbox is utilized to accelerate the extraction procedure in (19) described in Section III-B. As seen from the table, R-FUSE is the fastest method on the test datasets, benefiting from the high efficiency of its hierarchical Bayesian framework based on the Gaussian prior. DHSIS also demonstrates high efficiency during the test procedure, whereas its training takes considerable computational time. HySure and LTMR are spectral-subspace based; therefore, their computational time primarily depends on the spatial resolution of the reconstructed HS images. The computational times of NSSR and the proposed method are related to both the spatial resolution and the number of spectral bands of the reconstructed HS images. The proposed method requires relatively more computational time; the primary computational burden occurs when solving (19) and (24), which correspond to the image matting procedure and the HR-HS image reconstruction, respectively.
Conclusion
In this article, we propose an effective HS image superresolution method that reconstructs HR-HS images from LR-HS images and PAN images depicting the same scene. Based on the image matting model, a regularization term is designed to preserve the spectral signatures. To introduce spatial details from the PAN image into the image fusion procedure, two alpha channels are constructed for image matting. The first alpha channel is iteratively computed based on the structure tensors from the current HR-HS term and the original PAN image, whereas the second alpha channel is generated from the PAN image using contrast compression. Experimental results on public HS datasets demonstrate that the proposed method achieves better spatial and spectral accuracy on most test images than existing HR-HS recovery methods in the literature.
In future work, the proposed superresolution method can be extended in three directions. First, the alpha channel generation algorithm can be further extended and optimized, especially for the fusion of HS and multispectral images. Second, camera spectral response estimation methods can be incorporated into the proposed method, which is necessary for solving blind fusion problems. Third, more efficient image matting techniques can be introduced, which may significantly improve the computational speed of our method.
ACKNOWLEDGMENT
The authors would like to thank TERN AusCover and the Remote Sensing Centre, Department of Science, Information Technology, Innovation and the Arts, QLD, for providing the Eagle hyperspectral data. The airborne hyperspectral data are available at http://www.auscover.org.au/xwiki/bin/view/Product+pages/Airborne+Hyperspectral.