Journals & Magazines >IEEE Sensors Journal >Volume: 23 Issue: 24

Joint Sparse Representations and Coupled Dictionary Learning in Multisource Heterogeneous Image Pseudo-Color Fusion

Abstract:

Considering that coupled dictionary learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based s...Show More

Metadata

Abstract:

Considering that coupled dictionary learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based synthetic aperture radar (SAR) and multispectral pseudo-color fusion method. First, the traditional Brovey transform is employed as a preprocessing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the preprocessed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask via calculating the reconstruction errors and therefore generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects and excellent quantitative indicators in terms of spectral distortion, correlation coefficient, mean square error (mse), natural image quality evaluator (NIQE), Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE), and perception-based image quality evaluator (PIQE).

Published in: IEEE Sensors Journal ( Volume: 23, Issue: 24, 15 December 2023)

Page(s): 30620 - 30632

Date of Publication: 23 October 2023

ISSN Information:

DOI: 10.1109/JSEN.2023.3325364

Funding Agency:

Contents

SECTION I.

Introduction

With the expeditious developments of high-resolution synthetic aperture radar (SAR) and multispectral remote sensing imaging equipment, Earth observation technologies have improved remarkably. SAR imaging can achieve all-weather observations and detect the physical properties of the surface target (e.g., orientation, shape, roughness, the dielectric constant of the target, and the frequency and incidence angle of the illuminating electromagnetic radiation) [1]. However, SAR images acquired by remote sensors often suffer from geometric distortion, speckle noise, radio frequency interference, and other image degradation problems. Multispectral remote sensing can discriminate features based on the difference in morphology, structure of images, and spectral properties. It significantly expands the information volume of remote sensing [2], [3] and can be used for thematic mapping applications (e.g., land use surveys and soil erosion production). Researchers have paved the way for the pseudo-color fusion of SAR and multispectral in the past many years, and the fusion of these modalities can provide a high spatial and high spectral resolution [3]; assign each pixel to a specific class of interest, when making the map based on various modalities of remote sensing resources, resulting in better interpretation and more accurate, robust results in the fused image [4], [5]; and perform heterogeneous image fusion on massive remote sensing data to reduce the consumption of computing resources for deploying downstream tasks at the edge and improve usability [2]. The fused images will have complementary information and are used in various remote sensing applications (e.g., urban mapping, vegetation identification, and lithology analysis) [6]. This article investigates improvements in the quality of SAR images assisted by multispectral imaging based on heterogeneous image fusion. The proposed method provides technical support for multisource high-resolution Earth observation [7].

Image fusion techniques include three different levels: pixel level, feature level, and decision level. This article mainly focuses on pixel-level image fusion, which directly uses the primary information from the SAR and multispectral images [8]. Many algorithms for pixel-level image fusion [9], [10], [11], [12], [1], [13], [14], [15], [16], [17], [18] have been explored for remote sensing applications. Some approaches [e.g., principal component analysis (PCA) and intensity–hue–saturation (IHS)] use SAR images to replace the optimal/derived band directly. Nevertheless, different bands shall have different wavelength coverage and operational principles, which will miss important information. Besides, some methods (e.g., Brovey) have an apparent tendency in the fusion target and therefore cannot simultaneously retain the information in SAR and multispectral images.

As machine learning techniques develop rapidly [19], [20], [21], [22], [23], [24], learning-based methodologies can be employed to explore the best intermediate representation of heterogeneous modalities [25], [26], [27]. Dictionary learning has achieved superior performance among various real-world applications [28], [29], [30]. Sparse coefficients and overcomplete dictionaries can be combined to reconstruct images through sparse representation. Yang and Li [31], [32] applied the theory to image fusion for the first time and tried to fuse multifocal images using a discrete cosine transform dictionary-based method, as well as a simultaneous orthogonal matching pursuit (OMP)-based method to fuse multimodal images, respectively. Both Yin and Li [33] and Yu et al. [34] attempted to represent source images as common and innovative components for image fusion. Kim et al. [35] have combined joint patch clustering with dictionary learning for multimodal image fusion. As single dictionary models have been studied intensively, coupled dictionaries are required to represent dual feature spaces, such as two images with different resolutions or from heterogeneous sources. Coupled dictionary learning (CDL) has been applied to reconstruction [21], recognition [20], [36], and signal fusion [23]. Veshki et al. [26] proposed a CDL method based on simultaneous sparse approximation and relaxed the assumption of equal sparse representation. Zhang et al. [37] further employed CDL to preserve the structure, function, and Fedge information in the source images, overcoming the single dictionary’s disadvantage. As a result of the rapid development of deep learning technologies and continuous advancements in high-performance computing, deep learning-based approaches have achieved excellent performance in remote sensing image fusion [38], [39], [40]. However, in order to achieve high performance, deep learning methods require training on large-scale datasets. Moreover, deep learning methods consume significant computational resources in terms of inference speed, storage space, and training costs, making them unsuitable for applications and deployment on edge devices such as satellites [41]. In comparison, CDL and traditional methods do not require a large amount of training data, and the resulting coupled dictionaries occupy much less storage space. In addition, sparse representation also enables high inference speed. Therefore, this article will solely focus on discussing nondeep learning methods to cater to the demands of edge computing and deployment.

As a result of the rapid development of deep learning technologies and continuous advancements in high-performance computing, deep learning-based approaches have achieved excellent performance in remote sensing image fusion [38], [39], [40]. However, in order to achieve high performance, deep learning methods require training on large-scale datasets. Besides, deep learning methods consume significant computational resources in terms of inference speed, storage space, and training costs, which restricts their usage to offline applications and makes it challenging to deploy and operate them on edge devices [41]. In comparison, CDL and traditional methods do not require a large amount of training data, and the resulting coupled dictionaries occupy much less storage space. In addition, sparse representation also enables high inference speed. Therefore, this article will solely focus on discussing nondeep learning methods to cater to the demands of edge computing and deployment.

In the field of remote sensing applications, some studies have extended dictionary learning for image fusion, such as support value transformation and sparse representation [42], and fusion-based cloud removal methods [43]. The CDL-based image fusion method can enforce learning to represent the relationship of two related feature spaces, aiming at two pairs of dictionaries to learn the same sparse representation. Therefore, CDL can spatially capture the dependency information of two images and can obtain a reasonable linear mathematical relationship between SAR and multispectral images, thus preserving different information from heterogeneous images (e.g., spatial information in SAR images and spectral information in multispectral images) and obtaining a better visual effect. Wang et al. [44] further introduced the details injection model. Ayas et al. [45] considered introducing the texture information in the high-resolution image into the low-resolution image to enhance the effect of image fusion. CDL has also been used for collaborative prediction of multimodal remote sensing images, such as methods with distance preserved probability distribution adaptation [46] and class-based guidance solutions [47].

This article proposes a novel image fusion methodology with CDL and hybrid techniques to achieve the pseudo-color fusion of SAR images with multispectral images. First, since the existing fusion solutions (selective mask and weighted fusion) will lead to information loss or attenuation when applied to multimodal data with obvious feature differences, we have applied the Brovey method to perform preprocessing and replace the original input SAR images. Thus, the “pseudo” SAR image shall contain information from two modalities, effectively avoiding information loss. Second, to strengthen the relationship between the two modalities in the coupled dictionary, we force the dictionaries to learn joint sparse representations. Meanwhile, we do not introduce restriction items when updating the dictionary, which can ensure the structural coherence of the coupled dictionaries and further promote the associativity of multimodal information. Finally, the reconstruction error-based selection method is employed to generate the reconstruction mask for final fusion. Our main contributions and findings are threefold.

We introduce a novel hybrid algorithm, which integrates CDL and sparse representation to perform the pseudo-color fusion of SAR and multispectral images for the first time. This solution can capture the mutual relationship and establish the best intermediate representation, which efficiently generates fused images with rich spectral information and geometric properties of SAR.
We use the Brovey transform as a preprocessing method and employ the Brovey image as the “pseudo” SAR image with certain spectral information, enabling the final fused image with more comprehensive information.
In the employed CDL algorithm, the coupled dictionaries are enforcedly learned together by the joint optimization scheme, and the structurally coherent dictionaries are also set by removing restriction items. Therefore, the multimodal correlation in the coupled dictionaries is further promoted, making it suitable for the multimodal fusion task on complex features.
Experimental results on SAR images from the Sentinel-1 satellite and multispectral images taken by the Landsat-8 satellite demonstrate that our method can obtain remote sensing images with comprehensive spatial and spectral information and achieve excellent fusion performance both qualitatively and quantitatively.

The rest of this article is organized as follows. Section II introduces the related work of SAR and multispectral image fusion. Section III provides the formulation and details of the proposed methodology. The experiment results are presented in Section IV before drawing conclusions in Section V.

SECTION II.

Related Works

Four categories may be used to categorize current pixel-level SAR and multispectral image fusion methodologies: component substitution, multiresolution analysis, hybrid techniques, and model-based algorithms [2].

Component substitution methods include Brovey transform [48], [9], [49], Gram–Schmidt (GS) [50], [13], [16], PCA [51], and IHS [52], [53], [54]. Such methods will separate spatial components from spectral information in multispectral images, project them into another space, and use SAR images to replace the spatial components. The replaced image is then converted back to the original image space to acquire the fused result, which has high fidelity and renders spatial details in fused images. However, the spectrum between the multispectral and SAR image channels is not matched, which causes local differences between the images. The component substitution method cannot account for this problem of local differences and can cause significant spectral distortion.

Multiresolution analysis methods need to decompose the original image into multiple scale levels based on wavelet transform [55], [11], [56], [57], [18], [58] or pyramid transform [59] and then fuse the heterogeneous images at each level. Finally, the images from each level are recombined into a fused image. Yuan et al. [60] further introduced edge-preserving filters and weighted backprojection to improve information fidelity during multiresolution fusion. A recent approach also considered decomposing the source image into structure and texture layers and performing fusion separately [61]. This kind of method will increase the consumption of computing resources and computational complexity, but it can achieve better results and performance in the localization in both the spatial and frequency domains.

Hybrid image fusion techniques combine the benefits of component substitution approaches and multiresolution analysis approaches [1], [62], [63], [64], [65], [66], [3]. Chen et al. [1] used atrous wavelet transform (AWT) decomposition to extract the details of SAR images and applied empirical mode decomposition (EMD) to discern high-frequency information in multispectral images and the approximate images of SAR. Finally, an additive operation was conducted in the AWT-EMD domain to achieve the final fused high-resolution image. Hong et al. [62] combined wavelet and IHS fusion to maximize the color and spatial information from source images. Luo et al. [63] integrated PCA and additive wavelet decomposition, attempting to solve the decreased spatial resolution in wavelet fusion shall sacrifice and the severe spectral distortion in PCA fusion. Kurban [65] analyzed the image fusion problem as an optimization problem, combining differential search theory with IHS transform. Zhang et al. [66] connected the Laplacian pyramid with sparse representation. They decomposed the source images into high- and low-frequency components and fused them by sparse representation. Combining different methods, hybrid techniques can generate fused images with different characteristics and emphasis toward different remote sensing applications [3].

Model-based algorithms present excellent capabilities for representing the complicated local features of remote sensing data. A variety of model-based methods have been applied to SAR and multispectral image fusion, such as Bayesian estimation [67], compressive sensing [68], CDL [69], sparse representation [70], and deep neural network (DNN) [71]. Camacho et al. [72] employed an augmented linear mixing model to deal with the spectral variability problem in image fusion. Sun and Wu [73] introduced the ant colony optimization algorithm to explore the global optimal path in image fusion. Sparse representation solutions have also been widely explored in multimodal image fusion (e.g., near-infrared–visible image pairs, visible–infrared image pairs, positron emission tomography (PET)-magnetic resonance imaging (MRI) image pairs, and multifocus image pairs) [74], [75]. Wang et al. [74] employed geometric information to train image patches as subdictionaries and combined input image pairs using the proposed constructed-subdictionary (CSD) strategy. Zhu et al. [75] trained dictionaries using clustering algorithms, fused images using sparse representation to preserve texture details, and employed energy-based spatial algorithms to retain structural information during the fusion process. However, a more effective approach is to directly create coupled dictionaries during the process of dictionary learning. This allows for maximum preservation of the respective valuable information from different modalities while effectively capturing the correlation between the two modalities. Model-based methods allocate pixels of the fused images based on different fitting strategies, seek the optimal combination of spectral and spatial information, and obtain competitive fusion results.

In this article, the proposed pseudo-color fusion technique combines the advantages of component substitution methods, hybrid techniques, and model-based methods. We initially use the Brovey method to preprocess multispectral and SAR images, to endue the “pseudo” SAR image with spectral information. Different from traditional combination approaches of hybrid techniques, CDL is employed to integrate Brovey and multispectral images. Model-based methods have the capability of exploring the best intermediate representation of input data. Hybrid techniques can summarize the advantages of both component substitution and multiresolution analysis methods, while our method can aggregate benefits from these three different types of methods. Our proposed novel method is expected to effectively combine and complement the information in SAR and multispectral images and to produce high-quality fused images.

SECTION III.

Methodology

This work first preprocesses the input SAR image $I_{\mathrm{ SAR}} \in R^{b \times c}$ and multispectral image $I_{\mathrm{ MS}} \in R^{b \times c}$ using the Brovey fusion method and get the Brovey image $I_{B} \in R^{b \times c}$ . Then, CDL is used to analyze the correlation properties of Brovey and multispectral images. Finally, the reconstruction errors of a pair of input patches are set as the discriminative rule for image patch reconstruction. Briefly, for the Brovey images and multispectral images, we extract $\alpha$ patches of size $p$ from these two images as matrices $X_{B} \in R^{p \times q}$ and $X_{\mathrm{ MS}} \in R^{p \times q}$ . Then, we select the appropriate patches from the image fused via Brovey transform to replace the corresponding patches in the multispectral matrix and stitch to get a final matrix $X_{\mathrm{ Fusion}} \in R^{p \times q}$ . A weighted average process is devised for the edges of the stitched patches. The workflow of the algorithm is shown in Fig. 1, and the pseudocode is presented in Algorithm 1.

Fig. 1.

Flowchart of the proposed pseudo-color fusion methodology. Preprocessing with Brovey transform endues the “pseudo” SAR image with certain spectral information. CDL is employed to capture the mutual relationship and establish the coupled dictionaries of the multispectral and Brovey images. Finally, the image is reconstructed via joint sparse representation.

Show All

SECTION Algorithm 1

Multispectral and SAR Image Pseudo-Color Fusion

Require: Input multispectral image $I_{MS}$ and SAR image $I_{SAR}$ , $D_{MS}=D_{B}=D_{0}$ .

Obtain $I_{B}$ via Brovey transform-based image fusion with Equ. (1);

Extract patch matrices $X_{MS}$ and $X_{B}$ ;

Nomalize patches $x_{MS}$ and $x_{B}$ ;

for R rounds of updating do

Compute $L$ in the sparse coding with Equ. (2);

for number of atoms $n=1,2,\ldots$ do

Compute $\int _{n}$ with Equ. (5);

if $\int _{n} = \emptyset$ then

Update $[d_{r}]_{n}$ with Equ. (6);

10:

else

11:

Update $[d_{r}]_{n}$ with Equ. (8);

12:

Update $\left [{\alpha ^{T}_{n}}\right]_{\int _{n}}$ with Equ. (9);

13:

end if

14:

end for

15:

end for

16:

Obtain the dictionaries $D_{MS}$ and $D_{B}$ ;

17:

Establish the coupled dictionary $D$ with Equ. (11);

18:

for ( $x_{MS}$ and $x_{B}$ ) in ( $X_{MS}$ and $X_{B}$ ) do

19:

Calculate $e_{MS}$ and $e_{B}$ to establish $K_{\alpha}$ with Equ. (12);

20:

end for

21:

Compute the reconstruction mask $K$ with Equ. (13);

22:

Fuse $I_{MS}$ and $I_{B}$ and obtain $I_{F}$ with Equ. (14).

A. Preprocessing via Brovey Transform

The Brovey fusion algorithm [48] is a type of ratio-transform fusion, which normalizes each band of a multispectral image and then performs a multiplicative band operation on a panchromatic image. As the Brovey transform operation is related to the color space, this algorithm can only be conducted on remote sensing images with three bands [9], [49]. This methodology was originally designed for panchromatic and multispectral fusion. In our pseudo-color fusion application, we employ SAR images as pseudo-panchromatic images. The Brovey transform can decompose the image elements of a multispectral image into color and luminance separately. The expression of the Brovey fusion algorithm is shown as follows: $\begin{align*} {I_{B}}_{i}=&{I_{\mathrm{ MS}}}_{i} \times {I_{\mathrm{ SAR}}} / \theta \\ \theta=&\sum _{\beta =1}^{n} {I_{\mathrm{ MS}}}_{i} \tag{1}\end{align*}$ View Source where $I_{\mathrm{ MS}}, I_{\mathrm{ SAR}}, \theta \in R^{b \times c}$ . $i$ represents the three bands of the multispectral and Brovey images, and therefore, $n=3$ . ${I_{B}} \in R^{b \times c}$ denotes the fused Brovey image. Therefore, based on the principle of chromaticity transform [76], the Brovey transform possesses the ability to preserve a high degree of spatial detail through arithmetical technique with the SAR image. The defect of the Brovey transform is that the results of Brovey fusion usually contain high spectral distortion and lack spectral information. However, in other words, the image preprocessed with the Brovey transform is highly similar to the SAR image while also containing certain spectral information. From the visual evaluation, the Brovey transform imparts “color” information to the SAR image. In this case, we utilize this “defect” and apply the Brovey image as the “pseudo” SAR image in the subsequent fusion step. Then, we can further explore the potential of the Brovey transform using CDL and joint sparse representation.

B. Coupled Dictionary Generation

After achieving the multispectral image patches $X_{\mathrm{ MS}}$ and another preprocessed image patches $X_{B}$ using the Brovey method, we generate a pair of coupled dictionaries $D_{\mathrm{ MS}}$ and $D_{B}$ , which represent the set of the input data $X_{\mathrm{ MS}}$ and $X_{B}$ , respectively. The coupled dictionary generation procedure can be formulated as follows: $\begin{align*}&\min _{D_{\mathrm{ MS}}, D_{B}, L}\,\left \|{X_{\mathrm{ MS}}-D_{\mathrm{ MS}} L}\right \|_{F}^{2}+\left \|{X_{B}-D_{B} L}\right \|_{F}^{2} \\&\,\text {s.t.} \left \|{\alpha _{m}}\right \|_{0} \leq H_{0},\quad \left \|{\left [{d_{\mathrm{ MS}}}\right]_{n}}\right \|_{2}=1, \left \|{\left [{d_{B}}\right]_{n}}\right \|_{2}=1\quad \forall n, m \tag{2}\end{align*}$ View Source where $[d_{\mathrm{ MS}}]_{n}$ and $[d_{B}]_{n}$ refer to the $n$ th columns of the atoms, $\| {\cdot } \|_{0}$ represents the number of nonzero coefficients, $\| {\cdot } \|_{2}$ is the Euclidean norm, $\| {\cdot } \|_{F}$ is the Frobenius norm, $H_{0}$ is a constraint coefficient, and $L$ denotes the common sparse representation matrix. Therefore, the two dictionaries from different modalities are enforced to be learned together under one optimization procedure, which promotes their learning on joint sparsity representation. Equation (2) can then be rewritten as $\begin{align*} \min _{D_{\mathrm{ MS}}, D_{B}}\,\left \|{X_{\mathrm{ MS}}-\sum _{n}[d_{\mathrm{ MS}}]\left [{\alpha _{n}^{T}}\right]}\right \|_{F}^{2}+\left \|{X_{B}-\sum _{n}[d_{B}]\left [{\alpha _{n}^{T}}\right]}\right \|_{F}^{2}. \tag{3}\end{align*}$ View Source The relationship between the multispectral and Brovey image can thereby be captured for further updating and reconstruction.

We use the coupled dictionary training method proposed in [77] based on an iterative minimization approach to solve (3) and thus obtain the pair of dictionaries $D_{\mathrm{ MS}}$ and $D_{B}$ . The method minimizes the dictionary pair and the sparse coding $L$ alternatively. Sparse encoding is performed using OMP [78]. The dictionary is optimized by minimizing the following item: $\begin{align*} \left [{d_{r}}\right]_{n}&=\underset {\left [{d_{r}}\right]_{n}}{\mathrm {argmin}}\,\left \|{\left [{E_{r}}\right]_{n}-\left [{d_{r}}\right]_{n}\left [{\alpha _{n}^{T}}\right]_{\int _{n}}}\right \|_{F}^{2},\quad r \in \{{\text {MS}}, B\} \tag{4}\\ \left [{E_{r}}\right]_{n} &\triangleq \left [{X_{r}-\sum _{t \neq n}\left [{d_{r}}\right]_{t} \alpha _{n}^{T}}\right]_{\alpha _{n}},\quad \int _{n}=\left \{{m \mid \left [{\alpha _{n}^{T}}\right]_{m} \neq 0}\right \} \tag{5}\end{align*}$ View Source in which $\alpha _{n}^{T}$ denotes the $n$ th row of $L$ and $\int _{n}$ is used to select the nonzero entries in $\alpha _{n}^{T}$ . The subscripts MS and $B$ denote the multispectral image and the image preprocessed with the Brovey method, respectively. Veshki et al. [23] imposed constraints on dictionary updates such that the pair of coupled dictionaries can be discriminated. However, in our remote sensing application, we employ the CDL setup from [77] using two separate least squares and $\mathcal {L}_{2}$ -norm to update the atom. Meanwhile, we remove the restriction items in [23] to further enhance the correlation between the two dictionaries. When $\int _{n} = \emptyset$ , the updated value of $[d_{r}]_{n}$ is obtained by computing the columnwise average of error matrix $[E_{r}]_{n}$ $\begin{equation*} \left [{E_{r}}\right]_{n} = X_{r}-D_{r} L,\quad r \in \{{\text {MS}}, B\}. \tag{6}\end{equation*}$ View Source When $\int _{n} \neq \emptyset$ , the $\mathcal {L}_{2}$ -norm is then employed for updating and optimization $\begin{equation*} \left [{d_{r}}\right]_{n}= \frac {\left [{E_{r}}\right]_{n}\left [{\alpha _{n}^{T}}\right]^{T}_{\int _{n}}}{\left \|{\left [{\alpha ^{T}_{n}}\right]_{\int _{n}}}\right \|^{2}_{2}}, \quad r \in \{{\text {MS}}, B\}. \tag{7}\end{equation*}$ View Source We employ the $\mathcal {L}_{2}$ -norm here, so we need to normalize the normalization term $\|[\alpha ^{T}_{n}]_{\int _{n}}\|^{2}_{2}$ to 1, so the (7) can be rewritten as $\begin{equation*} \left [{d_{r}}\right]_{n}=\left [{E_{r}}\right]_{n}\left [{\alpha _{n}^{T}}\right]^{T}_{\int _{n}}, \quad r \in \{{\text {MS}}, B\}. \tag{8}\end{equation*}$ View Source Subsequent to the updating of $[d_{r}]_{n}$ , based on the initial setup in (3), $[\alpha ^{T}_{n}]_{\int _{n}}$ can also be updated as $\begin{align*} \left [{\alpha ^{T}_{n}}\right]_{\int _{n}} &= {d_{n}^{T}}{E_{n}} \tag{9}\\ {d_{n}}& \triangleq {[d_{\mathrm{ MS}}]_{n}}/{[d_{B}]_{n}}, \quad {E_{n}} \triangleq {[E_{\mathrm{ MS}}]_{n}}/{[E_{B}]_{n}}. \tag{10}\end{align*}$ View Source Therefore, we construct the coupled dictionaries of double feature spaces from the multispectral and Brovey image pairs.

C. Pseudo-Color Fusion via Sparse Representation

Subsequent to constructing the coupled dictionaries of multispectral image and Brovey image, we then classify the image signal with sparse representation. Through evaluating the reconstruction errors, each element is able to be assigned with the minimum reconstruction errors to obtain the best sparse representation classification [23], [79]. First, a dictionary $D$ is built based on the concatenation of the obtained pair of coupled dictionaries $\begin{equation*} D \triangleq \left [{\begin{array}{ll} D_{\mathrm {MS}}^{T} &\quad D_{\mathrm {B}}^{T} \end{array}}\right]^{T}. \tag{11}\end{equation*}$ View Source

Then, we use $e_{\mathrm{ MS}}$ and $e_{B}$ to denote the reconstruction errors of the multispectral images and the preprocessed images using the Brovey method. The reconstruction error formulation is given as follows: $\begin{align*} \begin{cases} e_{\mathrm{ MS}}=\left \|{D \alpha -\left [{x_{B} x_{\mathrm{ MS}}}\right]}\right \|_{2}^{2} \\ e_{B}=\left \|{D \alpha -\left [{x_{\mathrm{ MS}} x_{B}}\right]}\right \|_{2}^{2} \end{cases} \tag{12}\end{align*}$ View Source in which $\alpha$ includes the related sparse codes. When $e_{\mathrm{ MS}} < e_{B}$ , $x_{\mathrm{ MS}}$ will be chosen; otherwise, $x_{B}$ will be chosen. $e_{\mathrm{ MS}}$ and $e_{B}$ will not be equal [23], so there is no need to consider the case when they are equal. By applying this selection rule to all patches, the patch-level mask $K_{\alpha}$ can then be obtained directly. The pixel-level mask $K$ is able to be calculated with $K_{\alpha}$ by applying the following equation: $\begin{equation*} K=P^{\ast}\left ({K_{\alpha} }\right)-1. \tag{13}\end{equation*}$ View Source

$K_{\alpha}$ contains values within an interval $\{1, 2\}$ , so the values in $K$ are within $[{0,1}]$ . $P^{\ast}(\cdot)$ is a function that can place each patch in the image where it should be located. This function performs weighted averaging operations for overlaps between patches. Therefore, we can achieve the final fused image using the above mask as follows: $\begin{equation*} I_{F, u, v}=K_{u, v} I_{{\text {MS}}, u, v}+\left ({1-K_{u, v}}\right) I_{B, u, v} \tag{14}\end{equation*}$ View Source where $I_{F}$ denotes the finally fused image and $(u,v)$ denotes the location of pixels in the image.

D. Complexity

First, the steps of Brovey fusion, mask generation, and final fusion take up little computational cost, as they are only a small amount of pixel- or patch-level elementwise addition and multiplication operations, far less than those from coupled dictionary generation. In this case, the coupled dictionary generation phase shall dominate the complexity calculation. Based on [80], the complexity of the sparse coding phase is $O(pq{S_{p}}A{H_{0}}N)$ [in (4) and (5)]. $S_{p}$ denotes the patch size, $A$ denotes the number of atoms, and $N$ denotes the number of iterations during the optimization. Besides, the complexity during the atom updating period is $2\,\, \times O(pq)$ [in (8) and (9)]. Therefore, the total complexity during the coupled dictionary generation is the sum of the sparse coding and atom updating phase.

SECTION IV.

Experiments

A. Implementation Details

All experiments are conducted on a computer with AMD Ryzen 9 5950X 3.40-GHz CPU and NVIDIA RTX 3090 GPU. Our SAR data are products from the Sentinel-1, which have two satellites to satisfy the requirements of revisit and coverage. The satellite usually scans in interferometric wide (IW) swath mode when observing the land. Data from this Sentinel-1 satellite are publicly available.1 Our multispectral data come from the Operational Land Imager (OLI) on the Landsat-8 satellite, which is also publicly accessible.2 The Landsat-8 satellite collects images by capturing electromagnetic waves reflected and radiated from the Earth’s surface and converting them into digital signals. The data we used were taken in 2022 in Liangshan, Shandong, China. The two modality images are registered directly based on standard map coordinates in ENVI 5.5.3. The experimental results are compared qualitatively and quantitatively against wavelet-based fusion [18], PCA-based fusion [76], hue–saturation–value (HSV) color space-based fusion [14], GS fusion [16], nearest neighbor diffusion (NND)-based Fusion [17], improved wavelet transform (Improved Wavelet)-based fusion [81], adaptive enhanced fusion (AEF) [82], and coupled feature learning via structured convolutional sparse coding (CCFL) [83]. While maintaining the integrity of the classical methods, we select the latest methods of the same type for comparison. These methods can be referred to the corresponding references for details. As we discussed in Section II, to ensure the performance of algorithms computing and deployment at the edge, we did not introduce DNN algorithms that require high training consumption.

B. Evaluation Metrics

To evaluate the image quality, we adopt reference-based metrics, including the degree of spectral distortion [1], correlation coefficient [84], mean square error (mse) [85], no-reference metrics including natural image quality evaluator (NIQE) [86], Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) [87], and perception-based image quality evaluator (PIQE) [88]. We shall present the equations of reference-based metrics, and the detailed algorithms of the no-reference metrics can be referred to in [86], [87], and [88].

1) Degree of Spectral Distortion:

When comparing with the original multispectral image, the obtained image’s level of distortion is directly represented by the degree of spectral distortion [1], [89] $\begin{equation*} {\mathbf{ Distortion}}=\frac {1}{N} \sum _{i=1}^{N}\left |{\text {Fuse}(i)-\text {Ref}(i)}\right | \tag{15}\end{equation*}$ View Source where $i$ stands for each pixel in the image, $N$ denotes the number of pixels, and $\text {Ref}(i)$ and $\text {Fuse}(i)$ represent the pixels of the reference and fused images, respectively. The multispectral image is set to the reference. The higher spectral distortion represents the more severe spectral distortion.

2) Correlation Coefficient:

The correlation coefficient, within the range of $[-1,1]$ , measures the similarity between reference and fused images [84]. It is evaluated as follows: $\begin{equation*} {\mathbf{ CC}}=\frac {\sum _{i=1}^{N} [\text {Ref}(i)-\overline {\text {Ref}}] \cdot [\text {Fuse}(i)-\overline {\text {Fuse}}]}{\sqrt {\sum _{i=1}^{N} [\text {Ref}(i)-\overline {\text {Ref}}]^{2} \cdot [\text {Fuse}(i)-\overline {\text {Fuse}}]^{2}}} \tag{16}\end{equation*}$ View Source where $\overline {\text {Ref}}$ and $\overline {\text {Fuse}}$ represent the mean value of the pixels. The higher the ${\mathbf{ CC}}$ , the more correlated the two images are.

3) Mean Square Error:

The mse calculates the mean square root error of the three bands between the fused and reference images, which measures the similarity rate [85] $\begin{equation*} {\mathbf{ mse}} = {\frac {1}{N} \sum _{i=1}^{N} \left [{\text {Fuse}(i)-\text {Ref}(i)}\right]^{2}}. \tag{17}\end{equation*}$ View Source

We also employ the multispectral image as the reference.

C. Experimental Results

The performance of our proposed fusion methodology is both quantitatively (Tables I and II) and qualitatively (Figs. 2 and 3) benchmarked against the commonly used remote sensing image fusion methods of Wavelet, GS, HSV, NND, PCA, Improved Wavelet, AEF, and CCFL.

TABLE I Comparison Experiments on the Degree of Spectral Distortion, Correlation Coefficient, and mse Against Wavelet, GS, HSV, NND, PCA, Improved Wavelet, AEF, and CCFL Approaches. Bold and Underline Represent the Best and the Second-Best Performance, Respectively. With respect to the Degree of Spectral Distortion and MSE, “Overall” Denotes the Average Results of Three Bands. With Respect to the Correlation Coefficient, “MS” Denotes the Average Results of Three Multispectral Bands and “Overall” Denotes the Average Results of “MS” and “SAR”

TABLE II Comparison Experiments on NIQE, BRISQUE, and PIQE Against Wavelet, GS, HSV, NND, PCA, Improved Wavelet, AEF, and CCFL Approaches. Bold and Underline Represent the Best and the Second-Best Performance, Respectively. “Overall” Denotes the Average Results of Three Bands

Fig. 2.

Qualitative comparison experiment demonstration of our pseudo-color fusion methodology, with the methods based on Wavelet, GS, HSV, NND, PCA, Improved Wavelet, AEF, CCFL together with the original SAR image, multispectral image, and image fused with Brovey method.

Show All

Fig. 3.

Show All

First, we discuss the subjective visual evaluation of all the fusion techniques. As we discussed, the Brovey images are highly similar to the SAR images and contain a small amount of spectral information (namely, color information in the RGB color space). The HSV images introduce severe scattered noise, and the PCA images cause serious color distortion from the perspective of the RGB color space. Although the images fused using the wavelet method and the improved wavelet method show significant differences from the original SAR and multispectral images, the wavelet images still maintain a high degree of recognition in terms of color, texture, resolution, and so on, and we will further discuss the results of the wavelet methods in the consecutive quantitative analysis. GS, NND, and our proposed approach have better fidelity to the original heterologous information. Nevertheless, the speckle white noises from the SAR images remain in the GS and NND images. Our method removes these noise points effectively, making the resulting images smooth, highly observable, and free of noise pollution. The CDL-based methods (AEF and CCFL) present similar results to our method but present some blurring and distortion in the brighter regions. In contrast, our method does well in brighter regions. In summary, our proposed pseudo-color fusion method achieves superior performance from the above visual evaluation.

Subsequently, we discuss the quantitative analysis of the above fusion methods. Regarding the degree of spectral distortion, our proposed fusion technique performs the best compared to all comparison approaches, proving that it possesses the most spectral fidelity. Like the visual evaluation, PCA and HSV methods bring significant spectral distortions, while NND and GS present less spectral fidelity than ours. It is worth mentioning that the wavelet-based fusion approaches also have a good performance in spectral fidelity. AEF and CCFL also perform well in spectral fidelity, slightly lower than our solution.

The correlation coefficient evaluates the similarity between the fused and source images. We measure the correlation coefficient with multispectral and SAR images. The average correlation coefficients are also obtained to measure which fusion approach retains the most information from the two heterologous images. HSV and PCA still suffer from noises and distortion and perform poorly in the comparison study. Despite PCA achieving a relatively high correlation with SAR images, HSV and PCA perform poorly in all other metrics. The wavelet and improved wavelet methods maintain a high correlation with multispectral images, but they also hold a low correlation with SAR images. Thus, it should not be considered the excellent intermediate representation of different modalities in our scene. GS, NND, AEF, CCFL, and our method fall into the final round. These three methods achieve excellent and similar performance in retaining information from the heterologous source. Regarding the correlation coefficient, NND shows the best performance, while our proposed methods achieve the second. Although our method does not achieve the best performance on any single band, our overall correlation coefficient ranks second, proving that our method can effectively aggregate multisource information. The comparison in terms of mse is straightforward, and our proposed pseudo-color fusion method achieves the best reconstruction mse, followed by CCFL, NND, GS, wavelet, improved wavelet, and AEF, in order. HSV and PCA still have the worst performance in mse.

Finally, our method achieves remarkable performance among all no-reference image quality assessment approaches, only second to CCFL in PIQE. In the performance of NIQE and BRISQUE, our method is always in the first place. The excellent values of NIQE and BRISQUE prove that our image has exceptional performance under the reference of a statistical model based on a large external natural scene corpus, indicating that the features of the image fused by our method are closer to the feature benchmark established by NSS. The high performance on PIQE proves that our solution can effectively prevent distortion between the local patches.

To summarize, a comprehensive analysis of multiple metrics is needed to judge their performance due to the complexity and particularity of remote sensing data. The wavelet fusion method has excellent nonredundancy, reconstruction, and detailed texture retention ability. However, the obtained images are too close to the multispectral images, which is unsuitable for our requirement of optimizing the intermediate representation. The fusion results of HSV and PCA are also not satisfactory. HSV replaces the converted luminance value band with SAR images to preserve the details of SAR images. Because of the differences in spectral reflectance curves in different bands, HSV will inevitably produce spectral degradation, distortion, and noise during component replacement, leading to terrible fusion results. The PCA fusion algorithm shall divide multispectral images into different levels of principal components and simply replace the first one with the SAR images. It applies to all bands of the multispectral images. However, because of simply removing the first principal component, PCA will lose some information reflecting spectral characteristics, which results in severe spectral distortion. It does not fully consider the features from different levels of principal components. NND and GS show similar but slightly lower performance than our proposed method, hindered by some speckle noise and distortion. The two CDL-based solutions (AEF and CCFL) both achieve outstanding performance and similar visual results to our solution but still perform lower than our solution based on the comprehensive comparisons. Our proposed method attains superior qualitative and quantitative performance, realizing the best performance in the degree of spectral distortion, mse, NIQE, and BRISQUE, and the second best in the correlation coefficient and PIQE. Visual evaluation of our proposed method also indicates excellent noise removal and distinctive texture detail preservation. These experimental results demonstrate the superior performance of our proposed pseudo-color fusion method.

SECTION V.

Conclusion

This article proposes a novel pseudo-color fusion algorithm for remote sensing applications based on CDL and Brovey preprocessing. The method performs further processing on the multispectral image and the preprocessed Brovey image, capturing the relevant information of both images with CDL and reconstructing the images with the reconstruction errors of joint sparse representation. The results show that the fused images trade off the linear relationship between SAR and multispectral images, optimizing their intermediate representation. Our methodology shows exceptional performance in all objective metrics and subjective visual evaluation. However, a significant limitation of our method lies in its application to different modalities. Our approach primarily focuses on the fusion of single- and three-channel images, achieving high spatial and spectral resolution in the fused image. Therefore, in similar applications such as PEI and MRI, visible and infrared images, where the combination of structural details from single-channel images and spectral information from multichannel images is required, there are similarities to SAR and multispectral image fusion. Hence, our method holds potential in these areas. However, its performance may not be satisfactory when applied to multifocus images. Future work will involve: 1) validating our method on image fusion tasks involving different modalities; 2) exploring the potential of combining our CDL and sparse representation method with existing fusion methodologies to optimize performance; and 3) assessing the applicability of our fusion methodology in downstream remote sensing tasks, such as remote sensing object segmentation and detection.

References is not available for this document.

Joint Sparse Representations and Coupled Dictionary Learning in Multisource Heterogeneous Image Pseudo-Color Fusion

Abstract:

Metadata

Abstract:

ISSN Information:

Funding Agency:

Introduction

Related Works

Methodology