
Pansharpening Using Unsupervised Generative Adversarial Networks With Recursive Mixed-Scale Feature Fusion



Abstract:

Panchromatic sharpening (pansharpening) is an important technology for improving the spatial resolution of multispectral (MS) images. The majority of models are implemented at the reduced resolution, leading to unfavorable results at the full resolution. Moreover, the complicated relationship between MS and panchromatic (PAN) images is often ignored in detail injection. To address these problems, an unsupervised generative adversarial network with recursive mixed-scale feature fusion for pansharpening (RMFF-UPGAN) is modeled to boost the spatial resolution and preserve the spectral information. RMFF-UPGAN comprises a generator and two U-shaped discriminators. A dual-stream trapezoidal branch is designed in the generator to obtain multiscale information, and a recursive mixed-scale feature fusion subnetwork is designed. A prior fusion is performed on the extracted MS and PAN features of the same scale, and a mixed-scale fusion is conducted on the prior fusion results of the fine scale and coarse scale. The fusion is executed sequentially in this manner, building a recursive mixed-scale fusion structure and finally generating the key information. A compensation information mechanism is also designed to supplement the reconstruction of the key information. A nonlinear rectification block for the reconstructed information is developed to overcome the distortion induced by neglecting the complicated relationship between MS and PAN images. Two U-shaped discriminators are designed and a new composite loss function is defined. The presented model is validated on data from two satellites, and the outcomes are better than those of the prevalent approaches in terms of both visual assessment and objective indicators.
Topic: Radiation Modeling and Remote Sensing
Page(s): 3742 - 3759
Date of Publication: 20 March 2023

SECTION I.

Introduction

Remote sensing images are extensively utilized in geological exploration, terrain classification, agricultural yield prediction, pest detection, disaster prediction, national defense, environmental change detection, and so on [1], [2]. In these applications, images with high spatial resolution, high spectral resolution, or high temporal resolution are required. However, due to the limitations of sensor technology, we obtain low spatial resolution multispectral or hyperspectral (LRMS/LRHS) images, low temporal resolution multispectral or hyperspectral images, and low spectral resolution panchromatic (PAN) images [3], [4]. This requires fusion technology to fuse LRMS and PAN images to generate high spatial resolution multispectral (HRMS) images. This fusion technology is called panchromatic sharpening (pansharpening). The pansharpening techniques are generally divided into component substitution (CS) approaches, multiresolution analysis (MRA) techniques, variational optimization (VO) methods, and deep learning (DL) models [1], [5], [6].

CS techniques primarily involve intensity-hue-saturation (IHS) and its variants [7], Gram–Schmidt (GS) [8], GS adaptive (GSA) [9], principal component analysis (PCA) [10], and band-dependent spatial detail (BDSD) [11]. First, the LRMS image is projected into another spatial domain, where its spatial structure component is extracted and replaced with the high-resolution PAN image. Finally, the result is inversely transformed back into the original space to obtain the fused image. The strengths of CS methods are their simplicity, wide application, integration into individual software packages, ease of implementation, and strong enhancement of the spatial resolution of LRMS images. Their drawbacks include spectral distortion, oversharpening, aliasing, and blurring.

MRA approaches principally include the smoothing-filter-based intensity modulation (SFIM) [12], Laplacian pyramid (LP) transform [13], generalized LP (GLP) transform [14], curvelet transform [15], contourlet transform [16], nonsampled contourlet transform (NSCT) [17], and modulation transfer function-GLP (MTF-GLP) transform and variants [7]. The MRA approaches decompose the LRMS and PAN images, then fuse them through some rules and generate the fused images by inverse transformation. Compared with CS methods, MRA can preserve more spectral information and reduce spectral distortion, but their spatial resolution is relatively low.

VO methods consist of two parts: an energy function and an optimization method. The core is the optimization of a variational model, such as the panchromatic and multispectral image (P+XS) model [18], the nonlocal variational panchromatic sharpening model [19], and others [7], [20]. Compared with the CS and MRA methods, the VO methods have higher spectral fidelity, but their computation is more complex.

Convolutional neural networks (CNNs) and generative adversarial networks (GANs) have been widely applied in image processing, and some achievements have been made in the pansharpening of remote sensing images. Early on, a three-layer pansharpening CNN (PNN) was designed [21] based on superresolution reconstruction. The nonlinear mapping of the CNN is employed to generate HRMS images by feeding LRMS and PAN image pairs into the PNN. The PNN is relatively simple and easy to implement, but it is prone to overfitting. Subsequently, the target-adaptive CNN (TA-CNN) [22] was modeled, which utilizes a target-adaptive adjustment stage to solve the problems of mismatched data sources and insufficient training data. Yang et al. [23] presented a deep pansharpening network based on ResNet modules, i.e., PanNet, which employs the high-frequency information of the LRMS and PAN images as the input and outputs the residual between the HRMS and LRMS images. Nevertheless, PanNet overlooks the low-frequency information, causing spectral distortion. Wei et al. [24] modeled a deep residual pansharpening neural network (DRPNN) implemented on the ResNet block. Although the DRPNN exploits the powerful nonlinear capability of the CNN, the number of samples required should increase with increasing network depth to avoid overfitting, and because it is trained in the spatial domain, the generalization ability of the model still needs to be improved. Deng et al. [25] proposed the FusionNet model based on a CS and MRA detail injection model, in which the injected details are obtained with a deep CNN (DCNN). Unlike other networks, its input is the difference between the PAN image, replicated to the same number of channels as the LRMS image, and the LRMS image. Thus, this network can introduce multispectral information and reduce spectral distortion. Hu et al. [26] proposed a multiscale dynamic convolutional neural network (MDCNN), which mainly contains three modules: a filter generation network, a dynamic convolution network, and a weight generation network. The MDCNN uses multiscale dynamic convolution to extract multiscale features of the LRMS and PAN images and designs a weight generation network to adjust the relationship between features at different scales to improve the adaptability of the network. Although dynamic convolution improves the flexibility of the network, the network design is more complicated, and because the features of the LRMS and PAN images are extracted simultaneously, the network tends to lose effective detail and spectral information. Wu et al. [27] proposed RDFNet based on a distributed fusion structure and residual modules, which extracts multilevel features of the LRMS and PAN images, respectively. Then, the corresponding-level MS and PAN features and the fusion result of the previous step are fused gradually to obtain HRMS images. Although the network uses the multilevel LRMS and PAN features as much as possible, it is limited by the network depth and cannot recover more details and spectral information. Wu et al. [28] also designed TDPNet based on cross-scale fusion and multiscale detail compensation. GANs offer great potential for generating images [5]. Shao et al. [29] presented a supervised conditional GAN comprising a residual encoder–decoder, i.e., RED-cGAN, which enhances the sharpening ability under the restriction of PAN images. Liu et al.
[30] developed a deep CNN-based pansharpening GAN, i.e., PsGAN, consisting of a dual-stream generator and a discriminator that distinguishes the generated MS image from the reference image. Benzenati et al. [31] introduced a detail injection GAN (DIGAN) constructed from a dual-stream generator and a relativistic average discriminator. RED-cGAN, PsGAN, and DIGAN are supervised approaches trained on degraded-resolution data; nevertheless, their products are not satisfactory when applied to full-resolution data. Ozcelik et al. [32] constructed a self-supervised learning framework that treats pansharpening as colorization, i.e., PanColorGAN, which reduces blurring by color injection and random-scale downsampling. Li et al. [33] put forward a self-supervised approach using a cycle-consistent GAN trained on reduced-resolution data, which builds two generators and two discriminators. The LRMS and PAN images are fed into the first generator to yield the predicted image, and then the predicted image is input to the second generator to recover the PAN image, which remains consistent with the input PAN image. To cope with the problem of having no reference HRMS images, some unsupervised GANs have been presented. Ma et al. [34] suggested an unsupervised pansharpening GAN (Pan-GAN) composed of a generator and two discriminators (a spectral discriminator and a spatial discriminator). The generator produces HRMS images from the concatenated MS and PAN images. The spectral discriminator judges the spectral information between the HRMS and LRMS images, driving the generated HRMS data to be spectrally consistent with the LRMS data. The spatial discriminator discerns the spatial information between the HRMS and PAN images, enabling the generated HRMS image to agree with the spatial information of the PAN image. Pan-GAN uses two discriminators to better retain spectral and spatial structure information and solves the problem of the ambiguity caused by downsampling in supervised training. However, its input is the concatenated MS and PAN images, resulting in insufficient details and spectral information. Zhou et al. [35] proposed an unsupervised dual-discriminator GAN (PGMAN), which utilizes a dual-stream generator to yield the HRMS image and two discriminators to retain spectral information and details individually. Pan-GAN and PGMAN are trained directly on the original data with no reference images, which yields better results at full resolution, but the results on degraded-resolution data are not desirable, revealing the limited generalization ability of these models.

Although various scholars have proposed a variety of pansharpening networks and achieved certain fusion results, a majority of the models are trained on reduced-resolution data, which leads to spectral distortion and loss of details when fusing full-resolution data because of the change in resolution. Moreover, in the detail injection model, the details are directly added to the upsampled MS image, ignoring the complicated relationship between the MS image and the PAN image, which is likely to lead to spectral distortion or ringing. To address these problems, an unsupervised GAN with recursive mixed-scale feature fusion for pansharpening (RMFF-UPGAN) is modeled to boost the spatial resolution and preserve the spectral information; it is trained on the observed data without reference images. The main contributions of this article are as follows.

  1. A dual-stream trapezoidal branch is designed in the generator to obtain multiscale information. We employ a ResNeXt block and residual learning block to obtain the spatial structure and spectral information of four scales.

  2. A recursive mixed-scale feature fusion structure is designed, which executes prior fusion and mixed-scale fusion sequentially to generate the key information.

  3. A compensation information mechanism is also designed to supplement the reconstruction of the key information.

  4. A nonlinear rectification block for the reconstructed information is developed to overcome the distortion induced by ignoring the complicated relationship between MS and PAN images.

  5. Two U-shaped discriminators are designed and a new composite loss function is defined to better preserve spectral information and details.

The rest of this article is organized as follows. Section II describes related work. Section III describes the proposed model in detail. Section IV introduces datasets, evaluation indicators, experimental settings, and comparative experiments. Finally, Section V concludes this article.

SECTION II.

Related Work

A. MRA-Based Detail Injection Model

MRA methods [36], [37] are a class of image fusion methods and are particularly common in the field of remote sensing. These methods have good multiscale spatial frequency decomposition characteristics, singularity structure representation abilities, and visual perception characteristics. The efficient filter-bank implementation of the wavelet transform makes it feasible to process large-scale remote sensing image fusion. In MRA methods, the image is first decomposed into a low-frequency component and a high-frequency component by some decomposition method; then, the high-frequency and low-frequency components are fused according to a fusion rule. Finally, the fused components are reconstructed by the inverse transform to generate the fused image. An MRA-based detail injection model can be represented by a general detail injection framework, as shown in the following expression:
\begin{equation*} \hat{F}_{k}=\uparrow M_{k}+g_{k}\left(P-P_{L}\right) \quad k=1,2,\ldots, N \tag{1} \end{equation*}
where \hat{F}_{k} represents the kth-band fused HRMS image, \uparrow M_{k} represents the kth-band upsampled LRMS image, g_{k} is the kth-band detail injection gain, P represents the PAN image, P_{L} is the low-frequency component of the PAN image, and N is the number of bands of the MS image.
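As a concrete illustration of framework (1), the following NumPy/SciPy sketch injects the high-frequency part of the PAN image into an upsampled MS image. The bilinear upsampling, box low-pass filter, and unit gains g_k are illustrative assumptions, not the filters or gains of any specific MRA method discussed here.

```python
import numpy as np
from scipy import ndimage

def mra_detail_injection(ms, pan, ratio=4, gains=None):
    """ms: (h, w, N) LRMS image; pan: (h*ratio, w*ratio) PAN image."""
    up_ms = ndimage.zoom(ms, (ratio, ratio, 1), order=1)        # upsampled M_k (bilinear)
    pan_low = ndimage.uniform_filter(pan, size=2 * ratio + 1)   # P_L (box low-pass stand-in)
    details = pan - pan_low                                     # P - P_L
    n_bands = ms.shape[-1]
    if gains is None:
        gains = np.ones(n_bands)                                # g_k = 1 (plain additive injection)
    return np.stack([up_ms[..., k] + gains[k] * details for k in range(n_bands)], axis=-1)

# Toy usage with random data shaped like a 4-band sensor at a 1:4 resolution ratio.
ms = np.random.rand(64, 64, 4)
pan = np.random.rand(256, 256)
print(mra_detail_injection(ms, pan).shape)   # (256, 256, 4)
```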

B. ResNeXt

Xie et al. [38] proposed the ResNeXt structure, which is an improvement of ResNet [39]. The network uses group convolution to reduce the network complexity and improve the expression ability. The core of ResNeXt is the notion of cardinality, which is used to measure the complexity of the model. ResNeXt shows that, for similar computational complexity and model parameters, increasing the cardinality achieves better expression ability than increasing the depth or width of the network. The ResNeXt structure [38] takes advantage of the split-transform-merge idea, and the convolution operations of all the paths share the same topology, which reduces the computational complexity. The mathematical expression is as follows:
\begin{equation*} y=x+\sum _{i=1}^{C} \mathcal {T}_{i}(x) \tag{2} \end{equation*}
where C is the cardinality, i.e., the number of identical paths; x represents the input and y represents the output; and \mathcal {T}_{i}() represents the function of the ith path.
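A minimal tf.keras sketch of the split-transform-merge form in (2), i.e., y = x plus the sum of the C transformed paths, is given below. The cardinality, path width, and 1 × 1/3 × 3 path layout are illustrative assumptions; the exact channel settings used in this paper are those of Fig. 3(a).

```python
import tensorflow as tf
from tensorflow.keras import layers

def resnext_block(x, cardinality=16, path_width=4):
    paths = []
    for _ in range(cardinality):
        t = layers.Conv2D(path_width, 1, padding="same", activation="relu")(x)
        t = layers.Conv2D(path_width, 3, padding="same", activation="relu")(t)
        # Project each path back to the input channel count so that the sum matches x.
        t = layers.Conv2D(x.shape[-1], 1, padding="same")(t)
        paths.append(t)
    # y = x + sum_i T_i(x): identity shortcut plus the C transformed paths.
    return layers.Add()([x] + paths)

inp = layers.Input(shape=(256, 256, 32))
model = tf.keras.Model(inp, resnext_block(inp))
model.summary()
```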

SECTION III.

Methodology

RMFF-UPGAN is modeled to improve the spatial resolution and retain the spectral information. It is trained directly on the raw full-resolution data to decrease the effect of resolution variation on the results. The overall architecture of the RMFF-UPGAN is illustrated in Fig. 1; it is composed of one dual-stream generator and two U-shaped relative average discriminators (i.e., \rm U-RaLSD_{pe} and \rm U-RaLSD_{pa}). In Fig. 1, M and P stand for the raw MS and PAN images, \uparrow M refers to the upsampled MS image, and \rm HM is the fused image. For the generator, first, a dual-stream trapezoidal branch is designed to obtain multiscale information. A ResNeXt block extracts the low-level semantic information at the fine scale, and residual learning blocks extract the high-level semantic information at the mesoscale and coarse scales, yielding the spatial structure and spectral information of four scales. Second, a recursive mixed-scale feature fusion subnetwork is designed via residual learning. A prior fusion is performed on the extracted MS and PAN features of the same scale, and a mixed-scale fusion is conducted on the prior fusion results of the fine scale and coarse scale. The fusion is executed sequentially in this manner, building a recursive mixed-scale fusion structure and finally generating the key information. Then, the key information is reconstructed, and a compensation information mechanism is designed to supplement the reconstruction of the key information. Finally, a rectification block for the reconstructed information is developed to obtain the fused image, which overcomes the distortion induced by neglecting the complicated relationship between MS and PAN images. Two U-shaped discriminators are designed to better preserve spectral information and details. The \rm U-RaLSD_{pa} discriminator differentiates the details of the \rm HM image from those in the P image and prompts the details of the \rm HM image to be consistent with those in the P image. The \rm U-RaLSD_{pe} discriminator is applied to distinguish the spectral information of the \rm HM image from that in the M image, which drives the spectral information of the \rm HM image to be consistent with that in the M image.

Fig. 1. Implementation framework of the RMFF-UPGAN.

A. Dual-Stream Generator

The designed dual-stream generator consists of a dual-stream trapezoidal multiscale feature extraction module, a recursive mixed-scale feature fusion module, a dual-stream multiscale feature reconstruction module, and a reconstructed information rectification module. The architecture of each module is explained in detail as follows.

1) Dual-Stream Trapezoidal Multiscale Feature Extraction (DSTMFE)

The structure of the DSTMFE branch of the generator is shown in Fig. 2; it consists of two independent branches and differs from our previous work TDPNet [28]. We substitute the maxpooling operation with Conv4, i.e., a convolution operation with a kernel size of 4 and a stride of 2. The top branch extracts four scale features of the PAN image and the bottom branch extracts four scale features of the MS image, where P_{1}-P_{4} express the four scale features extracted from the PAN image, M_{1}-M_{4} represent the four scale features extracted from the MS image, and their sizes are 256 × 256 × 32, 128 × 128 × 64, 64 × 64 × 128, and 32 × 32 × 256. Because the information of the PAN and MS images represented by the low-level semantic features is the most abundant, the group convolution of ResNeXt provides multiple convolution branches, which offers a better way to retain information: it increases the cardinality and improves the network accuracy while reducing the network complexity. Therefore, to retain more original information and to reduce the network complexity, the ResNeXt module extracts the first-scale features P_{1} and M_{1}, respectively. At the latter three scales, residual learning blocks and downsampling operations (i.e., Conv4) extract the P_{2}-P_{4} and M_{2}-M_{4} features, respectively. The structures of the ResNeXt block [38] and residual learning block [39] used in RMFF-UPGAN are depicted in Fig. 3(a) and (b). In Fig. 3(a), the parameters of the ResNeXt block are given as 1(4), 1 × 1, 4, where 1(4) represents the number of channels of the PAN (MS) image, and 1 × 1 and 4 represent the kernel size and the number of convolutions. In Fig. 3(b), the leaky ReLU (LReLU) function is employed.

Fig. 2. Structure of the dual-stream trapezoidal multiscale feature extraction.

Fig. 3. (a) ResNeXt block. (b) Residual learning block.

The expressions that extract the features of the MS image and PAN image using the ResNeXt module are given in (3) and (4), respectively. The expressions that extract the features of the MS image and PAN image using the residual learning module are given in (5)–(8), with i=2, 3, 4.
\begin{align*} M_{1}&=\uparrow M+\sum _{j=1}^{16} \mathcal {T}_{j}(\uparrow M) \tag{3} \\ P_{1}&=P+\sum _{j=1}^{16} \mathcal {T}_{j}(P) \tag{4} \\ M_{i}&=\Phi _{m}\left(h\left(M_{i-1}\right)+\mathcal {F}\left(M_{i-1}, W_{mi}\right)\right) \tag{5} \\ h\left(M_{i-1}\right)&=W_{mi}^{\prime } \ast M_{i-1} \tag{6} \\ P_{i}&=\Phi _{p}\left(h\left(P_{i-1}\right)+\mathcal {F}\left(P_{i-1}, W_{pi}\right)\right) \tag{7} \\ h\left(P_{i-1}\right)&=W_{pi}^{\prime } \ast P_{i-1} \tag{8} \end{align*}
where P and \uparrow M represent the PAN image and the upsampled MS image at full resolution, and \mathcal {T}_{j}() represents the jth path function of the ResNeXt module. M_{i} and P_{i} denote the ith-scale features of the MS image and PAN image. h() represents the direct connection part of the residual learning module and \mathcal {F}() represents the residual part. W_{mi} and W_{pi} are convolutions with a kernel size of 3 × 3, W_{mi}^{\prime } and W_{pi}^{\prime } are convolutions with a kernel size of 1 × 1, and the numbers of convolutions are 64, 128, and 256, respectively. \ast indicates the convolution operation. \Phi _{m}() and \Phi _{p}() refer to the downsampling functions.
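The following tf.keras sketch shows one extraction stage of (5)–(8): a residual learning block with a 1 × 1 projection shortcut h(), followed by Conv4 (kernel size 4, stride 2) downsampling \Phi(). The activations inside \mathcal{F}() are assumptions consistent with Fig. 3(b).

```python
import tensorflow as tf
from tensorflow.keras import layers

def extraction_stage(x, filters):
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)        # h(.) in (6)/(8): 1x1 projection
    r = layers.Conv2D(filters, 3, padding="same")(x)               # residual part F(.)
    r = layers.LeakyReLU(0.2)(r)
    r = layers.Conv2D(filters, 3, padding="same")(r)
    y = layers.Add()([shortcut, r])
    y = layers.Conv2D(filters, 4, strides=2, padding="same")(y)    # Conv4 downsampling, i.e., Phi(.)
    return layers.LeakyReLU(0.2)(y)

# Building P_2-P_4 from P_1 (256 x 256 x 32) with 64, 128, and 256 filters, as in Fig. 2.
p1 = layers.Input(shape=(256, 256, 32))
x, feats = p1, []
for f in (64, 128, 256):
    x = extraction_stage(x, f)
    feats.append(x)
print([tuple(t.shape) for t in feats])   # [(None, 128, 128, 64), (None, 64, 64, 128), (None, 32, 32, 256)]
```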

2) Recursive Mixed-Scale Feature Fusion

According to the four-scale MS and PAN features generated in the DSTMFE stage, a recursive mixed-scale feature fusion (RMSFF) subnetwork is designed based on residual learning, as illustrated in Fig. 4, comprising prior fusion blocks and mixed-scale fusion blocks. For the four-scale features of the MS image and PAN image, the prior fusion block (PFB) is designed to aggregate the information of the MS image and PAN image. The PFB is helpful for learning multimodal information and fusing the preliminary features of the MS image and PAN image. A "concatenate+Conv3+residual block" mode is employed to build the PFB, as illustrated in Fig. 5(a). Conv3 is a convolution operation followed by an LReLU function that implements the primary fusion and adaptively adjusts the number of channels; the residual block then implements further fusion. The kernel size of the Conv3 and residual block is 3 × 3, and the stride is 1. The numbers of convolution kernels are 32, 64, 128, and 256, respectively. The mixed-scale fusion block (MSFB) performs the fusion of information from different scales, as displayed in Fig. 5(b). The MSFB is constructed using a scale transfer block (STB), concatenation, Conv3, and a residual block, where H_{i} represents a fine-scale image and L_{i+1} represents a coarse-scale image. The STB is shown in Fig. 6. The fine-scale image H_{i} is downsampled by the STB to generate an image with the same scale as L_{i+1}, and is then fused with L_{i+1}. The downsampling operation is conducted by Conv4, and the numbers of kernels are 64, 128, and 256, respectively. The mixed-scale fusion yields three-scale results, i.e., \text{Mix}\_{f}_{5}, \text{Mix}\_{f}_{9}, and \text{Mix}\_{f}_{13}.

Fig. 4. Structure of the recursive mixed-scale feature fusion subnetwork.

Fig. 5. (a) Prior fusion block (PFB). (b) Mixed-scale fusion block (MSFB). (c) Multiscale reconstruction block (MRB).

Fig. 6. Scale transfer block (STB).

As illustrated in Fig. 4, first, the same-scale features M_{i} and {P}_{i} (i=1,2,3,4) are fused by the PFB to generate P\_{M}_{i} (i=1,2,3,4). Then, the MSFB fuses the prior fusion result P\_{M}_{i} (i=1,2,3) with the next-scale result P\_{M}_{i+1} (i=1,2,3) to generate the feature \text{Mix}\_{f}_{i+4} (i=1,2,3) with the same scale as P\_{M}_{i+1} (i=1,2,3). The mixed-scale information fusion is realized sequentially in the aforementioned manner, and the recursive fusion is carried out to generate the key information \text{Mix}\_{f}_{13}. The entire fusion subnetwork constitutes a recursive mixed-scale fusion architecture, which utilizes the information of the MS and PAN images across modalities and scales to reduce the loss of information in the MS and PAN images.

The expression of the PFB is as follows:
\begin{equation*} P\_{M}_{i}={\text{PF}_{i}}\left(P_{i}, M_{i}, W_{\text{PF}_{i}}\right) \quad i=1, \dots, 4 \tag{9} \end{equation*}
where P\_{M}_{i} represents the prior fusion result of the ith-scale features P_{i} and M_{i}, \text{PF}_{i} indicates the function of the PFB, and W_{\text{PF}_{i}} represents the parameter.

The expression of the MSFB is as follows:
\begin{equation*} \text{Mix}\_{f}_{i+4}=\text{MF}_{i}\left(H_{i}, L_{i+1}, W_{\text{MF}_{i}}\right) \quad i=1,2,3,5,6,9 \tag{10} \end{equation*}
where \text{Mix}\_{f}_{i+4} means the mixed fusion result, \text{MF}_{i} represents the function of the MSFB, and W_{\text{MF}_{i}} is the parameter. H_{i} represents a fine-scale image and L_{i+1} means a coarse-scale image, i.e., H_{1} represents P\_{M}_{1}, L_{2} represents P\_{M}_{2}, and \text{Mix}\_{f}_{5} means the mixed fusion result of P\_{M}_{1} and P\_{M}_{2}; H_{5} represents \text{Mix}\_{f}_{5}, L_{6} represents \text{Mix}\_{f}_{6}, and \text{Mix}\_{f}_{9} means the mixed fusion result of \text{Mix}\_{f}_{5} and \text{Mix}\_{f}_{6}.
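A hedged tf.keras sketch of the PFB in (9) and the MSFB in (10), following the "concatenate+Conv3+residual block" pattern of Fig. 5(a) and (b), is given below. The internals of the residual block are assumptions consistent with Fig. 3(b).

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)
    r = layers.Conv2D(filters, 3, padding="same")(x)
    r = layers.LeakyReLU(0.2)(r)
    r = layers.Conv2D(filters, 3, padding="same")(r)
    return layers.Add()([shortcut, r])

def prior_fusion_block(p_i, m_i, filters):
    x = layers.Concatenate()([p_i, m_i])
    x = layers.Conv2D(filters, 3, padding="same")(x)    # Conv3: primary fusion, channel adjustment
    x = layers.LeakyReLU(0.2)(x)
    return residual_block(x, filters)                   # further fusion

def mixed_scale_fusion_block(h_i, l_next, filters):
    h_down = layers.Conv2D(filters, 4, strides=2, padding="same")(h_i)   # STB: Conv4 downsampling
    x = layers.Concatenate()([h_down, l_next])
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    return residual_block(x, filters)

# Toy usage: fuse P_1/M_1 and P_2/M_2, then mix the two prior-fusion results into Mix_f5.
p1, m1 = layers.Input((256, 256, 32)), layers.Input((256, 256, 32))
p2, m2 = layers.Input((128, 128, 64)), layers.Input((128, 128, 64))
pm1 = prior_fusion_block(p1, m1, 32)
pm2 = prior_fusion_block(p2, m2, 64)
mix_f5 = mixed_scale_fusion_block(pm1, pm2, 64)
print(tuple(mix_f5.shape))   # (None, 128, 128, 64)
```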

3) Dual-Stream Multiscale Feature Reconstruction

To obtain more precise reconstruction information, a dual-stream multiscale reconstruction (DSMR) subnetwork is designed to reconstruct the key information \text{Mix}\_{f}_{13}, as depicted in Fig. 7. Two branches reconstruct features of the same scale but different levels. To compensate for information, a compensation information mechanism (CIM) is designed for the reconstruction of information at each scale, as shown by the green arrows in Fig. 7. The CIM is fed with \text{Mix}\_{f}_{13}, the prior fusion results from the RMSFF stage at the same scale and the finer scale as the information to be reconstructed, and a mixed-scale fusion result at the same scale as the information to be reconstructed. The upper branch employs the reconstructed result of the previous step and the CIM to generate multiscale information through the multiscale reconstruction block (MRB). The bottom branch employs the reconstructed result of the previous step, the upper-branch result, \text{Mix}\_{f}_{13}, and the prior fusion results of the CIM to generate multiscale information. The reconstructed results M\_{R}_{2} and M\_{R}_{4} of the upper branch provide supplementary information for the reconstruction of M\_{R}_{3} and M\_{R}_{5}, respectively. The multiscale information gradually generates the final reconstruction information T_{R}.

Fig. 7. Structure of the dual-stream multiscale reconstruction subnetwork.

The MRB is presented in Fig. 5(c). Compared with the scale of the information to be reconstructed, H represents finer scale information, S represents the same-scale information, and L represents coarser-scale information. Multiscale information needs to be converted into information with the same scale before reconstruction, and the STB is presented in Fig. 6. The coarse-scale information is converted to fine-scale information through a deconvolution operation and the fine-scale information is converted to coarse-scale information through a downsampling operation. The size of the convolution kernels of the Conv3 and residual learning block used by the MRB is 3 × 3, the stride is 1, and the numbers are 128, 64, and 32, respectively.
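The scale conversions performed by the STB can be sketched as follows: a Conv4 (kernel size 4, stride 2) convolution for fine-to-coarse transfer and a stride-2 transposed convolution for coarse-to-fine transfer. The transposed-convolution kernel size and the filter counts below are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def stb_down(x, filters):
    # Fine scale -> coarse scale: kernel-4, stride-2 convolution (Conv4).
    return layers.LeakyReLU(0.2)(layers.Conv2D(filters, 4, strides=2, padding="same")(x))

def stb_up(x, filters):
    # Coarse scale -> fine scale: stride-2 transposed convolution (kernel size assumed to be 3).
    return layers.LeakyReLU(0.2)(layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x))

fine = layers.Input((128, 128, 64))
coarse = layers.Input((64, 64, 128))
print(tuple(stb_down(fine, 128).shape), tuple(stb_up(coarse, 64).shape))
# (None, 64, 64, 128) (None, 128, 128, 64)
```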

The proposed DSMR structure reuses the extracted low-level features for reconstruction through multiscale skip connections. The low-level features contain rich details, such as edges and contours, which can reduce the loss of details. In this way, the loss of details in the PAN image and MS image is reduced, and the spatial resolution is upgraded.

4) Reconstructed Information Rectification

Owing to the different physical imaging mechanisms of the sensors, the relationship between the MS image and the PAN image is very complex. The band ranges of the MS image and PAN image do not exactly overlap, so a linear combination of the MS image bands cannot accurately express the PAN image [4]. The detail injection model directly adds the injected details to the upsampled MS image, as in expression (1). It thus ignores the complex relationship between the PAN image and MS image, which may result in spectral distortion. Therefore, we design a "concatenate+Conv1+conv(3 × 3)" mode to construct a simple reconstructed information rectification block (RIRB), which builds a nonlinear injection relation. The RIRB is displayed in the orange box in Fig. 7. The kernel size of Conv1 is 1\times 1 and the number of kernels is 12, followed by an LReLU function. The kernel size of conv(3 × 3) is 3 × 3 and the number of kernels is 4. The \rm HM image is generated by the nonlinear mapping of the \uparrow M image and the reconstructed information T_{R}.
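A minimal tf.keras sketch of the RIRB, i.e., the "concatenate+Conv1+conv(3 × 3)" mode applied to the \uparrow M image and the reconstructed information T_{R}, is given below; the channel count of T_{R} is an assumption for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def rirb(up_ms, t_r):
    x = layers.Concatenate()([up_ms, t_r])
    x = layers.Conv2D(12, 1, padding="same")(x)     # Conv1: 12 kernels of size 1 x 1
    x = layers.LeakyReLU(0.2)(x)
    return layers.Conv2D(4, 3, padding="same")(x)   # conv(3 x 3): 4 kernels -> 4-band HM

up_ms = layers.Input((256, 256, 4))     # upsampled MS image
t_r = layers.Input((256, 256, 32))      # reconstructed information T_R (channel count assumed)
print(tuple(rirb(up_ms, t_r).shape))    # (None, 256, 256, 4)
```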

The expression for the generator of the pansharpening model is as follows:
\begin{equation*} \text{HM}=G_{P}\left(\uparrow M, P, W_{P}\right) \tag{11} \end{equation*}
where HM denotes the fused HRMS image, G_{P} indicates the function of the designed generator, and W_{P} is the parameter.

B. U-Shaped Relative Average Least-Squares Discriminator

To promote the performance and stability of the pansharpening model, we employ a relativistic average discriminator to distinguish the relative probabilities between the generated image and the real image, and we optimize the model using a least-squares loss function, i.e., the relativistic average least-squares discriminator (RaLSD). The architecture of the RaLSD is similar to that of Real-ESRGAN [40], which enhances the capability of the RaLSD using a U-shaped structure. However, the differences are that the residual structure is applied to replace the existing convolution operation, and we utilize the "concatenate+SN(conv1-1)+LReLU" mode to substitute the sum operation in the skip connection part to increase the discriminative capacity of the network. SN(conv1-1) indicates spectral normalization (SN) [41] applied to a convolution operation with a kernel size of 1 and a stride of 1. The structure of the proposed U-shaped RaLSD (U-RaLSD) network is illustrated in Fig. 8; it consists of a spectral discriminator \rm U-RaLSD_{pe} and a detail discriminator \rm U-RaLSD_{pa}, and the structures of the \rm U-RaLSD_{pe} and \rm U-RaLSD_{pa} are the same. The interpretation of the colored arrows in the U-shaped structure is presented in Fig. 8, where the SN operation is conducted for every convolution operation except that in the last layer. The architectures of the DRB and URB employed in the U-shaped structure are displayed in Fig. 9(a) and (b). In the DRB and URB, we utilize a convolution with a stride of 2 instead of a maxpooling operation for downsampling, i.e., SN(conv3-2) refers to a convolution operation with a kernel size of 3 and a stride of 2 on which an SN operation is performed. Moreover, we employ a deconvolution with a stride of 2 instead of an interpolation operation for upsampling, i.e., SN(deconv3-2) refers to a transposed convolution operation with a kernel size of 3 and a stride of 2 on which an SN operation is performed. The FURB operation is performed by a simple fusion, i.e., the "concatenate+SN(conv1-1)+LReLU" mode, followed by the URB. The original MS image or \rm DHM_{pa}, which is the spatially reduced version of the \rm HM image, is fed into the \rm U-RaLSD_{pe} to generate relativistic probabilities. \rm U-RaLSD_{pa} takes the original PAN image or \rm DHM_{pe}, which is the spectrally reduced version of the \rm HM image, as input.

Fig. 8. Architecture of the dual U-shaped RaLSDs.

Fig. 9. Structures of the DResidual block (DRB) and UResidual block (URB).

The expressions of the U-RaLSD are given in (12) and (13).
\begin{align*} D^{\text{URaLS}}\left(Z_{m}, Z_{g}\right)&=\sigma \left(C\left(Z_{m}\right)-\mathbb {E}_{Z_{g} \sim \mathcal {Q}}\left[C\left(Z_{g}\right)\right]\right) \tag{12} \\ D^{\text{URaLS}}\left(Z_{g}, Z_{m}\right)&=\sigma \left(C\left(Z_{g}\right)-\mathbb {E}_{Z_{m} \sim \mathcal {R}}\left[C\left(Z_{m}\right)\right]\right) \tag{13} \end{align*}
where D^{\text{URaLS}} means the relative probability of the U-RaLSD, \sigma represents the sigmoid function, and C() denotes the untranslated output of the U-RaLSD. \mathcal {R} and \mathcal {Q} indicate the distributions of the real data Z_{m} (the M or P image in Fig. 8) and the fake data Z_{g} (the \rm DHM_{pa} or \rm DHM_{pe} image in Fig. 8). \mathbb {E}_{Z_{m} \sim \mathcal {R}} and \mathbb {E}_{Z_{g} \sim \mathcal {Q}} indicate the mean operations over the real data and fake data in a batch.
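The relativistic average outputs in (12) and (13) reduce to the sigmoid of the difference between a critic score and the batch-mean critic score of the opposite class, as the following NumPy sketch shows; the random scores stand in for the untranslated critic output C().

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relativistic_average(c_real, c_fake):
    """c_real, c_fake: critic outputs C(Z) for a batch of real and fake samples."""
    d_real = sigmoid(c_real - np.mean(c_fake))   # D(Z_m, Z_g) in (12)
    d_fake = sigmoid(c_fake - np.mean(c_real))   # D(Z_g, Z_m) in (13)
    return d_real, d_fake

c_real = np.random.randn(8)   # e.g., scores of raw M or P patches
c_fake = np.random.randn(8)   # e.g., scores of DHM_pa or DHM_pe patches
print(relativistic_average(c_real, c_fake))
```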

C. Composite Loss Function

We build a new composite loss function composed of a spatial consistency loss function, a spectral consistency loss function, a no-reference loss function, and two adversarial loss functions.

The loss function of the spatial consistency is presented as follows:
\begin{align*} \mathcal {L}_{pc}=&\frac{1}{T} \sum _{t}^{T}\left\Vert h\left(\text{HM}_{t}\right)-h\left(P_{t}\right)\right\Vert _{F}^{2}\\ &+\frac{1}{T} \sum _{t}^{T}\left\Vert \nabla \left(\text{HM}_{t}\right)-\nabla \left(P_{t}\right)\right\Vert _{F}^{2} \tag{14} \end{align*}
where \mathcal {L}_{pc} means the spatial consistency loss function, T refers to the batch size, \text{HM}_{t} is the tth generated image, and ||\cdot ||_{F} denotes the Frobenius norm. h() and \nabla () indicate a high-pass filter and a gradient operator used to obtain the high-frequency information and gradient information of the image. The goal is to integrate the spatial information of the PAN images into the MS images. Since no reference image exists, we boost the spatial information of the MS images by utilizing the high-frequency information and gradient information of the PAN images.

To maintain the consistency of spectral information between the \text{HM}_{t} and raw MS images, the spectral consistency loss function is described as follows:
\begin{equation*} \mathcal {L}_{mc}=\frac{1}{T} \sum _{t}^{T}\left\Vert ds(\text{HM}_{t})- M_{t}\right\Vert _{F}^{2} \tag{15} \end{equation*}
where \mathcal {L}_{mc} indicates the spectral consistency loss function, ds() represents the down-resolution operation consisting of a blurring operation and a downsampling operation, and M_{t} is the raw MS image.
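A hedged TensorFlow sketch of the spatial consistency loss (14) and the spectral consistency loss (15) is given below. The Laplacian high-pass kernel, Sobel gradient operator, intensity averaging of HM, and average-pooling downsampling are illustrative stand-ins for h(), \nabla(), and ds(), which are not specified in detail here.

```python
import tensorflow as tf

def spatial_consistency_loss(hm, pan):
    lap = tf.reshape(tf.constant([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]), [3, 3, 1, 1])
    hm_i = tf.reduce_mean(hm, axis=-1, keepdims=True)            # intensity of HM (4 bands -> 1)
    hp_diff = tf.nn.conv2d(hm_i, lap, 1, "SAME") - tf.nn.conv2d(pan, lap, 1, "SAME")
    grad_diff = tf.image.sobel_edges(hm_i) - tf.image.sobel_edges(pan)
    return tf.reduce_mean(tf.square(hp_diff)) + tf.reduce_mean(tf.square(grad_diff))

def spectral_consistency_loss(hm, ms, ratio=4):
    ds_hm = tf.nn.avg_pool2d(hm, ratio, ratio, "VALID")          # ds(.): blur + downsample stand-in
    return tf.reduce_mean(tf.square(ds_hm - ms))

hm = tf.random.normal([2, 256, 256, 4])
pan = tf.random.normal([2, 256, 256, 1])
ms = tf.random.normal([2, 64, 64, 4])
print(spatial_consistency_loss(hm, pan).numpy(), spectral_consistency_loss(hm, ms).numpy())
```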

Since there are no reference data, we adopt the no-reference index QNR to measure the quality of the generated image. The desired value of QNR is 1, i.e., the generated image has neither spectral loss nor spatial detail loss. Therefore, the expression of the no-reference loss function is as follows:
\begin{equation*} \mathcal {L}_{q}=1 - \text{QNR} \tag{16} \end{equation*}
where \mathcal {L}_{q} stands for the no-reference loss function.

The QNR relates to the spectral loss metric D_{\lambda } and the spatial loss indicator D_{S}, and its representation is as follows:
\begin{equation*} \text{QNR}=(1 - D_{\lambda })^{l}(1-D_{S})^{v} \tag{17} \end{equation*}
where the expressions for D_{\lambda } and D_{S} are given in (18) and (19), and l and v are constants, generally set to 1.
\begin{equation*} D_\lambda =\sqrt{\frac{1}{B(B-1)} \sum \nolimits_{n=1}^{B} \sum \nolimits_{\substack{k=1 \\ k \ne n}}^{B}\left|Q\left(M_{n}, M_{k}\right)-Q\left(F_{n}, F_{k}\right)\right|} \tag{18} \end{equation*}
where B is the number of bands, M_{n} and F_{n} are the nth-band LRMS image and generated HRMS image, respectively, and Q is the image quality index defined in (20).
\begin{equation*} D_{S}=\sqrt{\frac{1}{B} \sum \nolimits_{n=1}^{B} \left|Q\left(F_{n}, P\right)-Q\left(M_{n}, \widetilde{P}\right)\right|} \tag{19} \end{equation*}
where P refers to the PAN image and \widetilde{P} is the low-resolution version of the PAN image.
\begin{equation*} Q(h,k)=\frac{4 \sigma _{h k} \cdot \bar{h} \cdot \bar{k}}{\left(\sigma _{h}^{2}+\sigma _{k}^{2}\right)\left[(\bar{h})^{2}+(\bar{k})^{2}\right]} \tag{20} \end{equation*}
where h and k are the inputs, \sigma _{h k} denotes the covariance between h and k, \sigma _{h}^{2} and \sigma _{k}^{2} represent the variances of h and k, and \bar{h} and \bar{k} indicate the means of h and k.
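The following NumPy sketch computes Q, D_{\lambda}, D_{S}, and QNR as defined in (17)–(20). Global (whole-image) statistics are used for brevity, whereas standard implementations compute Q over sliding windows.

```python
import numpy as np

def q_index(h, k):
    h, k = h.ravel(), k.ravel()
    cov = np.cov(h, k)[0, 1]
    return 4 * cov * h.mean() * k.mean() / ((h.var() + k.var()) * (h.mean() ** 2 + k.mean() ** 2))

def qnr(ms, fused, pan, pan_lr, l=1, v=1):
    b = ms.shape[-1]
    d_lambda = np.sqrt(np.mean([abs(q_index(ms[..., n], ms[..., k]) - q_index(fused[..., n], fused[..., k]))
                                for n in range(b) for k in range(b) if k != n]))      # (18)
    d_s = np.sqrt(np.mean([abs(q_index(fused[..., n], pan) - q_index(ms[..., n], pan_lr))
                           for n in range(b)]))                                       # (19)
    return d_lambda, d_s, (1 - d_lambda) ** l * (1 - d_s) ** v                        # (17)

ms = np.random.rand(64, 64, 4)        # LRMS
fused = np.random.rand(256, 256, 4)   # HM
pan = np.random.rand(256, 256)        # PAN
pan_lr = np.random.rand(64, 64)       # low-resolution PAN
print(qnr(ms, fused, pan, pan_lr))
```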

We optimize the adversarial model using a relativistic average least-squares (RaLS) loss function to improve the performance and stability of the model. The adversarial loss of the generator with respect to the \rm U-RaLSD_{pe} and \rm U-RaLSD_{pa} discriminators is expressed as
\begin{equation*} \mathcal {L}_{G}^{\text{URaLS}}=\mathcal {L}_{G D_{\text{pe}}}^{\text{URaLS}}+\mathcal {L}_{G D_{\text{pa}}}^{\text{URaLS}} \tag{21} \end{equation*}
where \mathcal {L}_{G}^{\text{URaLS}} represents the adversarial loss of the network, \mathcal {L}_{G D_{\text{pe}}}^{\text{URaLS}} denotes the adversarial loss with the \rm U-RaLSD_{pe} discriminator, and \mathcal {L}_{G D_{\text{pa}}}^{\text{URaLS}} denotes the adversarial loss with the \rm U-RaLSD_{pa} discriminator.

The expressions for \mathcal {L}_{G D_{\text{pe}}}^{\text{URaLS}} and \mathcal {L}_{G D_{\text{pa}}}^{\text{URaLS}} are presented as follows:
\begin{align*} \mathcal {L}_{G D_{\text{pe}}}^{\text{URaLS}}=&\mathbb {E}_{M \sim \mathcal {R}}\left[\left(C(M)-\mathbb {E}_{\text{HM}_{\text{pa}} \sim \mathcal {Q}}\left[C\left(\text{HM}_{\text{pa}}\right)\right]+1\right)^{2}\right] \\ &+\mathbb {E}_{\text{HM}_{\text{pa}} \sim \mathcal {Q}}\left[\left(C\left(\text{HM}_{\text{pa}}\right)-\mathbb {E}_{M \sim \mathcal {R}}[C(M)]-1\right)^{2}\right] \tag{22} \\ \mathcal {L}_{G D_{\text{pa}}}^{\text{URaLS}}=&\mathbb {E}_{P \sim \mathcal {R}}\left[\left(C(P)-\mathbb {E}_{\text{HM}_{\text{pe}} \sim \mathcal {Q}}\left[C\left(\text{HM}_{\text{pe}}\right)\right]+1\right)^{2}\right]\\ &+\mathbb {E}_{\text{HM}_{\text{pe}} \sim \mathcal {Q}}\left[\left(C\left(\text{HM}_{\text{pe}}\right)-\mathbb {E}_{P \sim \mathcal {R}}[C(P)]-1\right)^{2}\right] \tag{23} \end{align*}
where M refers to the raw MS image and \text{HM}_{\text{pa}} indicates the spatially reduced-resolution version of HM, i.e., \rm DHM_{pa} in Fig. 8. P denotes the raw PAN image and \text{HM}_{\text{pe}} means the spectrally reduced-resolution version of HM, i.e., \rm DHM_{pe} in Fig. 8.

The RaLS loss functions for the \rm U-RaLSD_{pe} and \rm U-RaLSD_{pa} discriminators are given as follows:
\begin{align*} \mathcal {L}_{D_{\text{pe}}}^{\text{URaLS}}=&\mathbb {E}_{M \sim \mathcal {R}}\left[\left(C(M)-\mathbb {E}_{\text{HM}_{\text{pa}} \sim \mathcal {Q}}\left[C\left(\text{HM}_{\text{pa}}\right)\right]-1\right)^{2}\right]\\ &+\mathbb {E}_{\text{HM}_{\text{pa}} \sim \mathcal {Q}}\left[\left(C\left(\text{HM}_{\text{pa}}\right)-\mathbb {E}_{M \sim \mathcal {R}}[C(M)]+1\right)^{2}\right] \tag{24} \\ \mathcal {L}_{D_{\text{pa}}}^{\text{URaLS}}=&\mathbb {E}_{P \sim \mathcal {R}}\left[\left(C(P)-\mathbb {E}_{\text{HM}_{\text{pe}} \sim \mathcal {Q}}\left[C\left(\text{HM}_{\text{pe}}\right)\right]-1\right)^{2}\right]\\ &+\mathbb {E}_{\text{HM}_{\text{pe}} \sim \mathcal {Q}}\left[\left(C\left(\text{HM}_{\text{pe}}\right)-\mathbb {E}_{P \sim \mathcal {R}}[C(P)]+1\right)^{2}\right] \tag{25} \end{align*}
where \mathcal {L}_{D_{\text{pe}}}^{\text{URaLS}} denotes the loss of the \rm U-RaLSD_{pe} discriminator, and \mathcal {L}_{D_{\text{pa}}}^{\text{URaLS}} denotes the loss of the \rm U-RaLSD_{pa} discriminator.

The total loss function is expressed as
\begin{equation*} \mathcal {L}_{t}=\lambda \mathcal {L}_{G D_{\text{pe}}}^{\text{URaLS}}+\mu \mathcal {L}_{G D_{\text{pa}}}^{\text{URaLS}}+\xi \mathcal {L}_{q}+\kappa \mathcal {L}_{pc}+\rho \mathcal {L}_{mc} \tag{26} \end{equation*}
where \mathcal {L}_{t} is the total loss, and \lambda, \mu, \xi, \kappa, and \rho are the coefficients.
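A hedged TensorFlow sketch of the RaLS losses (22)–(25) and the weighted total generator loss (26) is given below; c_real and c_fake stand for the untranslated critic outputs C() of the corresponding discriminator, and the default weights follow the settings reported in Section IV-C.

```python
import tensorflow as tf

def rals_generator_loss(c_real, c_fake):
    # (22)/(23): the generator pushes fake scores above the mean real score and vice versa.
    return (tf.reduce_mean(tf.square(c_real - tf.reduce_mean(c_fake) + 1.0)) +
            tf.reduce_mean(tf.square(c_fake - tf.reduce_mean(c_real) - 1.0)))

def rals_discriminator_loss(c_real, c_fake):
    # (24)/(25): the discriminator uses the opposite signs of the unit margins.
    return (tf.reduce_mean(tf.square(c_real - tf.reduce_mean(c_fake) - 1.0)) +
            tf.reduce_mean(tf.square(c_fake - tf.reduce_mean(c_real) + 1.0)))

def total_generator_loss(c_pe_real, c_pe_fake, c_pa_real, c_pa_fake, l_q, l_pc, l_mc,
                         lam=2e-4, mu=1e-4, xi=1.0, kappa=1.0, rho=1.0):
    # (26): weighted sum of the two adversarial terms and the three consistency terms.
    return (lam * rals_generator_loss(c_pe_real, c_pe_fake) +
            mu * rals_generator_loss(c_pa_real, c_pa_fake) +
            xi * l_q + kappa * l_pc + rho * l_mc)

c_real, c_fake = tf.random.normal([8]), tf.random.normal([8])
print(rals_generator_loss(c_real, c_fake).numpy(), rals_discriminator_loss(c_real, c_fake).numpy())
```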

SECTION IV.

Experimental Results

A. Datasets

To verify the pansharpening performance of the designed RMFF-UPGAN model, we employ data from Gaofen-2 (GF-2) and QuickBird satellites. For visual observation, red, green, and blue bands are used as R, G, and B channels.

Four-band GF-2 data were acquired from the regions of Beijing and Shenyang, China, giving a total of seven large images; one of them is used for testing and the other six for training and validation. The resolution ratio of the MS and PAN images is 4, the spatial resolutions of the PAN and MS images are 1 and 4 m, and the radiometric resolution is 10 bit. We generate 12 000 samples by randomly cropping the six training images, of which 9600 serve for training and 2400 for validation. We strictly adhere to Wald's protocol [42] to create 286 reduced-resolution testing samples and 286 full-resolution testing samples.

Four-band QuickBird data were acquired from the regions of Chengdu, Beijing, Shenyang, and Zhengzhou, China, giving a total of eight large images; one of them is used for testing and the other seven for training and validation. The resolution ratio of the MS and PAN images is 4, the spatial resolutions of the PAN and MS images are 0.6 and 2.4 m, and the radiometric resolution is 11 bit. We generate 8000 samples by randomly cropping the seven training images, of which 6400 serve for training and 1600 for validation. We create 158 reduced-resolution and 158 full-resolution testing samples.

The sizes of the MS and PAN images for training and validation are 64 × 64 × 4 and 256 × 256 × 1. The sizes of the MS and PAN images of the testing data at the degraded and full resolutions are 100 × 100 × 4 and 400 × 400 × 1.
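A hedged sketch of reduced-resolution pair generation in the spirit of Wald's protocol is given below: the original MS and PAN images are low-pass filtered and downsampled by the resolution ratio, and the original MS image serves as the reference. The Gaussian filter is an illustrative stand-in for a sensor MTF-matched filter.

```python
import numpy as np
from scipy import ndimage

def wald_reduce(ms, pan, ratio=4, sigma=1.0):
    ms_lp = ndimage.gaussian_filter(ms, sigma=(sigma, sigma, 0))   # low-pass filter each band
    pan_lp = ndimage.gaussian_filter(pan, sigma=sigma)
    ms_red = ms_lp[::ratio, ::ratio, :]    # degraded MS (network input)
    pan_red = pan_lp[::ratio, ::ratio]     # degraded PAN (network input)
    return ms_red, pan_red, ms             # the original MS serves as the reference (GT)

ms = np.random.rand(400, 400, 4)           # sizes follow the testing patches of Section IV-A
pan = np.random.rand(1600, 1600)
ms_red, pan_red, gt = wald_reduce(ms, pan)
print(ms_red.shape, pan_red.shape, gt.shape)   # (100, 100, 4) (400, 400) (400, 400, 4)
```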

B. Quality Evaluation Metrics

To verify the designed RMFF-UPGAN model, we carry out two types of experiments, i.e., reduced-resolution pansharpening and full-resolution pansharpening. In addition, subjective assessment and objective index evaluation of the pansharpening results are conducted. The subjective assessment mainly compares a pansharpened image (PI) with a reference image (RI), judging the retention of spatial details and spectral information. The objective evaluation indexes include full-reference metrics and no-reference indexes.

The full-reference indexes are applied to evaluate the reduced-resolution pansharpening and compare the PI with the RI. The metrics we utilize include the structural correlation coefficient (SCC) [42], structural similarity (SSIM) [43], universal image quality index (UIQI, abbreviated as Q) [44] extended to n bands (Qn) [45], [46], spectral angle mapping (SAM) [47], and erreur relative globale adimensionnelle de synthèse (ERGAS) [48]. Specifically, the SCC determines the structural correlation between the PI and RI. The SSIM measures the similarity between the PI and RI from three aspects: luminance, contrast, and structure. The Q comprehensively measures the difference between the PI and RI in terms of correlation loss, luminance distortion, and contrast distortion; the smaller the difference is, the closer the Q is to 1 and the better the PI is. The SAM evaluates the angle of the spectral vector between the PI and RI; the smaller the angle is, the closer the SAM is to 0 and the closer the PI is to the RI. The ERGAS evaluates the spectral quality of the bands in the spectral range, representing the overall condition of the spectral changes; the closer the value is to 0, the better the pansharpening effect in the spectral range.

The quality with no-reference (QNR) indicators do not need the RI when evaluating the pansharpening performance. They are used to evaluate the pansharpening results of full resolution. The metrics include D_{\lambda }, D_{S}, and QNR [49].

The ideal value of the SCC, SSIM, Q, Qn, and QNR is 1, the closer to 1, the better the pansharpening effect is. The ideal value of the SAM, ERGAS, D_{\lambda }, and D_{S} is 0. The closer to 0, the better the effect of pansharpening is.

C. Implementation Details

The implemented framework is TensorFlow, and the experimental setup involves an Intel Xeon CPU and an NVIDIA Tesla V100 PCIe GPU with 16-GB video memory. In the training phase, we optimize the model using the Adam optimizer [50], setting the batch size to 8 and the number of epochs to 40. The initial learning rate is set to 2 \times 10^{-4}. The factors in (26) are set following [35] as follows: \lambda =2\times 10^{-4}, \mu =1\times 10^{-4}, and \xi =\kappa =\rho =1. To compare the pansharpening results of the various approaches more fairly, the CNN-based approaches are run on the GPU and the CS/MRA-based approaches are run on the CPU.
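For reference, a minimal sketch of this training configuration (Adam optimizer, batch size 8, 40 epochs, initial learning rate 2 × 10^{-4}, and the loss weights of (26)) is given below; the one-optimizer-per-network setup is an assumption, and the generator, discriminators, and data pipeline are omitted.

```python
import tensorflow as tf

BATCH_SIZE, EPOCHS, INITIAL_LR = 8, 40, 2e-4
LAMBDA, MU, XI, KAPPA, RHO = 2e-4, 1e-4, 1.0, 1.0, 1.0   # loss weights in (26), following [35]

# One Adam optimizer per network: the generator and the two U-shaped discriminators (assumed setup).
opt_g = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR)
opt_d_pe = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR)
opt_d_pa = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR)
```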

D. Reduced Resolution Experiments

To evaluate the pansharpening performance of the presented RMFF-UPGAN model, we have carried out comparative experiments with the advanced traditional methods and GAN-based methods. The compared traditional methods are GSA [9], BDSD [11], SFIM [12], and MTF-GLP [51]. The GAN-based methods include RED-cGAN [29], PsGAN [30], PGMAN [35], and DIGAN [31]. We carry out comparative experiments on the GF-2 and QuickBird data with reduced and full resolutions. The averages of the quantitative evaluation of the GF-2 and QuickBird experimental results at the reduced resolution are listed in Tables I and II. From Tables I and II, it is noticed that the proposed RMFF-UPGAN is optimal in all six indicators, demonstrating a superior pansharpening capability.

TABLE I Average of Quantitative Assessment of Experimental Results on GF-2 Data With the Degraded Resolution
TABLE II Average of Quantitative Assessment of Experimental Results on QuickBird Data With the Degraded Resolution

We display and analyze the experimental outcomes of GF-2 and QuickBird as follows. For better observation of the differences between the various comparative approaches, we represent the differences between the PI and RI via the average spectral difference map (ASDM), i.e., the spectral angle between the PI and RI, and the average intensity difference map (AIDM).

The pansharpening results of all the compared models on the GF-2 testing data at the degraded resolution are shown in Fig. 10. Fig. 10(a) is the upsampled degraded-resolution MS image, Fig. 10(b) is the corresponding PAN image, Fig. 10(c)–(k) illustrates the pansharpening results of the GSA, BDSD, SFIM, MTF-GLP, RED-cGAN, PsGAN, PGMAN, DIGAN, and RMFF-UPGAN models, respectively, and Fig. 10(l) is the reference image, i.e., the ground truth (GT). To visualize the details more distinctly, we magnify the contents of the red and yellow boxes in Fig. 10, as depicted in Fig. 11. From Fig. 10, it is evident that the pansharpening result of the RMFF-UPGAN model is the most similar to the GT in terms of both spectrum and structure. From Fig. 11, it is clearly observed that the result of the GSA approach exhibits spectral distortion in comparison with the GT. For the contents in the red box, spectral distortion caused by the blue rendering occurs for all the approaches except the proposed RMFF-UPGAN model. Furthermore, the outcomes of the RED-cGAN, PGMAN, and DIGAN approaches suffer from blurred edges. The result of the RMFF-UPGAN approach is the closest to the GT visually. As for the contents in the yellow box, the outcomes of the BDSD, RED-cGAN, and PGMAN are rather fuzzy. Moreover, only the RMFF-UPGAN accurately captures the spectral information in the white ellipse, whereas the others fail to express it precisely.

Fig. 10. Visual evaluation of the pansharpening results on the GF-2 testing data at the decreased resolution. (a) EXP. (b) D_P. (c) GSA. (d) BDSD. (e) SFIM. (f) MTF-GLP. (g) RED-cGAN. (h) PsGAN. (i) PGMAN. (j) DIGAN. (k) RMFF-UPGAN. (l) GT.

Fig. 11. Enlargement of the contents of the red and yellow boxes in Fig. 10. (a) EXP. (b) D_P. (c) GSA. (d) BDSD. (e) SFIM. (f) MTF-GLP. (g) RED-cGAN. (h) PsGAN. (i) PGMAN. (j) DIGAN. (k) RMFF-UPGAN. (l) GT.

Figs. 12 and 13 depict the ASDM and AIDM. To facilitate comparison, the differences are highlighted in colors; the color bar varies from dark blue to red, and the values vary from 0 to 1. From Figs. 12 and 13, we can clearly observe that the difference between the PI of the RMFF-UPGAN approach and the GT is the smallest, i.e., its fusion performance is optimal.

Fig. 12. ASDM between the PI and GT on the GF-2 testing data at the decreased resolution. (a) GSA. (b) BDSD. (c) SFIM. (d) MTF-GLP. (e) RED-cGAN. (f) PsGAN. (g) PGMAN. (h) DIGAN. (i) RMFF-UPGAN.

Fig. 13. AIDM between the PI and GT on the GF-2 testing data at the decreased resolution. (a) GSA. (b) BDSD. (c) SFIM. (d) MTF-GLP. (e) RED-cGAN. (f) PsGAN. (g) PGMAN. (h) DIGAN. (i) RMFF-UPGAN.

Table III lists the objective assessment indicators between the PI and GT on the GF-2 testing data at the decreased resolution. In Table III, the bold indicators reflect that the RMFF-UPGAN is superior for all the indices and has the best pansharpening performance among all the comparative approaches. The SAM and ERGAS of the RMFF-UPGAN approach are minimal, signifying that it has the least spectral distortion and its result retains more spectral information. Combining Figs. 10–13 with the SCC and SSIM indices in Table III, it is clear that the RMFF-UPGAN approach preserves the most structural information, i.e., the definition of its pansharpening results is superior.

The pansharpening results of the compared approaches for the QuickBird testing data at the decreased resolution are exhibited in Fig. 14. The corresponding ASDM and AIDM are presented in Figs. 15 and 16, respectively. The assessment indicators between the PI and GT on the QuickBird testing data at the decreased resolution are reported in Table IV. From Figs. 14 and 15, we can clearly see that there are relatively severe spectral distortions in the results of the GSA, BDSD, SFIM, MTF-GLP, and PGMAN. Combining Figs. 14 and 16, it is apparent that the results of the BDSD, SFIM, MTF-GLP, and PGMAN are relatively blurry. According to the visual presentation in Fig. 14, the outcomes are better for the RED-cGAN, PsGAN, DIGAN, and RMFF-UPGAN; however, in Table IV, RMFF-UPGAN achieves the optimal results for the indices Q4, Q, SCC, SSIM, SAM, and ERGAS. Consequently, the pansharpening result of the RMFF-UPGAN is optimal.

TABLE III Objective Indices Between the PI and GT on the GF-2 Testing Data At the Decreased Resolution
TABLE IV Objective Indices Between the PI and GT on the QuickBird Testing Data At the Decreased Resolution
Fig. 14. Visual evaluation of the pansharpening results on the QuickBird testing data at the decreased resolution. (a) EXP. (b) D_P. (c) GSA. (d) BDSD. (e) SFIM. (f) MTF-GLP. (g) RED-cGAN. (h) PsGAN. (i) PGMAN. (j) DIGAN. (k) RMFF-UPGAN. (l) GT.

Fig. 15. ASDM between the PI and GT on the QuickBird testing data at the decreased resolution. (a) GSA. (b) BDSD. (c) SFIM. (d) MTF-GLP. (e) RED-cGAN. (f) PsGAN. (g) PGMAN. (h) DIGAN. (i) RMFF-UPGAN.

Fig. 16. AIDM between the PI and GT on the QuickBird testing data at the decreased resolution. (a) GSA. (b) BDSD. (c) SFIM. (d) MTF-GLP. (e) RED-cGAN. (f) PsGAN. (g) PGMAN. (h) DIGAN. (i) RMFF-UPGAN.

E. Full-Resolution Experiments

This section presents the pansharpening experiments on the full-resolution testing data of GF-2 and QuickBird. Because there are no reference data, subjective visual estimation and objective index assessment are carried out. Table V presents the average of the objective indicators of the experimental results on the testing data of GF-2 and QuickBird. From Table V, it is evident that the proposed RMFF-UPGAN approach is optimal for the three metrics D_{\lambda }, D_{S}, and QNR, revealing that it is superior in preserving spectral and structural information and achieves the best pansharpening performance. The results of the full-resolution experiments performed on the GF-2 and QuickBird data are described as follows.

TABLE V Average of Quantitative Assessment of Experimental Results on GF-2 and QuickBird Data With the Full Resolution

Fig. 17 presents the visual assessment of the pansharpening results of the comparative approaches on the GF-2 testing data with the full resolution. Fig. 17(a) is the raw MS image, Fig. 17(b) is the upsampled MS image, Fig. 17(c) is the raw PAN image, and Fig. 17(d)–(l) presents the pansharpening results of all the comparative models. Table VI lists the objective assessment indexes of the PIs of the various comparative models on the GF-2 testing data with the full resolution. From Fig. 17, we note that the results of the GSA and MTF-GLP exhibit rather severe spectral distortion, as presented in the yellow ellipse of the figure. The outcomes of the BDSD and SFIM are blurry and exhibit ringing. Compared to the MS image, the results of the RED-cGAN and PsGAN exhibit darker colors. The edges of the result of DIGAN suffer from artifacts, such as the edge in the yellow ellipse in Fig. 17. The pansharpening results of the PGMAN and RMFF-UPGAN models are better in detail retention; however, for the preservation of spectral information, the RMFF-UPGAN approach is superior. Furthermore, from Table VI, it is noticed that the D_{\lambda }, D_{S}, and QNR of the RMFF-UPGAN model are optimal; therefore, the designed RMFF-UPGAN method is superior.

TABLE VI Objective Indexes of the Comparative Models on the GF-2 Testing Data With the Full Resolution

Fig. 17. Visual assessment of the pansharpening results on the GF-2 testing data with the full resolution. (a) MS. (b) EXP. (c) PAN. (d) GSA. (e) BDSD. (f) SFIM. (g) MTF-GLP. (h) RED-cGAN. (i) PsGAN. (j) PGMAN. (k) DIGAN. (l) RMFF-UPGAN.

Fig. 18 presents the visual evaluation of the pansharpening results on the QuickBird testing data with the full resolution. To provide a better view of the details, we magnify the contents of the red, yellow, and orange rectangles in Fig. 18; the enlargements are depicted in Fig. 19, corresponding to the upper left, upper right, and bottom, respectively. Table VII lists the objective evaluation indexes of the pansharpened images on the QuickBird testing data at the full resolution. From Figs. 18 and 19, it is noticeable that all models enhance the resolution of the results. However, the result of the BDSD approach suffers from severe overall spectral distortion; the result of the SFIM approach exhibits spectral distortion (upper right in Fig. 19) and structural distortion (upper left in Fig. 19); and artifacts arise in the results of the SFIM and MTF-GLP approaches. The outcome of the DIGAN model is rather blurry. The upper-right enlargements of RED-cGAN, PsGAN, and PGMAN suffer from structural distortion. Considering the retention of both spectral and structural information, the designed RMFF-UPGAN model is preferable. Table VII also confirms that RMFF-UPGAN achieves the best D_{\lambda}, D_{S}, and QNR values, indicating better preservation of spectral information and details and a more satisfactory pansharpening capability.

TABLE VII Objective Indexes of the Comparative Models on the QuickBird Testing Data With the Full Resolution

Fig. 18. Visual assessment of the results on the QuickBird testing data with the full resolution. (a) MS. (b) EXP. (c) PAN. (d) GSA. (e) BDSD. (f) SFIM. (g) MTF-GLP. (h) RED-cGAN. (i) PsGAN. (j) PGMAN. (k) DIGAN. (l) RMFF-UPGAN.

Fig. 19. Enlargement of the contents in Fig. 18. (a) MS. (b) EXP. (c) PAN. (d) GSA. (e) BDSD. (f) SFIM. (g) MTF-GLP. (h) RED-cGAN. (i) PsGAN. (j) PGMAN. (k) DIGAN. (l) RMFF-UPGAN.

F. Ablation Experiment

Four ablation experiments are conducted on the QuickBird testing data at the reduced and full resolutions to verify the usefulness of the structure of the presented network. The first validates the effectiveness of the ResNeXt block in the RMFF-UPGAN model, the second proves the usefulness of the CIM in the reconstruction phase, the third verifies the validity of the RIRB in the RMFF-UPGAN model, and the fourth confirms the validity of the U-shaped structure in the discriminator.

1) Effectiveness of the ResNeXt Module

The ResNeXt block is a vital component of the proposed RMFF-UPGAN model. We replace the ResNeXt block with a plain residual block while keeping the other parameters of the network unchanged; this variant is denoted NX. The objective metrics in Table VIII reveal that RMFF-UPGAN outperforms NX on the QuickBird testing data at the reduced and full resolutions, demonstrating the positive contribution of the ResNeXt block to feature extraction.

TABLE VIII Objective Indexes of the Ablation Experiments on the QuickBird Testing Data At the Reduced and Full Resolutions
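
To make the NX comparison concrete, the following PyTorch sketch contrasts a plain residual block with a ResNeXt-style block that splits the 3 × 3 convolution into grouped branches. The channel width, cardinality, and bottleneck size are illustrative assumptions, not the values used in RMFF-UPGAN.

import torch
import torch.nn as nn

class PlainResidualBlock(nn.Module):
    # Ordinary residual block (the NX variant): two 3x3 convolutions plus an identity skip.
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class ResNeXtBlock(nn.Module):
    # ResNeXt-style block: 1x1 reduce -> grouped 3x3 (cardinality branches) -> 1x1 expand, plus skip.
    def __init__(self, channels=64, cardinality=8, bottleneck=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, groups=cardinality), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

The grouped convolution gives the aggregated-transformations behavior of ResNeXt at roughly the same parameter budget as the plain block, which is the property the NX ablation isolates.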

2) Validity of the CIM

The CIM in the reconstruction stage serves to complement the spectral and structural information. We remove the CIM for comparison while keeping the other parameters of the network unchanged; this variant is denoted NMF. As presented in Table VIII, the pansharpening performance of RMFF-UPGAN outperforms that of NMF on the reduced-resolution data. For the full-resolution data, NMF is preferable to RMFF-UPGAN on the D_{\lambda} index, but RMFF-UPGAN is superior to NMF on D_{S} and QNR and preserves more spatial structure information. It can also be noticed that NX is better than NMF across the metrics, indicating that the CIM plays the more effective role in retaining spectral and structural information during reconstruction.

3) Effectiveness of the RIRB

We generate HM by replacing the RIRB with a convolutional layer and an addition operation, keeping the other parameters of the network unchanged; this variant is named NRM. The metrics in Table VIII show that RMFF-UPGAN is superior to NRM on the data at the reduced and full resolutions, revealing that the RIRB reduces spectral and structural distortion.

4) Validity of the U-Shaped Discriminator

We remove the upsampling structure from the U-shaped discriminator and retain only the downsampling component to form a plain discriminator, holding the other parameters of the network unchanged; this variant is denoted NU. As presented in Table VIII, the pansharpening capability of RMFF-UPGAN is better than that of NU on the reduced- and full-resolution data, signifying that the U-shaped discriminator boosts the capability of the network. In terms of contribution to network performance, the CIM contributes the most, followed by the U-shaped discriminator, the ResNeXt block, and the RIRB.
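
The NU ablation can be visualized with the following PyTorch sketch: a U-shaped discriminator whose decoder and skip connections yield a per-pixel real/fake map, versus a plain discriminator that keeps only the downsampling path and outputs a single score. The layer widths, depths, and four-band input are illustrative assumptions, not those of the two discriminators in RMFF-UPGAN.

import torch
import torch.nn as nn

def down(cin, cout):
    # Stride-2 convolution halves the spatial resolution.
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True))

def up(cin, cout):
    # Stride-2 transposed convolution doubles the spatial resolution.
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True))

class UShapedDiscriminator(nn.Module):
    # Encoder-decoder discriminator with skip connections; outputs a per-pixel decision map.
    def __init__(self, in_ch=4):
        super().__init__()
        self.d1, self.d2, self.d3 = down(in_ch, 32), down(32, 64), down(64, 128)
        self.u1, self.u2, self.u3 = up(128, 64), up(128, 32), up(64, 32)
        self.out = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        e1 = self.d1(x)
        e2 = self.d2(e1)
        e3 = self.d3(e2)
        y = self.u1(e3)
        y = self.u2(torch.cat([y, e2], dim=1))  # skip connection from the encoder
        y = self.u3(torch.cat([y, e1], dim=1))  # skip connection from the encoder
        return self.out(y)

class PlainDiscriminator(nn.Module):
    # Downsampling path only (the NU variant): a single global real/fake score.
    def __init__(self, in_ch=4):
        super().__init__()
        self.body = nn.Sequential(down(in_ch, 32), down(32, 64), down(64, 128),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))

    def forward(self, x):
        return self.body(x)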

G. Computation and Time

The model complexity is discussed in terms of model parameters and test time, as presented in Table IX. For the traditional models, only the test time is given; both the parameters and the test time are provided for the DL-based models. The test time is the average over 286 pairs of full-resolution GF-2 test data, where the MS and PAN images are of size 100 × 100 × 4 and 400 × 400 × 1, respectively. The experiments of the first four traditional models are run on the previously mentioned CPU, and those of the latter five DL-based models are run on the previously mentioned GPU. The unit M of the parameters in Table IX denotes 10^{6}, and the unit of time is seconds (s). From Table IX, it can be observed that the GSA and SFIM approaches are relatively simple and fast, whereas the BDSD and MTF-GLP approaches are relatively complex and slow. Among the DL-based models, RMFF-UPGAN has the most parameters, followed by PGMAN, while PGMAN has the shortest test time; this is because PGMAN processes only the high-frequency information of the MS and PAN images, whereas the other models process the entire images. The parameters of RED-cGAN and PsGAN are very close, and their test times are similar. Because of the downsampling in the RMFF-UPGAN model, the image size is reduced within the network, so its computation is relatively small. Considering model size and speed, the RMFF-UPGAN model is efficient.

TABLE IX Parameters of Comparative Models and Average Test Time on GF-2 Data
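
As a reference for how the two quantities in Table IX can be measured for a DL-based model, the following PyTorch sketch counts trainable parameters (in M = 10^{6}) and averages the per-pair inference time. The model signature model(ms, pan), the dummy GF-2-sized tensors, and the 286-pair loop are placeholders rather than the paper's test code.

import time
import torch

def count_parameters_m(model):
    # Total trainable parameters, expressed in millions (M = 1e6).
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

@torch.no_grad()
def average_test_time(model, pairs, device="cuda"):
    # Average forward-pass time in seconds over a list of (MS, PAN) tensor pairs.
    model.eval().to(device)
    total = 0.0
    for ms, pan in pairs:
        ms, pan = ms.to(device), pan.to(device)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        _ = model(ms, pan)  # placeholder call signature; adapt to the model under test
        if device == "cuda":
            torch.cuda.synchronize()
        total += time.perf_counter() - start
    return total / len(pairs)

# Example with GF-2-sized inputs: a 100x100x4 MS patch and a 400x400x1 PAN patch per pair.
# pairs = [(torch.rand(1, 4, 100, 100), torch.rand(1, 1, 400, 400)) for _ in range(286)]
# print(count_parameters_m(model), average_test_time(model, pairs))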
SECTION V.

Conclusion

In this article, we propose the RMFF-UPGAN model to boost the spatial resolution and preserve the spectral information. RMFF-UPGAN comprises a generator and two U-shaped discriminators. A dual-stream trapezoidal branch is designed in the generator to obtain multiscale information, and the ResNeXt and residual learning blocks are employed to extract the spatial structure and spectral information at four scales. Further, a recursive mixed-scale feature fusion subnetwork is designed: a prior fusion is first performed on the extracted MS and PAN features of the same scale, a mixed-scale fusion is then conducted on the prior fusion results of the fine and coarse scales, and the fusion proceeds sequentially in this manner, building a recursive mixed-scale fusion structure and finally generating the key information. The CIM is also designed for the reconstruction of the key information to compensate for spectral and structural information. The nonlinear RIRB is developed to overcome the distortion induced by neglecting the complicated relationship between the MS and PAN images. Two U-shaped discriminators are designed, and a new composite loss function is defined. The RMFF-UPGAN model is validated on the GF-2 and QuickBird datasets, and the experimental outcomes are better than those of the prevalent approaches regarding both visual assessment and objective indicators. The RMFF-UPGAN model has superior performance in enhancing the spatial resolution and retaining the spectral information, which boosts the fusion quality. Our further work will investigate unsupervised models to further strengthen the capability of the pansharpening network.
