Introduction
Earth observation from space is a hot topic, requiring the development of instruments and satellite platforms to meet the increasing need for data with worldwide coverage. Unfortunately, optical remote sensing sensors show a tradeoff among spatial resolution, spectral resolution, and signal-to-noise ratio (SNR). This tradeoff cannot be solved through hardware solutions capturing high spatiospectral representations of the Earth without strongly penalizing the SNR. Thus, the simultaneous acquisition of more than one representation of the Earth, with sensors showing different (often opposite) features, is usually considered in the design of a satellite payload. Hence, in optical remote sensing, it is common to see devices with high spatial resolution but limited spectral bands [e.g., panchromatic (PAN)] working together with sensors with high spectral resolution but lower spatial resolution [e.g., multispectral (MS)/hyperspectral (HS) ones capturing tens/hundreds of spectral bands, respectively]. Starting from the images acquired by these systems, researchers are developing software-based solutions that combine these data to get the best from each source of information. The most dated and widely used framework relies upon the fusion of PAN and MS images. Pansharpening, which stands for PAN sharpening, refers to these approaches [1], [2]. Other powerful examples of these techniques, often representing an extension of the pansharpening concept, are HS pansharpening [3], where HS data are enhanced by PAN images, and MS and HS image sharpening methodologies [4], which exploit MS data to provide high spatial resolution information to sharpen an HS cube.
HS images acquired by sensors onboard satellite platforms are widely used for several tasks (e.g., classification and object detection) due to their very appealing spectral features. However, a limitation is represented by the spatial resolution of these data, which is rarely finer than 30 m. Thus, cutting-edge research can be found in the literature fusing HS with PAN images to improve it (i.e., the so-called HS pansharpening). The relevance of this topic is demonstrated by the recent scientific production, summarized in review papers such as [3], and by the latest HS pansharpening challenge [5], organized in conjunction with the 12th WHISPERS (Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing), which leveraged datasets from the PRecursore IperSpettrale della Missione Applicativa (PRISMA) system, owned and managed by the Italian Space Agency.
The first attempt at fusing real HS and PAN images, captured by Hyperion/Advanced Land Imager (ALI), dates back to 2007 [6], when an optimized component substitution (CS) approach inspired by the MS pansharpening literature [7], [8] was proposed. Afterward, many methods borrowed from the classical MS pansharpening literature were tested on the HS and PAN fusion problem. A pioneering work was presented in [1], where several CS, e.g., [9], [10], and multiresolution analysis (MRA), e.g., [11], [12], approaches designed for MS pansharpening were considered. In the same paper, an interesting study comparing the fusion of HS and PAN images acquired by the same or different platforms was proposed. Moreover, other interesting approaches, originally designed for the fusion of high-resolution PAN and low-resolution MS data, have been adapted to the HS pansharpening case. These can be roughly cast into Bayesian [13], [14], [15], [16], matrix factorization [17], [18], [19], and variational [20] methods. In 2015, many of these methods, and others, were compared in an extensive review [3]. Besides, several solutions specifically conceived for the HS pansharpening problem have also been proposed. Notable examples are the use of guided filters [21], variational approaches such as [22] and [23], and the saliency analysis-based CS method proposed in [24].
Recently, the number of research papers about the sharpening of optical remote sensing data has dramatically grown, in particular those related to the use of deep learning [25], [26], [27], [28], [29]. This trend is confirmed for HS pansharpening as well. Indeed, in 2019, a new HS pansharpening framework via spectrally predictive convolutional neural networks (CNNs) was proposed in [30] to strengthen the spectral prediction capability of a pansharpening network. Subsequently, a dual-attention residual network upsampling the HS image using a deep HS prior module was considered in [31]. In [32], a new spectral-fidelity CNN for HS pansharpening was developed to control the spectral distortion of fused products and to progressively synthesize spatial details. Furthermore, a novel CNN-based method for arbitrary-resolution HS pansharpening, based on a two-step relay optimization process, was proposed in [33]. On the same research line, an arbitrary-scale attention upsampling module was introduced in [34]. Then, in [35], an overcomplete residual network, focused on learning high-level features by constraining the receptive fields of deep layers, was designed together with a new spatial-domain constraint between the PAN and its predicted version. An unsupervised HS pansharpening method via ratio estimation and residual attention network was described in [36]. A multistage dual-attention guided fusion network was considered in [37], employing a three-stream structure and fusing the extracted features through a dual-attention guided fusion block. In [38], a deconvolutional long short-term memory network with bidirectional learning for HS upsampling and spatial–spectral reconstruction, based on a two-branch divide-and-conquer architecture, was proposed. A generative super-resolution network combined with a segmentation-based injection gain estimation [39], [40] was instead proposed in [41]. Finally, a deep CNN exploiting Gaussian–Laplacian pyramids for pansharpening was presented in [42]. Following the same idea of multiresolution fusion, a multiresolution spatial–spectral feature learning approach was proposed in [43], transforming an existing deep (and complex) network into several simple and shallow subnetworks to simplify the learning process and using multiresolution 3-D convolutional autoencoder networks to learn spatial–spectral HS features.
The 2022 WHISPERS contest on HS pansharpening was held with the goal of providing a picture of the state-of-the-art on the topic, also in light of the recent advances in deep learning, to pave the way for better solutions. Unfortunately, none of the competitors achieved convincing results compared to the baseline methods, and no winners were declared by the organizing committee. Actually, a careful inspection of the outcomes reveals that a critical bottleneck was the limited capacity of the proposed solutions to generalize when moving from synthetic reduced-resolution datasets (ground truth (GT) available) to real full-resolution ones (GT unavailable). Indeed, this is one of the main problems already encountered in the case of MS image pansharpening using deep learning, motivating the development of unsupervised training solutions [44], [45], [46], [47], [48]. In fact, unsupervised learning procedures do not require GTs and, hence, no synthetic (downgrading) resolution shift of the data. An attempt to follow this same path for the HS case can be found in [36]. However, the wide variability of observed images, due to the diversity of sensors, scenes, and operating conditions, still prevents models from generalizing well to data not seen during training. In computer vision, this problem is usually solved by increasing the training set and by using suitable forms of augmentation [49]. Such solutions are hardly viable in remote sensing using HS images, due to the scarcity of high-quality training data (often proprietary) and the peculiarities of HS imaging, including the data volume per ground surface unit. Besides, compared to the MS case, in the HS case, the resolution ratio is typically higher and the spectral coverage is much denser and wider, exceeding the PAN bandwidth by far, causing further ill-posedness issues. Furthermore, the number of spectral bands is a specific feature of the HS sensor and can even change from one image to another of the same sensor, because of acquisition errors that can render subsets of bands useless. A solution for handling a variable number of bands [50], based on a single pretrained model, has been proposed in [51].
To cope with the above issues, in this work, we propose a new CNN-based HS pansharpening method, which regards the HS datacube as a chain of individual bands to be sequentially pansharpened. This is achieved by leveraging a lightweight single-band pansharpening network operating in adaptive mode, whose optimized parameters for a given band are used as the starting point for the self-adaptive inference step on the next spectral band. By doing so, we bridge the model parameters of adjacent bands, simplifying the adaptation task thanks to their expected correlation. Both (pre)training and tuning iterations for target adaptation are run at full resolution, thanks to a suitably defined unsupervised loss comprising both spectral and spatial consistency terms. It is worth observing that the proposed solution is not just a divide-and-conquer split of the HS datacube into batches of bands. In fact, the tuning-based protocol bridging the models of adjacent bands defines a completely new framework, into which any baseline single-band pansharpening network, as well as any unsupervised loss, can be straightforwardly integrated.
Specifically, the advantages of the proposed solution, hereinafter referred to as the rolling HS pansharpening neural network (R-PNN), with respect to the state-of-the-art are the following. All bands but the first do not need pretrained parameters, as these are inherited from the previous band. In this way, the model progressively and adaptively learns exclusively on the target image, with a limited computational cost thanks to several design choices: a lightweight architecture, band-wise processing, model propagation, and an adaptive distribution of the tuning iterations. Also, the method is not subject to cross-resolution generalization issues, as it learns at the target resolution (being unsupervised). More generally, generalization is not an issue because the network learns directly on the target image, dynamically fitting its parameters to it. Finally, the sequential structure of the method allows handling an arbitrary number of spectral bands, not necessarily uniformly sampled.
The proposed solution has provided state-of-the-art results on all the considered datasets, both full- and (surprisingly) reduced-resolution ones, consistently outperforming all the competitors. These properties, combined with the good results obtained, make the proposed method very attractive for practical real-world applications. In this perspective, and to ensure full reproducibility of our research outcomes, the code of the proposed method is shared at https://github.com/giu-guarino/R-PNN.
In summary, the main contributions of this work are given as follows:
a new unsupervised CNN-based HS pansharpening approach based on a band-wise model propagation protocol;
a new unsupervised spectral–spatial consistency loss for PAN–HS pairs;
a target-adaptive solution for the PAN–HS fusion problem;
state-of-the-art results on both reduced-resolution synthetic data and full-resolution real (PRISMA) data.
The remainder of this article is organized as follows. Section II presents the proposed solution. Section III describes datasets, quality assessment indexes, and reference methods. Section IV presents an experimental analysis aimed at supporting and validating several design choices. Finally, Section V gathers and discusses comparative numerical and visual results, with conclusions given in Section VI.
Proposed Method for HS Pansharpening: R-PNN
Compared to the more familiar case of pansharpening of MS images, in the case of HS data, the generalization of deep learning models is a more critical issue for several reasons:
fewer training datasets;
increased spectral information;
low or no correlation between the PAN and many spectral bands to be super-resolved;
variable number of bands, even for the same sensor, due to acquisition issues (different bands may be discarded for quality reasons).
In the following, we will first detail the single-band tuning/inference block, and then, we will describe the overall high-level scheme, before providing details about the core CNN network and the loss.
A. Single-Band Pansharpening With Tuning
Fig. 1 provides a high-level description of the tuning loop for the generic $b$-th spectral band.
CNN-based single-band unsupervised tuning block for pansharpening. The module takes in input the two images to fuse (the PAN image $\mathbf{P}$ and the interpolated $b$-th HS band) and provides the pansharpened band $\widehat{\mathbf{M}}_{b}$.
The tuning starts from the initial network parameters inherited from the previous band and updates them for $N_{b}$ iterations by minimizing the unsupervised loss ${\mathcal{L}}^{(b)}$ described in Section II-D.
B. High-Level Model Propagation Scheme
The overall tuning-prediction chain, involving all the spectral bands in sequence, is sketched in Fig. 2. The number of tuning iterations $N_{b}$ granted to the generic $b$-th band is adaptively set as a function of its wavelength distance $\Delta \lambda _{b}$ from the previously tuned band
\begin{align*} N_{b} = \begin{cases} \displaystyle 20, & b = 1 \\ \displaystyle \min \left ({\alpha \Delta \lambda _{b}, 80 }\right), & b>1 \end{cases} \tag{1}\end{align*}
where $\alpha$ is a tuning strength parameter, whose setting is discussed in Section IV-C.
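As a minimal illustration, rule (1) maps directly to a few lines of Python (the value of $\alpha$ is deliberately not hard-coded here, as its setting is analyzed in Section IV-C):

```python
def n_iterations(b, delta_lambda, alpha, n_first=20, n_max=80):
    """Iteration budget N_b of rule (1): fixed for the first band, otherwise
    proportional (via alpha) to the wavelength gap from the previously tuned
    band, clipped at n_max."""
    return n_first if b == 1 else int(min(alpha * delta_lambda, n_max))
```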
Unsupervised rolling adaptation scheme for HS image pansharpening. Each tuning module (detailed in Fig. 1) inherits the initial weights from the module of the previous band and, once tuned, passes its updated weights on to the next one.
The proposed scheme allows for a drastic reduction of the computational load required for parameter tuning, thanks to the stronger correlation expected between closer band pairs. Notice that, differently from more common training/tuning configurations, where minibatches of small example patches are formed, here, following the tuning scheme proposed in [52] and [47], we have a unique training batch containing the whole target image. Indeed, recent variants of this scheme [48], [53] propose sampling rules to keep the computational burden limited in the case of very large images. In this work, however, the sizes of the datasets of interest were such that no sampling was needed.
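A minimal Python sketch of this rolling tuning-prediction chain is given below. It is a simplified view under stated assumptions: `tune_step` stands for one full-image optimization step of the unsupervised loss of Section II-D, `n_iterations` is the rule sketched after (1), and `model` carries the pretrained weights for the first band.

```python
import torch

def rolling_pansharpen(model, pan, hs_bands, wavelengths, tune_step, alpha):
    """Sketch of the rolling scheme of Fig. 2. The model is never reset
    between bands: the weights tuned on band b initialize band b+1, which
    is the essence of model propagation."""
    fused = []
    for b, band in enumerate(hs_bands, start=1):
        gap = wavelengths[b - 1] - wavelengths[b - 2] if b > 1 else 0.0
        for _ in range(n_iterations(b, gap, alpha)):  # budget from rule (1)
            tune_step(model, pan, band)               # full-image tuning step
        with torch.no_grad():
            fused.append(model(pan, band))            # inference, tuned model
    return fused
```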
C. Network
The key characteristics of the proposed approach are its adaptivity to the target image and its band-wise, chained, modality. For these reasons, it makes perfect sense to look at lightweight network architectures to preserve the nimbleness of the proposed solution. Therefore, we rely on a shallow three-layer residual model similar to the one proposed in [52] for the classical pansharpening of four- or eight-band MS images. It is composed of three sequential convolutional layers, interleaved by ReLU activations, with a global skip connection that brings the input spectral band to be pansharpened (already interpolated to fit the PAN size) directly to the output (by sum) of the third convolutional layer. The hyperparameters of the network are given in Table I. The most relevant differences compared to the CNN architecture of [52] are the number of spectral bands, just one instead of 4/8, and the resolution ratio, which is 6 for PRISMA data.
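For concreteness, the following PyTorch sketch mirrors the described architecture. The channel width and kernel sizes are illustrative placeholders, not the exact hyperparameters of Table I.

```python
import torch
import torch.nn as nn

class SingleBandNet(nn.Module):
    """Three convolutional layers interleaved by ReLUs, plus a global
    residual skip that adds the interpolated HS band to the output.
    Width and kernel sizes below are assumptions for illustration."""
    def __init__(self, mid=48, kernels=(7, 5, 5)):
        super().__init__()
        # input: PAN and interpolated HS band, stacked along channels
        self.body = nn.Sequential(
            nn.Conv2d(2, mid, kernels[0], padding=kernels[0] // 2), nn.ReLU(),
            nn.Conv2d(mid, mid, kernels[1], padding=kernels[1] // 2), nn.ReLU(),
            nn.Conv2d(mid, 1, kernels[2], padding=kernels[2] // 2),
        )

    def forward(self, pan, band_up):
        # pan, band_up: (N, 1, H, W); band_up already at PAN size
        return self.body(torch.cat([pan, band_up], dim=1)) + band_up
```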
D. Unsupervised Spatial–Spectral Consistency Loss
The proposed model leverages a band-wise unsupervised loss, ${\mathcal{L}}^{(b)}$, combining a spectral and a spatial consistency term
\begin{equation*} {\mathcal{L}}^{\left ({b}\right)} = {\mathcal{L}}_{\lambda} \left ({\widehat { \mathbf {M}}_{b}, \mathbf {M}_{b}}\right)+ \beta {\mathcal{L}} _{S}\left ({\widehat { \mathbf {M}}_{b}, \mathbf {P}}\right)\quad \forall b \tag{2}\end{equation*}
where $\widehat{\mathbf{M}}_{b}$ is the pansharpened $b$-th band, $\mathbf{M}_{b}$ is its original low-resolution version, $\mathbf{P}$ is the PAN image, and $\beta$ weighs the spatial term against the spectral one.
The spectral consistency term is the $\ell_{1}$ distance between the original HS band and a decimated version of the fused one
\begin{equation*} {\mathcal{L}}_{\lambda} \left ({\widehat { \mathbf {M}}_{b}, \mathbf {M}_{b}}\right) = \left \|{ \widehat { \mathbf {M}}_{b}^{\left ({\mathcal {D}}\right)} - \mathbf {M}_{b} }\right \|_{1} \tag{3}\end{equation*}
where the decimated band is obtained by low-pass filtering (LPF) the fused band and subsampling it by the resolution ratio (6 for PRISMA), starting from a suitable offset $(n_{0}, m_{0})$
\begin{equation*} \widehat { \mathbf {M}}_{b}^{\left ({\mathcal {D}}\right)}\left ({n,m}\right) \triangleq \widehat { \mathbf {M}}^{\mathrm{LPF}}_{b}\left ({n_{0}+6n, m_{0}+6m}\right). \tag{4}\end{equation*}
The spatial consistency term builds upon the local correlation coefficient between two generic images $\mathbf{X}$ and $\mathbf{Y}$, computed on $\sigma \times \sigma$ sliding windows $w(i,j)$ centered at location $(i,j)$
\begin{equation*} \rho ^{\sigma} _{ \mathbf {X} \mathbf {Y} }\left ({i,j}\right) = \frac {\mathrm{Cov}\left ({\mathbf {X}_{w\left ({i,j}\right)}, \mathbf {Y}_{w\left ({i,j}\right)}}\right)}{\sqrt {\mathrm{Var}\left ({\mathbf {X}_{w\left ({i,j}\right)}}\right){\mathrm{Var}}\left ({\mathbf {Y}_{w\left ({i,j}\right)}}\right)}}. \tag{5}\end{equation*}
The spatial loss is then the average ($\langle \cdot \rangle_{i,j}$) absolute deviation of the local PAN-fused correlation from its target value $\rho^{\mathrm{max}}(i,j)$
\begin{equation*} {\mathcal{L}}_{S} = \left \langle{ \left |{ \rho ^{\mathrm{max}}\left ({i,j}\right) - \rho ^{\sigma} _{ \mathbf {P}\widehat { \mathbf {M}}_{b}}\left ({i,j}\right) }\right | }\right \rangle _{i,j}. \tag{6}\end{equation*}
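As an illustration, the loss (2)-(6) can be sketched in a few lines of PyTorch. This is a simplified sketch: the low-pass filter `lpf` (MTF-shaped in practice), a scalar correlation target `rho_max`, and the omitted decimation offsets are assumptions with respect to the actual implementation shared online.

```python
import torch
import torch.nn.functional as F

def local_corr(x, y, s):
    """Local correlation coefficient (5) on s x s sliding windows."""
    mu_x = F.avg_pool2d(x, s, stride=1)
    mu_y = F.avg_pool2d(y, s, stride=1)
    cov = F.avg_pool2d(x * y, s, stride=1) - mu_x * mu_y
    var_x = F.avg_pool2d(x * x, s, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, s, stride=1) - mu_y ** 2
    return cov / (var_x.clamp_min(1e-8) * var_y.clamp_min(1e-8)).sqrt()

def band_loss(fused, hs_band, pan, lpf, beta, rho_max, s, ratio=6):
    dec = lpf(fused)[..., ::ratio, ::ratio]   # decimation (4), offsets omitted
    l_spec = (dec - hs_band).abs().mean()     # spectral term (3)
    rho = local_corr(pan, fused, s)           # local correlation (5)
    l_spat = (rho_max - rho).abs().mean()     # spatial term (6)
    return l_spec + beta * l_spat             # total loss (2)
```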
Data, Quality Assessment, and Methods
The main goal of this work is to develop a new data-driven method for HS pansharpening that outperforms the state-of-the-art recently assessed in the paper on the HS pansharpening challenge at IEEE WHISPERS 2022 [5]. Accordingly, our experiments relied upon the datasets, quality assessment procedures, and comparative methods exploited in the abovementioned challenge, briefly described in the rest of this section. Besides, to enrich the comparative assessment, four additional deep learning solutions have also been included [30], [32].
A. Datasets
Despite the development of many new approaches for HS pansharpening, most of them have been tested only on simulated data, neglecting an assessment on real data at full resolution. To overcome this limitation, PRISMA data have been distributed after the end of the WHISPERS challenge. Four datasets (the ones used for the contest), both at reduced and full resolution, have been shared. Each dataset comprises a PAN component and an HS image. The spatial resolution of the PAN image is 5 m, while the HS sensor acquires about 250 spectral bands with a spatial resolution of 30 m. Before the announcement of the contest, only very few works on HS pansharpening of PRISMA images had been published; an application-oriented work using pansharpened PRISMA data was presented as late as 2021 [54]. Thus, the goal of the challenge was to boost the research on HS pansharpening, pushing researchers toward new data and, hence, new challenges: for example, the tradeoff between computational cost (critical for images with hundreds of bands) and fusion performance, or other peculiarities of the HS pansharpening problem, such as a scale ratio different from 4 (the one widely used in MS pansharpening), the effects of a residual space-varying registration error between PAN and HS images, and the fusion of a large and sensor-dependent number of bands, some of which show low SNRs. Four teams accepted the challenge, proposing innovative solutions relying upon machine learning and variational optimization-based methodologies. Despite the use of state-of-the-art techniques, the four participating teams did not obtain outstanding results compared to the baseline and, for this reason, the organizing committee decided to close the contest and declare it inconclusive (no winner).
In Table II, some characteristics of the images are reported, while Fig. 3 shows the data of the challenge. More specifically, four datasets have been distributed: FR1 and FR2 for the assessment at full resolution, and RR1 and RR2 for the assessment at reduced resolution. In this work, a further dataset (FR0) has been exploited for validation purposes and to generate the initial weights of the proposed model.
B. Accuracy Indexes
Assessing the performance of image fusion products is still an open issue, given the lack of full-resolution GTs. A widespread approach relies on the so-called synthesis property of Wald's protocol [55], whose implementation is based on a proper downsampling of the available data, under the hypothesis that pansharpening performance is invariant across scales [56]. Hence, the original HS data play the role of the GT, against which the similarity of the fused product, obtained by combining the degraded versions of the original PAN and HS images, is measured. The higher the similarity, the better the performance. This similarity can be evaluated through multidimensional score indexes [56].
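As an illustration, a minimal sketch of this reduced-resolution protocol could read as follows; here, a plain Gaussian filter stands in for the sensor-matched MTF-shaped filters actually recommended in [56], and the value of `sigma` is a placeholder.

```python
import scipy.ndimage as ndi

def reduced_resolution_pair(pan, hs, ratio=6, sigma=1.7):
    """Low-pass filter and decimate both inputs by `ratio`, so that the
    original HS cube (bands, H, W) plays the role of the GT."""
    pan_lr = ndi.gaussian_filter(pan, sigma)[::ratio, ::ratio]
    hs_lr = ndi.gaussian_filter(hs, (0, sigma, sigma))[:, ::ratio, ::ratio]
    return pan_lr, hs_lr, hs   # degraded pair to fuse, plus GT for scoring
```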
The $Q2^{n}$ index [57] is the multidimensional extension of the universal image quality index (UIQI) [58]. The upper bound of the index is one, which is also its optimal value.
The spectral angle mapper (SAM) [59] determines the spectral similarity (usually in degrees) between the fused and the reference spectra. It is measured pixel-by-pixel and averaged over the whole image. The optimal value is zero.
ERGAS [60] is a French acronym that stands for Erreur Relative Globale Adimensionelle de Synthèse (dimensionless global relative error of synthesis). It is a normalized dissimilarity index (multidimensional extension of the root-mean-square error) that measures the radiometric distortion of the fused product with respect to the reference (GT) image. The optimal value is zero.
PSNR, measured in decibels, stands for peak SNR and is one of the most popular quality indexes in the general image processing domain. Higher PSNR values indicate better quality.
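For reference, the standard definitions of these indexes can be summarized as follows (see [5] and [69] for the exact implementations used here), with $\mathbf{g}(i,j)$ and $\widehat{\mathbf{m}}(i,j)$ the GT and fused spectral vectors at pixel $(i,j)$, $B$ the number of bands, $\mu_{b}$ the mean of the $b$-th GT band $\mathbf{G}_{b}$, and $d_{h}/d_{l}$ the PAN-to-HS pixel size ratio (1/6 for PRISMA):
\begin{align*} \mathrm{SAM} &= \left \langle{ \arccos \frac {\left \langle{ \mathbf {g}\left ({i,j}\right), \widehat { \mathbf {m}}\left ({i,j}\right) }\right \rangle }{\left \|{ \mathbf {g}\left ({i,j}\right) }\right \|_{2} \left \|{ \widehat { \mathbf {m}}\left ({i,j}\right) }\right \|_{2}} }\right \rangle _{i,j} \\ \mathrm{ERGAS} &= 100\, \frac {d_{h}}{d_{l}} \sqrt {\frac {1}{B}\sum _{b=1}^{B} \left ({\frac {\mathrm{RMSE}_{b}}{\mu _{b}}}\right)^{2}} \\ \mathrm{PSNR}_{b} &= 10\log _{10} \frac {\max \left ({\mathbf {G}_{b}}\right)^{2}}{\mathrm{MSE}\left ({\widehat { \mathbf {M}}_{b}, \mathbf {G}_{b}}\right)}. \end{align*}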
It is worth pointing out that the reduced-resolution assessment relies upon the scale-invariance hypothesis, which may not hold. Furthermore, the accuracy can depend on how the original PAN and HS products are degraded [56]. As a consequence, to provide a complete assessment of pansharpening algorithms, validation at full resolution is also adopted [61], [62], [63], [64], [65], [66]. In this article, we followed the indications in [5], using the same quality indexes at full resolution: more specifically, spectral consistency is measured through the spectral distortion index $D_{\lambda}$, while spatial consistency with respect to the PAN is assessed through a complementary spatial distortion index.
For additional details about the PSNR, interested readers can refer to [69], while, for all other reduced- and full-resolution indexes, implementation details are given in [5] and in the related freely available toolbox.
C. Benchmarking
Table III summarizes the techniques used in our experimental analysis. The benchmarking approaches taken from the WHISPERS challenge are described in [5]. More specifically, five methods, exploited as baseline solutions for the challenge, are borrowed from the pansharpening literature. The first two [2], [73] belong to the CS class, i.e., the Gram–Schmidt (GS) [10] approach and its adaptive version, GSA [9]. The other three baseline methods are representative of the MRA class: the classical additive wavelet luminance proportional (AWLP) [70]; the MTF-generalized Laplacian pyramid (MTF-GLP) [12] with histogram matching [71]; and the morphological filters (MF) [72] method. The other four techniques (labeled Teams 1-4) are the innovative variational optimization-based and machine learning-based solutions proposed by the participants in the HS pansharpening challenge [5]. Finally, four extra-challenge deep learning solutions [30], [32] have been reimplemented and trained on PRISMA data for further comparison. In particular, for these methods, lacking extra datasets for training and given the heterogeneity (different numbers of bands) of the test datasets, we used for training a portion of the same test datasets, applying the canonical resolution downgrade protocol needed for supervised models.
Experimental Validation
In this section, we show and discuss several experimental results aimed at supporting our design choices. In particular, we will analyze the relationship between the proposed loss and the accuracy indicators (see Section IV-A). We will show the impact of the model propagation mechanism (see Section IV-B) and of the tuning strength (see Section IV-C). Then, we will provide details about the setting of the spatial loss term (see Section IV-D). Finally, we carry out an ablation study on the network architecture (see Section IV-E), before concluding with some details about the pretraining phase (see Section IV-F).
A. Loss and Accuracy
The first experiment deals with the choice of the loss. Since we propose an unsupervised one, it is not obvious, or at least not always guaranteed, that the smaller the loss, the better the accuracy according to the available quality indicators. Therefore, it is a fundamental question to understand to what extent the loss and the quality indicators agree. Toward this goal, we have selected a full-resolution (FR2) and a reduced-resolution (RR1) dataset, restricting the pansharpening to a single HS band. For both datasets, we consider two opposite conditions: an HS band highly correlated with the PAN and a weakly correlated one. Since we target a single HS band, the proposed solution reduces to traditional pansharpening, without the need of model propagation. Moreover, to avoid any bias, we run the fine-tuning target adaptation from scratch (random initial weights). In particular, here, we are interested in monitoring the evolution of the loss in comparison with the evolution of the accuracy indicators during training. In Fig. 4, all the involved training curves are gathered for the reduced-resolution case: the spectral and spatial loss terms, together with reduced-resolution accuracy indicators such as ERGAS.
Training curves for single-band pansharpening on RR1. (a) Spectral and (b) spatial loss terms. (c) ERGAS. (d) A second reduced-resolution accuracy indicator.
Let us now move the focus to the most interesting full-resolution case with the help of Fig. 5, which shows the related training curves: the spectral and spatial loss terms, and the spectral distortion index $D_{\lambda}$.
Training curves for single-band pansharpening on FR2. (a) Spectral and (b) spatial loss terms. (c) Spectral distortion index $D_{\lambda}$.
B. Spectral Correlation Analysis and Model Propagation
To gain insight into the interband dependence, let us have a look at the correlation coefficient (CC) matrix, i.e., the covariance matrix normalized in the range [-1, 1], shown in Fig. 7 for a sample HS PRISMA image. The maximum values are on the diagonal and marked in black. For each given band (e.g., look at row #8), the best correlated bands looking backward and forward are marked in green and red, respectively. Focusing on the backward case, we can notice that, in the large majority of the cases, the most correlated band is just the previous one. When this is not the case, the correlation value for that band is, however, very close to the maximum. In the forward search, we have nearly the same situation, with just one exception for band 18. We have carried out the same analysis on all the available datasets and the conclusions were always the same. Based on the above considerations, it makes perfect sense to propagate the model from one band to the next one (or the previous one, if we proceed in the opposite direction).
Covariance matrix for a sample HS PRISMA datacube. For each band (fix a row index), the most correlated previous and next bands are marked in green and red, respectively.
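The analysis of Fig. 7 can be reproduced with a few lines of code. The following NumPy sketch computes the CC matrix of a datacube and, for each band, the indexes of the most correlated previous (backward) and next (forward) bands:

```python
import numpy as np

def band_correlations(cube):
    """CC matrix of an HS cube of shape (bands, H, W), plus per-band
    backward/forward most correlated band indexes, as in Fig. 7."""
    cc = np.corrcoef(cube.reshape(cube.shape[0], -1))
    backward = {b: int(np.argmax(cc[b, :b])) for b in range(1, len(cc))}
    forward = {b: b + 1 + int(np.argmax(cc[b, b + 1:]))
               for b in range(len(cc) - 1)}
    return cc, backward, forward
```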
To further clarify the propagation tuning process, Fig. 8 shows the progress of the loss during adaptation, limited to a subset (1-18) of bands for ease of visualization. The loss (Section II-D) comprises two terms responsible for the spectral (top) and spatial (bottom) consistencies. Checkpoints highlight the final loss for each band. A careful inspection of the curves reveals that, in many cases, the model tuned for a given band already fits the next one fairly well, so that only a few additional iterations are needed.
Concatenated spectral (top) and spatial (bottom) loss progress during model propagation (close-up on the first 18 bands). Each vertical stripe corresponds to a different band whose final loss is marked with a dot.
It is also worth observing how the spectral and spatial losses contrast each other when both reach values that are too small (how small depends on the band and, in particular, on its correlation with the PAN). In fact, in a few cases, for example, for bands 9, 12, and 18, only one of the two losses decreases. The spatial loss descends at the price of a small increase of the spectral loss for bands 9 and 12 (notice that the spectral loss is one order of magnitude smaller than the spatial loss). For band 18, instead, likely because of a large mismatch with the preceding band, the spectral loss dominates the tuning process.
Finally, as a general remark, we observe that the balance between the two loss terms is controlled by the weight $\beta$ in (2).
Let us now focus on a set of validation experiments dealing with the effectiveness of the model propagation scheme. Here, all spectral bands are concerned, according to the processing chain summarized in Fig. 2. First, we compare the proposed solution based on model propagation, where each band inherits its initial weights from the previously tuned band, with a variant without propagation, where each band starts from the same initial weights.
Band-wise spectral (top) and spatial (bottom) loss components after tuning with (blue) or without (red) model propagation.
To further validate the proposed solution, we have also compared the proposed forward model propagation with the backward option on the validation dataset FR0. The resulting accuracy indicators are reported in Table IV. The numbers show that the forward propagation option provides only slightly better scores. Experiments carried out on other datasets, however, confirm a substantial equivalence between the two solutions, and hence, we eventually opted for the causal ordering (forward) without loss of generality.
C. Tuning Strength
According to the proposed empirical rule (1), which fixes the number of tuning iterations per band, the overall tuning strength is governed by the parameter $\alpha$. To assess its impact, we have run the rolling adaptation on the validation dataset FR0 with different settings of $\alpha$.
Results obtained on the FR0 image. The final value of the spectral (top) and spatial (bottom) losses against band wavelength for the different configurations of the tuning strength parameter $\alpha$.
From Fig. 10, it can be observed that, with respect to the limit case (dashed) where the maximum number of iterations is run for all bands, both the spectral loss (outside the visible spectral range only) and the spatial loss (in the visible range only) register a progressive deterioration as $\alpha$ decreases.
D. Spatial Loss Configuration
The proposed loss (2) comprises two contributions, the spectral (3) and the spatial (6) consistency terms. While the former leverages a standard regression error function, such as the $\ell_{1}$ norm, the latter builds upon the local correlation coefficient (5) between the pansharpened band and the PAN and, therefore, requires the setting of the window size $\sigma$ and of the target correlation $\rho^{\mathrm{max}}$.
Let us now focus on the scale parameter $\sigma$, which sets the size of the sliding windows on which the local correlation (5) is computed.
Impact of the scale parameter $\sigma$.
E. Network Configuration
To validate the network architecture, we carried out experiments on the validation dataset FR0 aimed at assessing the impact of the network depth, width, and use of a residual skip connection. The compared solutions are summarized in Table VI, together with the corresponding scores and execution times. At a first glance, it can be observed that the proposed configuration (top row) provides the best overall tradeoff between accuracy and execution time.
F. Pretraining Details
At test time, the proposed network starts the tuning on the first band using pretrained initial weights. These have been determined by pretraining on the first band of the FR0 dataset: the band and the corresponding PAN have been tiled in 100 patches (at the PAN scale), forming the batch used to pretrain the model with the same unsupervised loss (2).
Comparative Results and Discussion
The experimental analysis of the proposed solution ends with the presentation of the numerical and visual comparative results obtained on the test datasets (summarized in Table II, shown in Fig. 3) taken from the HS pansharpening challenge [5]. All numerical results, obtained on both reduced- and full-resolution datasets, are gathered in Table VII. On the one hand, the pansharpening results obtained on the reduced-resolution datasets are quantitatively compared in terms of the $Q2^{n}$, SAM, ERGAS, and PSNR indexes described in Section III-B.
Moving to the full-resolution datasets, the analysis of the results becomes less linear. On the spectral side ($D_{\lambda}$), consistency is measured with respect to the original HS data rather than to a GT, and the numerical scores must therefore be interpreted jointly with the visual results discussed below.
A careful inspection of these numerical results reveals a surprising behavior of the proposed method on the reduced-resolution datasets: it consistently outperforms all the compared solutions, on both datasets and with respect to all indicators, with the exception of dataset RR1, where HSpeNet2 performs slightly better on ERGAS and PSNR (recall that these two metrics are highly correlated). Actually, what is worth remarking is that, while the RR indexes are based on available GTs, the proposed solution is fully unsupervised (both in training and tuning) and, nonetheless, does not seem to suffer any performance gap because of this.
To conclude this experimental survey, we present some sample pansharpening results for the reduced-resolution datasets in Figs. 12-15 and for the full-resolution ones in Figs. 16 and 17. For each dataset, only a representative zoomed detail (crops within yellow boxes in Fig. 3) is shown and, for a more comprehensive analysis, both RGB (Figs. 12, 14, and 16) and false-color (Figs. 13, 15, and 17) band subsets are displayed. In fact, while the RGB bands are all well correlated with the PAN, bands outside the visible spectrum are less correlated, hence more critical from the fusion perspective and worth inspecting. For the reduced-resolution case, in addition to the reference GT and the expanded version (EXP) of the input HS bands, useful for a direct spectral comparison of the pansharpening results, the error images are also shown. These clearly show that the errors are much more severe (for both datasets) outside the visible spectrum (see false color), with the occurrence, for all methods, of both spectral and spatial distortion phenomena. However, among all the compared methods, the proposed one seems to mitigate both spectral and spatial distortions best. Some methods clearly fail, e.g., Team 1, and are reported only for the sake of completeness.
Pansharpening results on RR1 (zoomed detail, see Fig. 3). (a) Target GT and all compared solutions on three bands sampled in the visible spectrum [wavelengths (nm): 660 (red channel), 588 (green), and 442 (blue)]. (b) Corresponding error maps.
Pansharpening results on RR1 (zoomed detail, see Fig. 3). (a) Target GT and all compared solutions on three bands sampled outside the visible spectrum [wavelengths (nm): 2053 (red channel), 1229 (green), and 770 (blue)]. (b) Corresponding error maps.
Pansharpening results on RR2 (zoomed detail, see Fig. 3). (a) Target GT and all compared solutions on three bands sampled in the visible spectrum [wavelengths (nm): 632 (red channel), 500 (green), and 434 (blue)]. (b) Corresponding error maps.
Pansharpening results on RR2 (zoomed detail, see Fig. 3). (a) Target GT and all compared solutions on three bands sampled outside the visible spectrum [wavelengths (nm): 1726 (red channel), 1251 (green), and 750 (blue)]. (b) Corresponding error maps.
Pansharpening results on (a) FR1 and (b) FR2 on three bands sampled in the visible spectrum; PAN image followed by EXP and all compared methods. Details on crop selection and sampled wavelengths for display are in Fig. 3.
Pansharpening results on (a) FR1 and (b) FR2 on three bands sampled outside the visible spectrum; PAN image followed by EXP and all compared methods. Details on crop selection and sampled wavelengths for display are in Fig. 3.
Moving to the full-resolution results, the evaluation becomes even more difficult and subjective, lacking reference GTs. Fig. 16 shows the pansharpening results obtained on both FR1 and FR2, limited to some selected bands of the visible spectrum, roughly corresponding to the red, green, and blue channels. Fig. 17 gathers, instead, the same results limited to bands outside the visible range, in false colors. In both cases, together with the PAN, which is the spatial reference, the upscaled HS (EXP) is also shown and can serve as a spectral reference for quality assessment. As in the reduced-resolution case, we display zoomed details of the full pansharpening results obtained on the images shown in Fig. 3. Focusing on the FR1 detail, we can observe relatively good results provided by the proposed method, GSA, MTF-GLP, MF, Teams 3 and 4, HSpeNet1, and HSpeNet2, especially in the RGB space. False-color results [Fig. 17 (top)], instead, highlight some problems occurring for spectral bands outside the visible range. For example, GSA seems unable to sharpen the bands of interest; the same holds for Teams 3 and 4 and for the HSpeNet variants. The differences among the methods are more evident for the FR2 clip, due to the presence of water. In this case, the spectral distortions are quite severe in several cases, e.g., GS, HyperPNN1, HyperPNN2, and HSpeNet1, but, more interestingly, some PAN patterns (on the water basin), not present in the false-color bands (see EXP on the bottom line), can be noticed in the pansharpened images. Of course, aware of the subjectiveness of these last considerations, we leave the final say to readers, who can add their own perspective to our observations and numbers. In this regard, for the sake of fairness, we must remind that the challenge teams had limited time to validate their design choices, which somehow explains some unsatisfactory results.
Conclusion
In this work, we have presented a novel deep-learning-based method for HS pansharpening. The proposed approach requires a baseline CNN model for single-band pansharpening, trainable/tunable in an unsupervised manner, without resolution downgrade. To this aim, we resorted to a recently proposed four-/eight-band pansharpening model [47], suitably adapted to the single-band case. The baseline model is sequentially used for band-wise pansharpening, where the current application leverages the model parameters adjusted on the previous band, running a few tuning iterations to let them fit the current (target) band. The tuning is feasible thanks to the use of an unsupervised loss, and the number of iterations is related to the "spectral" distance between the target band and its preceding one, from which the model is inherited.
The advantages of the proposed method are: 1) the method is fully unsupervised and does not require training data other than the target image itself; 2) it can be applied to any PAN-HS dataset, with no need for a prefixed number of HS bands; 3) the learning process, which is interleaved with the band-wise inference steps, does not require resolution downgrade, a common but limiting option in pansharpening; 4) the method ensures good generalization properties thanks to the target-adaptive tuning; and 5) even though per-image tuning iterations are involved, the computational complexity is relatively limited, especially when the baseline CNN is a lightweight network, as is the case here.
Despite the very good results achieved by the proposed approach, there is still room for improvement. More specifically, special attention should be put on the spatial consistency loss term used for the spectral bands that have no overlap with the PAN bandwidth. Actually, this falls within the more general problem of spatial quality assessment of pansharpened images, which is well known to be far from solved [2], [67], [66], becoming even more challenging in the HS case. Another point worth investigating is the model propagation rule and the per-band iteration budget, which impacts the computational load.
In order to ensure full reproducibility of our research outcomes, the code is made available at https://github.com/giu-guarino/R-PNN.