Introduction
Optical remote sensing images are advantageous due to their wide swath and high resolution, factors that are crucial for Earth observation. They have been widely used in natural resource management [1], emergency rescue [2], atmospheric monitoring [3], and surveying and mapping [4]. However, optical satellites are passive remote sensing instruments imaging in the 0.38–0.76 μm band and, as a result, are unable to penetrate thick clouds. It is estimated that an average of 35% of global land is covered by clouds and their shadows daily, resulting in large numbers of optical images with missing information [5]. Sensor malfunction can also cause missing information; for example, Landsat-7 (30 m) images contain a 22% scan gap per scene, severely affecting their usability [6]. Overall, reconstructing the missing information in such images significantly improves their utilization.
Currently, there are two types of methods for reconstructing missing information: 1) inference-based and 2) temporal-based methods. Inference-based methods infer the missing region from the spatial or spectral domain. However, these methods are inaccurate and therefore only suitable for small missing ratios [7]. Temporal-based methods use temporal images to precisely adjust texture information [8], eliminating the radiation difference between the reconstructed and nonreconstructed regions (see Fig. 1). In this study, the reconstructed region, taken from the temporal image, and the nonreconstructed region, taken from the original image (the remote sensing image with missing information), are regarded as the foreground and background, respectively (see Fig. 1). Differences in data sources and imaging conditions usually induce significant radiation differences between the foreground and background. This article divides this radiation difference into two parts: 1) the difference in ground content radiation (GCR), which represents the reflection information, including the type and spatial distribution of the ground objects. Radiation differences exist between images of the same ground objects acquired at different times because of phenological conditions; for example, vegetation in high-latitude regions is brown in winter and green in summer. 2) The difference in imaging environment radiation (IER), which includes the radiation deviation caused by the sensor, solar altitude angle, and atmospheric conditions [1]. Unlike the GCR difference, the IER difference usually affects the image as a whole. Thus, the critical challenge for temporal-based reconstruction is to accurately correct the radiation of the foreground image using the radiation information of the background image. The most significant contribution of this study is solving this problem effectively.
Researchers have pointed out that the missing-information ratio is an important factor affecting the reconstruction performance of a model: the larger the missing ratio, the more difficult the reconstruction, which remains an urgent problem for current reconstruction models. Based on the above theory and the problem of large missing ratios, this study proposes a novel decoupling-reconstruction network (DecRecNet) for image reconstruction, which uses a GCR correction module, an IER correction module, and their corresponding loss functions to decouple the image radiation information into GCR related to the ground objects and IER related to the imaging conditions, thereby enabling reconstruction at large missing ratios. This is one of the few networks that reconstruct missing information based on the imaging theory of remote sensing. The main contributions of this study are as follows.
A novel network, DecRecNet, is proposed for reconstructing missing information while considering both ground objects and imaging conditions. The network uses a GCR correction module, an IER correction module, and their corresponding loss functions to decouple the image radiation information into GCR related to the ground objects and IER related to the imaging conditions, achieving accurate pixel-level radiometric adjustment.
A GCR consistency loss function is used to eliminate the ground object radiation difference between the foreground and background images during pixel-level radiation correction. In addition, an IER consistency loss function and an IER smoothness loss based on the radiation continuity assumption are used to harmonize the imaging environment difference.
Compared with the classical U-Net [9], DeepLab V3+ [10], RFR-Net [11], and spatial-temporal-spectral deep convolutional neural network (STS-CNN) [12] methods, our model shows remarkable advantages for cloud occlusion and stripes in Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) images across various missing ratios, data sources, resolutions, and scenes, achieving missing-information reconstruction at large missing ratios.
Related Work
A. Reconstruction of Inference-Based Methods
The inference-based methods mainly infer the missing regions from the spatial and spectral domains and reconstruct them according to image adjacency or complementarity [13]. These methods can be divided into three types: 1) spatial-based; 2) spectral-based; and 3) hybrid-based methods. Spatial-based methods assume that the reconstructed and nonreconstructed regions share the same or related statistical features or texture information. Commonly used methods include interpolation [14], propagation diffusion [15], variation-based [16], and exemplar-based methods [15]. For example, Cheng et al. [17] proposed the MRT model, which mines similar elements from the nonreconstructed region. Spatial-based methods have low reconstruction accuracy because they only infer and interpolate by mining similar image elements or textures from a single image, and they are only suitable for small missing ratios [18]. Spectral-based methods establish a relationship between the nonreconstructed and reconstructed bands through bands that are completely intact [19]. For example, the missing region in MODIS band 6 can be reconstructed from band 7 because the two bands are strongly correlated [20]. However, such methods are only suitable for thin cloud restoration: all optical remote sensing bands are affected by thick clouds, although the longer bands are more resistant to thin clouds. Hybrid-based methods generally construct a unified deep learning framework combining spatial- and spectral-based methods. Wang et al. [21] proposed the hybrid spatial-spectral FSSRF model for thick clouds in hyperspectral images with small missing ratios. Gao et al. [22] proposed the DCR unified framework for reconstructing small missing ratios. However, all these methods are unsatisfactory at large missing ratios, producing blurred images with texture discontinuities because of the limited reference information [23].
B. Reconstruction of Temporal-Based Methods
Historical data can be obtained because satellites repeatedly scan the same area over short revisit periods, providing a temporal image for the reconstructed region. Such data can meet the basic requirements for accurately reconstructing the information missing due to clouds, cloud shadows, and stripes. Therefore, temporal-based methods are more reliable than inference-based methods.
Traditional temporal-based methods adjust the grayscale of the temporal image to coordinate it with the nonreconstructed region and utilize the continuity and smoothness of temporal images to fill missing regions. These methods can be divided into two types: 1) temporal replacement methods, which replace the reconstructed region with a temporal image and adjust the grayscale. Commonly used classical methods are the NSPI method [24] and the WLR method [25]. Temporal replacement methods are also commonly used to solve random gap-filling problems, as in MODIS EVI [26]. However, they adjust poorly for local radiation differences caused by phenological conditions, which, for example, cause wide variation in the radiation of vegetation. 2) Temporal filter methods, which regard each pixel of the remote sensing image as a point in a time series and reconstruct the missing region by eliminating the noise in a one-dimensional signal; examples include the best index slope extraction method [27], the double logistic technique [28], and the harmonic analysis of time series method [29]. These methods are insufficient when the pixel values of the reconstructed region are significantly larger than those of the nonreconstructed region. Moreover, they perform poorly on high-resolution images with large missing ratios.
In recent years, deep learning has made significant advances in the reconstruction of missing information, providing a new research direction. Deep learning methods for reconstructing missing information in remote sensing images can be divided into three types: 1) CNN-based methods, which use encoders and decoders to force the reconstructed region to be as close as possible to the nonreconstructed region in the hidden space. Chen et al. [30] developed the ST-Net model to reconstruct the missing information of Landsat-8 (30 m) cloud occlusion using temporal and spatial networks. Ji et al. [31] developed cascaded convolutional neural networks using cascaded downsampling and upsampling with multiscale information and reconstructed Landsat-8 (30 m) images of mountains, forests, and water with ≤22% missing information. Zhang et al. [12] constructed the joint STS-CNN model using 3 × 3, 5 × 5, and 7 × 7 multiscale feature extraction blocks and completed the reconstruction of Aqua MODIS band 6 (500 m) and ETM+ SLC-off (30 m) images with ≤40% missing information; the reconstructed scenes contained slightly complex textures such as mountains and buildings. The idea of multiscale information extraction in these methods draws on the spatial contextual attention mechanism as well as the spatial and channel attention mechanisms used in image inpainting. To address high-resolution image reconstruction, Zhang et al. [32] constructed the DP-LRTSVD model using time-series images to reconstruct complex scenes in Sentinel-2 (10 m) and GaoFen-1 (16 m) with ≤50% missing ratios. The quantitative indices for GaoFen-1 (16 m) in that study were slightly lower than those for Sentinel-2 (10 m), which reveals the difficulty of reconstructing high-resolution remote sensing images. Transformer networks, such as the CLOUDTRAN model [33], have also been applied to missing information reconstruction; however, they involve a large computational cost because of the universality of feature mapping. On the whole, CNN-based methods form the mainstream of missing information reconstruction, but even with time-series images they perform poorly on high-resolution images of complex scenes with >40% missing information. 2) GAN-based methods, which use a two-player game between a generator and a discriminator to make the generated reconstructed image as close as possible to the original image. For homogeneous data, Chen et al. [34] proposed the GAN-based CTS-CNN model to reconstruct ZY-3 images with small missing ratios through content generation, texture generation, and spectrum generation networks. Sun et al. [35] proposed a cloud-aware generative network (CGAN) to restore missing information in relatively complex scenes from Google Earth satellite images. Meraner et al. [36] constructed a GAN-based deep residual neural network to reconstruct weakly textured scenes, such as mountains, water, and forests, in Sentinel-2 (10 m). Xu et al. [37] used an attention-based generative adversarial model to fully capture multiscale information and restore missing information in Landsat-8 (30 m). For heterogeneous data, Grohnfeldt et al. [38] achieved domain transfer from SAR to optical imagery by constructing SAR-Opt-cGAN for multispectral data, and Sebastianelli et al. [39] removed thick clouds using the texture information of SAR.
On the whole, GAN-based methods are mostly applied to low- and medium-resolution images with small missing ratios; they also exhibit high uncertainty, which can introduce image noise. 3) Recurrent neural network (RNN)-based methods, which use an RNN to memorize short time series and combine front-layer features with current features to reasonably supplement the missing information. Commonly used classical methods are PixelRNN [40] and the RGAN model [41]. These methods are used less often because they have high computational costs and cannot exploit global information.
Methods
A. Overview
We first define the symbols used in this article: the original image (the remote sensing image with missing information) is denoted $I_{\text{ori}}$, the reconstructed output of the network is denoted $I_{\text{out}}$, and the GCR and IER components predicted by the network are $C_{\text{out}}$, $A_{\text{out}}$, and $B_{\text{out}}$, respectively [see (2)].
B. Modeling of Radiation Decoupling
During remote sensing imaging, the measured values generally deviate from the true reflectance values of ground objects due to the influence of imaging conditions, such as the sensor, solar altitude angle, illumination, and atmospheric conditions. In radiometric correction, a linear model is generally used to correct image radiation distortion [42] as follows:
\begin{equation*}
Y\ = \ a \cdot X + b \tag{1}
\end{equation*}
where $X$ and $Y$ denote the image before and after correction, and $a$ and $b$ are the linear gain and offset. Combining formula (1), we propose DecRecNet [see (2)]. In this model, the GCR correction module combined with the GCR consistency loss function and the IER correction module combined with the IER consistency loss function together decouple the remote sensing image radiation into the GCR component $C_{\text{out}}$ and the IER components $A_{\text{out}}$ and $B_{\text{out}}$:
\begin{equation*}
{I}_{\text{out}} = {A}_{\text{out}}\ \cdot {C}_{\text{out}} + {B}_{\text{out}} \tag{2}
\end{equation*}
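To make the decoupling concrete, the following minimal PyTorch sketch (our own illustration, not the authors' released code) recombines the predicted component maps according to (2); the tensor names and shapes are assumptions.

```python
import torch

def recombine(A_out: torch.Tensor, C_out: torch.Tensor,
              B_out: torch.Tensor) -> torch.Tensor:
    """Recombine the decoupled components per (2): I_out = A_out * C_out + B_out.

    C_out carries the ground content radiation (GCR); A_out and B_out carry
    the imaging environment radiation (IER). All three are assumed to be
    per-pixel maps of shape (N, 3, H, W) predicted by the corresponding modules.
    """
    return A_out * C_out + B_out
```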
1) GCR Correction Module
GCR represents the radiation information of the ground objects in the remote sensing images, including the type and spatial distribution of the ground objects. Generally, the GCR is related only to the ground objects themselves, and the types of ground objects are relatively stable over a short period. We therefore assumed that the GCR of the ground objects should be consistent with that of the original image and constrained the gradient of $C_{\text{out}}$ to match the gradient of $I_{\text{ori}}$ through the GCR consistency loss:
\begin{equation*}
\mathscr{l}_{\text{content}} = {\mathbb{E}}_{\left({\nabla {C}_{\text{out}},\nabla {I}_{\text{ori}}} \right)}\ \left[ {\|\nabla {C}_{\text{out}} - \nabla {I}_{\text{ori}}\|}_1 \right] \tag{3}
\end{equation*}
The GCR correction module uses this constraint to estimate the GCR component $C_{\text{out}}$.
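As a sketch of how the GCR consistency loss (3) might be implemented, the snippet below realizes the expectation as a mean over finite-difference image gradients; the padding choice is ours.

```python
import torch
import torch.nn.functional as F

def image_gradient(x: torch.Tensor) -> torch.Tensor:
    """Forward-difference gradients of an (N, C, H, W) image, stacked along
    the channel axis and zero-padded back to the input size."""
    dx = x[..., :, 1:] - x[..., :, :-1]   # horizontal differences (N, C, H, W-1)
    dy = x[..., 1:, :] - x[..., :-1, :]   # vertical differences   (N, C, H-1, W)
    dx = F.pad(dx, (0, 1, 0, 0))          # pad width back to W
    dy = F.pad(dy, (0, 0, 0, 1))          # pad height back to H
    return torch.cat([dx, dy], dim=1)

def gcr_consistency_loss(C_out: torch.Tensor, I_ori: torch.Tensor) -> torch.Tensor:
    """Eq. (3): L1 distance between the gradients of C_out and I_ori."""
    return (image_gradient(C_out) - image_gradient(I_ori)).abs().mean()
```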
2) IER Correction Module
IER represents the radiation information affected by sensor distortion, solar altitude angle, illumination, and atmospheric conditions; it is related only to the imaging conditions. To obtain reconstructed images with consistent imaging environment conditions, we expect the IER of the reconstructed image to be consistent with that of the original image $I_{\text{ori}}$, and define the IER consistency loss as
\begin{equation*}
\mathscr{l}_{AB,{I}_{\text{ori}}} = {\mathbb{E}}_{\left({AB,{I}_{\text{ori}}} \right)}\ \left[ {\|\left({{A}_{\text{out}}\cdot E + {B}_{\text{out}}} \right) - {I}_{\text{ori}}\|}_2 \right] \tag{4}
\end{equation*}
In remote sensing, IER often exhibits radiation continuity; thus, IER should behave as a smooth function. Here, we used the image gradient to constrain the smoothness of $A_{\text{out}}$ and $B_{\text{out}}$:
\begin{equation*}
\mathscr{l}_{AB} = {\|\nabla {A}_{\text{out}}\|}_1 + {\|\nabla {B}_{\text{out}}\|}_1 \tag{5}
\end{equation*}
The IER correction module uses these constraints to estimate the IER components $A_{\text{out}}$ and $B_{\text{out}}$.
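A minimal sketch of the two IER losses follows, reusing image_gradient from the GCR sketch above; realizing the expectation in (4) as a mean squared error and treating $E$ as an input tensor are our assumptions.

```python
import torch

def ier_consistency_loss(A_out: torch.Tensor, B_out: torch.Tensor,
                         E: torch.Tensor, I_ori: torch.Tensor) -> torch.Tensor:
    """Eq. (4): penalize the L2 distance between A_out * E + B_out and I_ori.

    E is the tensor named in (4); the mean-squared realization is ours.
    """
    return torch.mean((A_out * E + B_out - I_ori) ** 2)

def ier_smoothness_loss(A_out: torch.Tensor, B_out: torch.Tensor) -> torch.Tensor:
    """Eq. (5): total-variation-style penalty reflecting the radiation
    continuity assumption (image_gradient as defined in the GCR sketch)."""
    return image_gradient(A_out).abs().mean() + image_gradient(B_out).abs().mean()
```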
C. Radiation-Guiding Module
The radiation-guiding module was designed for the precise radiation adjustment of the foreground and background of the composite image. It comprises two components: imaging environment guiding (IEG) and ground semantic guiding (GSG).
1) Imaging Environment Guiding
The IEG aims to guide the model to adjust the overall radiation of the foreground and background of the composite image. The IEG takes the composite image as input, encodes its imaging environment information, and uses it to transfer the IER of the background to the foreground.
2) Ground Semantic Guiding
After the above IER transfer, the radiation difference between the foreground and background of the composite image is eliminated. To eliminate the radiation differences between the same ground objects, we designed the GSG based on the assumption that the same ground objects have similar radiation within a local remote sensing image. This assumption imposes an implicit ground semantic constraint that forces the same ground objects in the remote sensing image to exhibit similar high-level features. The specific implementation is as follows: the feature map $f$ is mapped by $\Omega(\cdot)$ and compared with the downsampled original image $I_{\text{oridown}}$ through a similarity measure $S$:
\begin{equation*}
\mathscr{l}_{f,{I}_{\text{oridown}}} = \ 1 - S\left[ {{\rm{\Omega }}\left(f \right),{I}_{\text{oridown}}} \right] \tag{6}
\end{equation*}
The structure of the semantic guide encoder is the same as that of the IEG, except that a 3 × 32 × 32 feature map is obtained by adding a convolution layer with a convolution kernel size of 4.
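A sketch of the GSG loss (6) is given below; realizing $S$ as cosine similarity over flattened maps is our assumption, since the article does not spell out $S$ at this point.

```python
import torch
import torch.nn.functional as F

def gsg_loss(omega_f: torch.Tensor, I_oridown: torch.Tensor) -> torch.Tensor:
    """Eq. (6): 1 - S[Omega(f), I_oridown].

    omega_f is the 3 x 32 x 32 map produced by the semantic guide encoder and
    I_oridown the original image downsampled to the same size; S is realized
    here as cosine similarity (an assumption).
    """
    sim = F.cosine_similarity(omega_f.flatten(1), I_oridown.flatten(1), dim=1)
    return (1.0 - sim).mean()
```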
D. Loss Function
Image reconstruction aims to eliminate the radiation difference between the foreground and background of the composite image. In the actual training process, we obtained both the composite image and its corresponding original image, and constrained the network output $I_{\text{out}}$ to match the original image $I_{\text{ori}}$ in both pixel values and gradients through the reconstruction loss
\begin{align*}
\mathscr{l}_{\text{rec}} = & {\mathbb{E}}_{\left({{I}_{{\rm{out\ }}},{I}_{\text{ori}}} \right)}\ \left[\|{{I}_{{\rm{out\ }}} - {I}_{\text{ori}}\|_{1}} \right] \\
& + {\mathbb{E}}_{\left({\nabla {I}_{\text{out}},\nabla {I}_{\text{ori}}} \right)}\left[\| {\nabla {I}_{{\rm{out\ }}} - \nabla {I}_{\text{ori}}\|_{1}} \right]. \tag{7}
\end{align*}
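A corresponding sketch of the reconstruction loss (7), again reusing image_gradient from the GCR sketch above:

```python
import torch

def reconstruction_loss(I_out: torch.Tensor, I_ori: torch.Tensor) -> torch.Tensor:
    """Eq. (7): L1 distance on pixel values plus L1 distance on image gradients."""
    pixel_term = (I_out - I_ori).abs().mean()
    gradient_term = (image_gradient(I_out) - image_gradient(I_ori)).abs().mean()
    return pixel_term + gradient_term
```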
Finally, the objective function used in this study is
\begin{equation*}
\mathscr{l}\ = {\lambda }_1\mathscr{l}_{\text{content}} + {\lambda }_2\mathscr{l}_{AB,{I}_{\text{ori}}} + {\lambda }_3\mathscr{l}_{AB} + {\lambda }_4\mathscr{l}_{f,{I}_{\text{oridown}}} + {\lambda }_5\mathscr{l}_{\text{rec}}. \tag{8}
\end{equation*}
where λ1, λ2, λ3, λ4, and λ5 are weighting factors that balance the contributions of the different losses.
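Pulling the sketches above together, the objective (8) could be assembled as follows; the lambda values are the unspecified weighting factors, and all tensor names are the illustrative ones used earlier.

```python
# Weighted sum of the five loss terms per (8); the lambda weights are
# hyperparameters balanced empirically in the article.
loss = (lambda_1 * gcr_consistency_loss(C_out, I_ori)
        + lambda_2 * ier_consistency_loss(A_out, B_out, E, I_ori)
        + lambda_3 * ier_smoothness_loss(A_out, B_out)
        + lambda_4 * gsg_loss(omega_f, I_oridown)
        + lambda_5 * reconstruction_loss(I_out, I_ori))
```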
Experimental Details
Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) images were used to verify DecRecNet through simulated and real experiments covering the reconstruction of information missing due to clouds, cloud shadows, and stripes. The simulated experiments quantitatively evaluate the effectiveness of the model, while the real experiments verify its performance in practical applications.
A. Experimental Design
1) Datasets
Data sources: To make the algorithm compatible with most satellite images, we designed experiments using the true-color bands of Landsat-8 (30 m) with six pairs of images, and of GaoFen-1 (2 m) and Landsat-7 (30 m), each also with six pairs of images. As the parameters of Landsat-7 (30 m) and Landsat-8 (30 m) are similar, Landsat-8 (30 m) was used as the temporal image in the Landsat-7 (30 m) stripe experiment. Landsat and GaoFen data were obtained from the USGS and the Land Observation Satellite Data Service platform, respectively. Our experimental data covered the WHU Cloud Dataset [31]. In addition, we added images of the Liaoning, Jiangsu, Henan, Hebei, and Shandong Provinces of China. The land use types covered by the image data are shown in Fig. 6(b).
The geometric shape of clouds is ever-changing due to air convection. Therefore, the masks should satisfy two requirements: 1) similarity to masks drawn in real use cases and 2) diversity. In the cloud occlusion simulation experiments, we generated cloud masks with Perlin noise [44]; in the cloud occlusion real experiments, the cloud masks were drawn manually. Missing stripes are caused by sensor failure and appear regularly in the image, so the stripe masks were drawn in both the simulated and real experiments (see Fig. 7).
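The article cites Perlin noise [44] for mask generation without giving parameters; as one plausible realization, the sketch below thresholds 2-D Perlin noise from the Python noise package, with the scale, threshold, and seed values chosen purely for illustration.

```python
import numpy as np
from noise import pnoise2  # pip install noise

def perlin_cloud_mask(h: int, w: int, scale: float = 64.0, octaves: int = 4,
                      threshold: float = 0.1, seed: int = 0) -> np.ndarray:
    """Binary cloud-shaped mask obtained by thresholding 2-D Perlin noise.

    `scale` controls the size of the simulated cloud blobs, `threshold` the
    masked fraction; varying `seed` yields diverse masks (requirement 2).
    """
    field = np.array([[pnoise2(i / scale, j / scale, octaves=octaves, base=seed)
                       for j in range(w)] for i in range(h)])
    return (field > threshold).astype(np.uint8)  # 1 = cloud (missing), 0 = valid
```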
2) Comparison Algorithm and Metrics
Following common practice in the literature, we selected U-Net [9], DeepLab V3+ [10], RFR-Net [45], and STS-CNN [46] as comparison algorithms; all models were retrained on our dataset and analyzed through quantitative and visual comparisons [47]. U-Net and DeepLabV3+ are classical networks in semantic segmentation and were among the first classical networks applied to image reconstruction. To the best of our knowledge, RFR-Net is a classical network for both information inference and image reconstruction, while STS-CNN is among the best networks for reconstructing missing information in remote sensing images.
We quantitatively evaluated the results using the Frechet inception distance (FID) [48], peak signal-to-noise ratio (PSNR) [31], structural similarity index measure (SSIM) [49], and mean absolute error (MAE) [50]. Smaller FID and MAE values indicate better results; for PSNR and SSIM, larger values are better.
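For reference, a minimal sketch of how three of these metrics can be computed per image pair with scikit-image and NumPy; FID is omitted because it compares Inception-feature statistics over whole image sets and is typically computed with a dedicated package.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred: np.ndarray, ref: np.ndarray) -> dict:
    """PSNR, SSIM, and MAE for one H x W x 3 image pair scaled to [0, 1]."""
    return {
        "PSNR": peak_signal_noise_ratio(ref, pred, data_range=1.0),
        "SSIM": structural_similarity(ref, pred, channel_axis=-1, data_range=1.0),
        "MAE": float(np.abs(ref - pred).mean()),
    }
```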
3) Parameters Setting
In the experiments, DecRecNet was trained with the Adam optimizer.
B. Simulation Experiment
We simulated the radiation differences between the temporal image and the reconstructed image by adjusting the hue and contrast of the original image and by flipping its bands (the hue, contrast, and band flipping scenes). Because the simulated data have ground-truth values, the performance of different algorithms can be compared directly at the index level. Zhang et al. [12], [46] found that when the missing ratio is ≥40%, the temporal information available for the reconstructed region decreases and restoration becomes challenging. Therefore, image restoration experiments with 40% missing information were designed using Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) data. Landsat-8 (30 m) and GaoFen-1 (2 m) were used to simulate information missing due to clouds and cloud shadows, and Landsat-7 (30 m) was used to simulate the information loss caused by sensor malfunction.
1) Visual Comparison
Figs. 8–10 show the results obtained on the simulation data using the different algorithms. For Landsat-8 (30 m) and GaoFen-1 (2 m), the composite images show considerable radiation differences at the edges. After processing with U-Net and DeepLabV3+, abrupt spectral changes and texture discontinuities are apparent at the boundary in the reconstructed images. Moreover, the results of DeepLabV3+ are blurred, especially regarding the spectral and texture features of the red buildings, as shown in Figs. 8 and 9. This is mainly because U-Net and DeepLabV3+ do not consider the differences between the foreground and background spectra. The Landsat-8 (30 m) and GaoFen-1 (2 m) images also show blurred details after processing with RFR-Net. This may be because RFR-Net reconstructs information by progressively filling from the edge of the missing region toward its center, so more severe blurring occurs closer to the center of the missing region [45]. This phenomenon is particularly evident in the GaoFen-1 (2 m) images, especially regarding the retention of building edges in Figs. 8 and 9, because GaoFen-1 (2 m) images have higher resolution and more information, making the information inference more complicated. Notably, the radiation continuity of the images processed by U-Net, DeepLabV3+, and RFR-Net is poor across all three data sources. STS-CNN is the best of these comparison models for image reconstruction. However, the band flipping scene proved difficult to restore for GaoFen-1 (2 m) images, with severe blurring observed in Figs. 8 and 9.
Our method achieved a good and stable restoration effect on both Landsat-8 (30 m) and GaoFen-1 (2 m), even in the band flipping scene, because DecRecNet considers not only the radiation difference caused by the ground content and imaging conditions but also the preservation of the overall radiation characteristics through the radiation-guiding module. For the reconstruction of the striped Landsat-7 (30 m) image, the results of U-Net, DeepLabV3+, and RFR-Net in Fig. 10 show noticeable stripe traces. The image processed by STS-CNN had no stripe traces; however, its spectral information was severely distorted in the contrast and band flipping scenes. Image texture and spectral information were best preserved by our method.
2) Quantitative Evaluation Results
Table I lists the quantitative evaluation results of the simulation experiment. Comparing the different models, the four quantitative indices of U-Net and DeepLabV3+ were relatively poor, reflecting weak reconstruction; these methods do not consider the spectral differences between the foreground and background images. The reconstruction performance of RFR-Net improved to a certain extent, but the reconstruction goal was only achieved under partial radiation differences, consistent with the blurring shown in Figs. 8–10. In contrast, the performance of STS-CNN improved in all three scenes: compared with U-Net, STS-CNN improved the FID by nearly a factor of five for Landsat-8 (30 m) and nearly a factor of three for GaoFen-1 (2 m), the smaller gain likely reflecting the more detailed textures of GaoFen-1 (2 m). Note that, owing to its different calculation principle, the FID index changed with a slightly larger amplitude than PSNR, SSIM, and MAE under the same conditions. In general, the quantitative indices of U-Net, DeepLabV3+, and RFR-Net reflect the shortcomings of these models in missing information reconstruction. The reconstruction performance of STS-CNN on Landsat-8 (30 m) was higher than on GaoFen-1 (2 m) and Landsat-7 (30 m), especially in the band flipping scene, because the GaoFen-1 (2 m) images contain abundant textural details and the Landsat-7 (30 m) bands carry weak geographic information. Most indices of the proposed method were higher than those of the comparison models, and robust quantitative performance was obtained in each scene, consistent with the visual results in Figs. 8–10.
For the different data sources, the reconstruction performance of U-Net, DeepLabV3+, and RFR-Net under the hue and contrast scenes was higher than under the band flipping scene. This is because the radiation change under band flipping departs completely from the real radiation of the original image, placing higher demands on the generalization ability of the models. The reconstruction performance of STS-CNN across the different radiation difference scenes was inferior to that of our model, especially for the Landsat-7 (30 m) band flipping scene, because our model corrects the imaging conditions in the foreground and background images through the IER correction module. Across the data sources, each quantitative index was best on Landsat-8 (30 m) and worst on Landsat-7 (30 m). Moreover, our model achieved outstanding results on all data sources, demonstrating the robustness of DecRecNet.
C. Real Experiment
We verified the effect of our method in real applications using Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) images with real clouds, cloud shadows, and stripes (see Table II). Land types such as construction, agriculture, and forest were considered; regarding topography, our data cover plains, hills, mountains, and other landforms.
Landsat-8 (30 m) showed significant phenological differences between the original and temporal images (see Fig. 11), and the GaoFen-1 (2 m) original and temporal images had a certain degree of radiation distortion (see Fig. 11); thus, the composite images show substantial radiation differences at the edges. There were also significant phenological differences between the Landsat-7 (30 m) original and temporal images. The results of U-Net and DeepLabV3+ showed abrupt spectral changes and radiation differences at the image boundary, as well as blurring, which can be observed in the reconstruction of the green vegetation in the Landsat-8 (30 m) results in Fig. 11. After processing with RFR-Net, problems such as blurring and spectral loss, especially in the central missing region, resulted in a large amount of missing texture information. Compared with the other models, the results of STS-CNN improved significantly; however, severe spectral losses occurred in green vegetation and buildings, especially in the reconstruction of the red buildings in Landsat-7 (30 m), because the model considers neither the imaging conditions nor the characteristics of the ground objects. In contrast, our model preserves texture and spectral information in green vegetation and buildings because it considers both the differences in imaging conditions and the ground semantic similarity of ground objects; for example, it performs remarkably well in protecting the edge information of buildings in the GaoFen-1 (2 m) results (see Fig. 11). For Landsat-7 (30 m), the reconstructions by U-Net, DeepLabV3+, RFR-Net, and STS-CNN showed clear spectral distortion, blurring, and missing texture, while our model obtained a better reconstruction. These comparisons once again demonstrate the effectiveness of the proposed model.
Fig. 11. Real experimental reconstruction results of Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) at 40% missing ratios.
Discussion
A. Ablation Experiment of Radiation-Guiding Module
We performed four sets of comparative experiments with 40% cloud and cloud shadow coverage to validate the contribution of the radiation-guiding module: base (without IEG or GSG), base + GSG (GSG only), base + IEG (IEG only), and DecRecNet (both IEG and GSG).
1) Visual Comparison
The IEG aims to adjust the overall radiation of the composite image. As observed in Fig. 12, base < base + GSG < DecRecNet in terms of overall consistency and the preservation of the textural and spectral information of ground objects. After adding the IEG, the reconstruction results further improved in terms of spectral preservation and edge protection of the red and blue buildings in the Landsat-8 (30 m) and GaoFen-1 (2 m) experiments. The comparison between base + GSG and DecRecNet thus verifies the IEG module.
Fig. 12. Ablation experiments with the radiation-guiding module at 40% missing ratios (GSG: ground semantic guiding; IEG: imaging environment guiding).
The role of the GSG is to reduce the spectral differences between the same ground objects in the foreground and background of local regions. From Fig. 12, we observe that base < base + IEG < DecRecNet in terms of the spectral retention of the same ground objects. The spectral information of the same objects in the Landsat-8 (30 m) and GaoFen-1 (2 m) experiments becomes closer with the addition of the GSG module. The red building in the Landsat-8 (30 m) scene shows greater textural clarity after full-model processing, and for GaoFen-1 (2 m) the edges of the ground objects are also significantly enhanced, supporting the importance of the GSG. Notably, the results improved most when both the IEG and GSG were added. This is likely because the remote sensing imaging process is affected by both the imaging conditions and the ground objects; applying imaging environment and ground semantic guiding together thus facilitates joint correction at the global and local levels.
2) Quantitative Evaluation of the Ablation Experiments
Table III lists the quantitative evaluation results of the ablation experiments. The change in the quantitative indices between DecRecNet and base + GSG was more significant than that between base + IEG and base, showing that model reconstruction performance improved significantly after adding the IEG. However, the index changes for GaoFen-1 (2 m) were smaller than those for Landsat-8 (30 m) because the GaoFen-1 (2 m) images contain more information, consistent with the findings of Zhang et al. [32]. These quantitative changes also verify the critical role of the IEG in the model. For the GSG, the changes between DecRecNet and base + IEG were smaller than those between base + IEG and base. As shown in Fig. 12, the GSG preserved the texture and spectral information of the same objects in the reconstruction regions of Landsat-8 (30 m) and GaoFen-1 (2 m). As in the IEG ablation experiment, adding the full radiation-guiding module significantly improved the quantitative evaluation and the model performance, consistent with the reconstruction results in Fig. 12.
B. Analysis Experiments With Different Missing Ratios
1) Simulation Experiments With Different Missing Ratios
The missing ratio is an important factor affecting model performance, and reconstruction, especially of high-resolution images, becomes more complex when the missing ratio exceeds 40% [12], [32]. Therefore, we carried out reconstruction experiments with real temporal images and missing ratios ranging from 20% to 70%. As illustrated in Figs. 13–15, significant spectral differences were found at the edges of the Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) composite images. U-Net, DeepLabV3+, and RFR-Net displayed severe spectral distortion, texture discontinuity, and blurring at the reconstruction boundary, as shown in Figs. 13–15. They also failed to reconstruct the complex regions in the GaoFen-1 (2 m) images, such as the buildings in Fig. 14, indicating their poor robustness and the difficulty of reconstructing complex high-resolution images. The RFR-Net results for Landsat-8 (30 m) and Landsat-7 (30 m) showed noticeable spectral distortion and blurring caused by the repeated inference from the missing boundary toward the center. STS-CNN showed better reconstruction performance than U-Net, DeepLabV3+, and RFR-Net, but still presented significant loss of texture detail in the Landsat-8 (30 m) and GaoFen-1 (2 m) images; for example, the roads and buildings in Figs. 13 and 14 lost substantial spectral and texture information. Our model, in contrast, showed an exemplary processing effect on Landsat-8 (30 m) and GaoFen-1 (2 m) because it accurately corrects the imaging conditions through the IER consistency and IER smoothness losses, and it remained robust at large missing ratios. For U-Net, DeepLabV3+, RFR-Net, and STS-CNN, performance gradually degraded as the missing ratio increased, with loss of spectral information, texture blurring, and severe radiation distortion at the edge of the missing region (see Figs. 13–15). This degradation was more pronounced in the GaoFen-1 (2 m) results, where model failure, spectral distortion, and texture loss were observed, revealing the complexity of reconstructing high-resolution images. Despite some radiation problems in the temporal images of the GaoFen-1 (2 m) reconstruction experiments, our method still achieved good reconstruction results because the GCR and IER jointly protect texture and other information.
Fig. 13. Simulation experimental reconstruction results of Landsat-8 (30 m) under 20%, 40%, and 70% missing ratios with real temporal images.
Fig. 14. Simulation experimental reconstruction results of GaoFen-1 (2 m) under 20%, 40%, and 70% missing ratios with real temporal images.
Fig. 15. Simulation experimental reconstruction results of Landsat-7 (30 m) under 20%, 40%, and 70% missing ratios with real temporal images.
Almost all evaluation indices in Table IV are lower than those in Table I because of the period difference between the original and temporal images. Model performance follows U-Net < DeepLabV3+ < RFR-Net, consistent with the visual results in Figs. 13–15; the low indices of these models stem from their poor preservation of spectral and texture information in the reconstructed images. STS-CNN performed better on Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) but remained inferior to our model; decoupling the radiation into ground content and imaging environment proved more conducive to protecting texture and spectral information. Comparing the evaluation indices, our model improved the PSNR by approximately 10 dB and the SSIM by approximately 0.22, and improved the MAE and FID by factors of approximately 2–4 and 2–3, respectively, compared with U-Net.
Among the data sources, the influence of the period difference was most severe for GaoFen-1 (2 m); the evaluation indices of every model were therefore lower on these data, as the GaoFen-1 (2 m) images contain more texture detail than lower-resolution images. Evidently, every model declined in performance as the missing ratio increased, and the quantitative indices changed by an order of magnitude where U-Net, DeepLabV3+, and RFR-Net failed. Our model, in contrast, remained capable of reconstructing the information at the pixel level.
2) Real Experiments With Different Missing Information Ratios
We conducted real test experiments to further confirm the reconstruction performance of the model on real clouds and stripes. As shown in Figs. 16–18, our model preserved textural detail and spectral information better than U-Net, DeepLabV3+, RFR-Net, and STS-CNN. The results of DeepLabV3+ and RFR-Net presented varying degrees of blurring and textural loss in the reconstruction of green vegetation, water, and buildings, as shown in Figs. 16 and 18. Judging from the spectral and textural preservation of green vegetation, building edges, and texture in the scenes of Figs. 16–18, the results of STS-CNN were better than those of the other three models but worse than those of our model.
Fig. 16. Real experimental reconstruction results of Landsat-8 (30 m) under 20%, 40%, and 70% missing ratios.
Fig. 17. Real experimental reconstruction results of GaoFen-1 (2 m) under 20%, 40%, and 70% missing ratios.
Fig. 18. Real experimental reconstruction results of Landsat-7 (30 m) under 20%, 40%, and 70% missing ratios.
For the different data sources, the reconstruction effect of U-Net, DeepLabV3+, RFR-Net, and STS-CNN was good at small missing ratios, especially on Landsat-8 (30 m), because a model can obtain more temporal information at smaller missing ratios, reducing the difficulty of reconstruction. Notably, the reconstruction results on GaoFen-1 (2 m) were poor because of its abundant spectral and textural information. These findings demonstrate the suitability of our model for various scenes.
C. Analysis of Image Resolution
Differences in reconstruction performance between Landsat-8 (30 m) and GaoFen-1 (2 m) were noted in Sections IV-B and V-B. The trends of the evaluation indices for Landsat-8 (30 m) and GaoFen-1 (2 m) were therefore compared and analyzed, as shown in Fig. 19. The reconstruction performance generally follows U-Net < DeepLabV3+ < RFR-Net < STS-CNN < DecRecNet, apart from a few fluctuating points. In addition, the quantitative indices of each model declined as the missing ratio increased; U-Net, DeepLabV3+, and RFR-Net failed at missing ratios >40%. Our model, however, showed excellent performance even where the other models failed to reconstruct the missing regions, because it considers both the imaging conditions of the foreground and background images and the radiation of the same ground objects.
Fig. 19. Variation trends of the evaluation indices under different cloud and cloud shadow coverage for Landsat-8 (30 m) and GaoFen-1 (2 m).
Across the data sources, most FID values for Landsat-8 (30 m) were <10, while most for GaoFen-1 (2 m) were >20. This suggests that the higher the resolution, the more textural detail an image contains, making reconstruction more challenging. Compared with Landsat-8 (30 m), the PSNR, SSIM, and MAE distributions of GaoFen-1 (2 m) were more scattered and lower overall. Our model met the same visual and quantitative requirements even with 70% missing information, owing to its precise correction of the ground objects and the imaging environment.
D. Limitations of DecRecNet
Although our model produced excellent reconstructions across different data sources, resolutions, scenes, and missing ratios compared with the other four classical models, two limitations should be noted. 1) The model cannot distinguish between thin and thick clouds: there is no quantitative index separating them, and cloud shadows have the same occlusion effect as thick clouds, so the model treats all of these as thick clouds and reconstructs the missing information accordingly. This results in the underutilization of the information occluded by thin clouds and cloud shadows. 2) This study focused on the reconstruction of missing information under cloud occlusion rather than on cloud and cloud shadow detection. We intend to address these issues in future studies.
Conclusion
To restore missing information in optical remote sensing images, a novel network, DecRecNet, was constructed based on the imaging theory of remote sensing. The network uses a GCR correction module, an IER correction module, and their corresponding loss functions to decouple the image radiation information into GCR related to the ground objects and IER related to the imaging conditions, achieving accurate pixel-level radiometric adjustment. A radiation-guiding module was designed to guide the correction of the ground objects and the imaging environment. Landsat-8 (30 m), GaoFen-1 (2 m), and Landsat-7 (30 m) images were used to verify DecRecNet through simulated and real experiments, including the reconstruction of information missing due to clouds, cloud shadows, and stripes. The model exhibited excellent quantitative and visual performance across various missing ratios, data sources, resolutions, and scenes, achieving reconstruction at missing ratios of up to 70%.
Acknowledgment
We would like to thank the Supercomputing Center of Wuhan University for providing computing power to conduct experiments and Prof. Zhigang Tu for his guidance and for answering questions.