
A Progressive Decoupled Network for Blind Image Super-Resolution



Abstract:

Blind super-resolution (blind SR), which aims to enhance low-resolution (LR) images degraded by unknown or partially known blur kernels, has become a popular research topic in computer vision. Most existing methods first estimate the degradation kernel and then use the estimate to produce the final super-resolution result. However, estimating degradation kernels burdens the network, and the estimated kernels are often inaccurate, so most existing methods fail to produce satisfactory restoration results. In this paper, we propose a deep-learning-based blind SR model, a Progressive Decoupled Network for Blind Super-Resolution (PDNet). Unlike other methods, it decouples the degradation information from the super-resolution content and does not estimate the degradation kernel. The proposed method progressively separates the degradation information from degraded low-resolution images while preserving the super-resolution components. To this end, a dual-path architecture is constructed to learn the combined features and their mutual relationships. Specifically, we design a decoupled representation module to exploit the coarse features and a decoupled cross-attention module to separate the super-resolution and degradation content from the combined features. Our model can effectively recover super-resolution images with high fidelity and preserve edge details. Comprehensive experiments on various datasets evaluate the proposed method against state-of-the-art approaches. The results demonstrate that our model outperforms existing methods in both quantitative and qualitative evaluations.
Published in: IEEE Access ( Volume: 12)
Page(s): 53818 - 53827
Date of Publication: 10 April 2024
Electronic ISSN: 2169-3536

SECTION I.

Introduction

The task of single image super-resolution (SISR) [1], [2], [3] is presently receiving much attention as a widely researched low-level vision task, which aims to generate a high-quality image from a low-quality counterpart. Since Convolutional Neural Networks (CNNs) [4], [5], [6] have been proposed, numerous deep learning approaches [7], [8], [9] with distinct network architectures and training strategies have been proposed to enhance the performance of SISR. However, most advanced SR approaches [10], [11] presume a pre-defined degradation kernel that is often complicated and inaccessible in practical scenarios. Consequently, output images may contain unwanted artifacts. This issue, referred to as “kernel mismatch,” hampers the applicability of deep learning-based SR methods in real-life situations. Blind super-resolution, which pertains to the issue of unknown blur kernels, remains a challenge for many deep learning-based techniques [12], [13].

Deep learning-based methods [1] have achieved good results in SISR. Some non-blind super-resolution (SR) techniques handle multiple degradations using their known corresponding kernels. SRMD [10] is the first to concatenate a low-resolution (LR) image with a stretched blur kernel to generate a super-resolution image under varied degradations. Zhang et al. [14], [15] further enhance this approach by including deblurring algorithms, which extends the degradation to arbitrary blur kernels. Hussein et al. [16] propose a correction filter that transforms blurry LR images to match the degradation assumed by SR models designed for bicubic downsampling. Additionally, zero-shot methods [11], [17] are exploited for non-blind SR to investigate the multiple degradation problem. However, these methods [8], [9], [18] all assume fixed, predefined degradation settings, and the assumed kernels differ from the degradations found in real scenarios. Therefore, their performance drops considerably when they are applied to real images.

At present, blind SR [19], [20], [21], [22] is mainly divided into explicit modeling and implicit modeling. Implicit modeling differs considerably from explicit modeling: it does not rely on explicit degradation parameters but uses additional data to implicitly learn the underlying super-resolution model from the data distribution. Existing methods [23], [24], [25] often use a GAN [26] framework to explore the data distribution; representative methods include CinCGAN [27] and FSSR [28].

The basic idea of explicit modeling [26], [29], [30] is to train a model with extra data covering a wide range of degradations, which often requires parameterizing the blur kernel and noise information. Most available blind SR methods [19], [20], [21], [22] estimate the blur kernel, which involves complex optimization procedures. Typically, these approaches use a two-stage framework: kernel estimation followed by super-resolution reconstruction with the estimated kernel. The method in [21] is a blind super-resolution method based on unsupervised degradation representation learning. It assumes that different images undergo different degradations, and therefore uses contrastive learning to learn a degradation representation for each image. The method in [25] is a real-world super-resolution method based on kernel estimation and noise injection. It proposes a novel degradation framework that estimates various blur kernels together with real noise distributions. Some methods [22], [26], [29] utilize generative adversarial networks to generate high-quality super-resolution images. The approach in [26] trains an internal generative adversarial network (GAN) on a single image to estimate the degradation kernel, which is then supplied to a non-blind SR algorithm [31] to obtain the super-resolution image. IKC [19] designs a module that exploits spatial features while correcting blur kernels iteratively. To obtain better kernel estimates and super-resolution results, DAN [20] designs an end-to-end network that reconstructs super-resolution features and estimates kernels iteratively. Building on DAN [20], DANv2 [32] designs a parallel network to exchange information and extract features, which better supervises the optimization of the network.

These methods rely on the self-similarity of natural images to predict the underlying degradation blur kernel. However, their results are easily affected by noise, which leads to inaccurate kernel estimates. While some deep learning-based methods, such as CAB [33], SRMD [10], IKC [19], and DAN [20], have made progress in blind SR, they require the blur kernel as additional input and produce different results depending on the supplied kernel. Although they perform well when the input kernel is close to the ground truth, they are not suitable for real-world applications because they cannot predict an accurate kernel for each image. Despite the dominance of deep learning in SISR, blind SR remains a challenge, with limited progress made so far.

These blind SR techniques typically involve a two-step process: estimating the kernel from the low-resolution (LR) image and reconstructing the high-resolution (HR) image using that kernel. While this approach has yielded satisfactory results, it presents two significant challenges. First, accurately estimating blur kernels from LR images is difficult because of the ambiguity introduced by downsampling, and mismatched kernels can lead to a significant decline in performance and to unsightly artifacts. Second, fully exploiting both the estimated kernel and the LR image information is also difficult.

To address these limitations, we develop a unified framework that prioritizes effective feature representations. The proposed framework, a Progressive Decoupled Network for Blind Super-Resolution (PDNet), is designed to estimate super-resolution images while progressively eliminating the degradation blur kernel information. PDNet takes a fresh approach to blind super-resolution by separating the mutual relationships between the super-resolution branch and the degradation kernel branch. The network first extracts the super-resolution features and degradation kernel information with a feature extraction block; a decoupled representation module (DRM) is then employed to encode the mutual relationships between the two sets of features. The DRM is equipped with a dual-branch cross-attention mechanism that enables it to adaptively learn complementary and redundant components from each branch. Notably, the proposed method does not explicitly estimate degradation kernels. Instead, effective super-resolution features and invalid degradation information are gradually separated from the degraded features. Although the proposed algorithm does not predict the degradation kernel, degradation information [19], [20] undeniably has a positive effect on image reconstruction. Therefore, a Feature Re-degradation Module (FDM) is constructed to combine the decoupled super-resolution features and degradation information and regenerate the input low-resolution image, which promotes the optimization of the network in the reverse direction.

In summary, our major contributions are highlighted as follows:

  • We solve the blind super-resolution problem from a new perspective. The proposed scheme no longer predicts the degradation kernel but decouples the super-resolution information from the degradation information.

  • The network investigates the imaging model between the super-resolution content and unknown blur kernel layer in an image and proposes DRM to decouple the combined features as well as their mutual relations to improve the accurate representation.

  • Several DRMs are cascaded to form a novel progressive decoupled network that progressively separates the super-resolution feature and degradation content along the network, significantly alleviating the learning difficulty.

  • Comprehensive experiments verify that the proposed decoupled representation mechanism achieves promising performance in recovering details for blind super-resolution.

SECTION II.

Method

This section formally introduces the proposed method. After reformulating the degradation model, it describes the main components: a shallow feature extraction module that extracts coarse information from the LR image, followed by a parallel-path network that decouples the HR features. Finally, a feature re-degradation module is used to optimize the network in the reverse direction. The flowchart is shown in Fig. 1.

FIGURE 1. The framework of our proposed progressive decoupled network. $f_{ini}$ indicates the shallow feature extraction. DRM denotes the Decoupled Representation Module. FDM represents the Feature Re-degradation Module.

A. Problem Formulation

The relationship between the high-resolution image $I_{HR}$ and the low-resolution image $I_{LR}$ can be mathematically described by a degradation function. In the field of image processing, the blind SR issue can be defined as follows:\begin{equation*} I_{LR} = (k \otimes I_{HR}){\downarrow }_{s} + n \tag{1}\end{equation*}

The above function involves three primary components: the blur kernel $k$, the downsampling operation ${\downarrow }_{s}$ with scale factor $s$, and the additive noise $n$, with $\otimes $ representing the convolution operation. For blind SR, the isotropic Gaussian blur kernel is the most commonly used, while some works also incorporate anisotropic blur kernels. In real-world scenarios, LR images often come with additive noise. Existing methods typically consider only the kernel $k$ and treat the blind SR problem as follows:\begin{align*} k_{e} &= F_{E}(I_{LR}) \tag{2}\\ {\mathcal {L}}_{k} &= \arg\min ||k_{t} - k_{e}||_{1} \tag{3}\\ {\mathcal {L}}_{sr} &= \arg\min ||I_{HR} - f(I_{LR}; k_{e})||_{1} \tag{4}\end{align*} where $F_{E}(\cdot)$ is the function that estimates the Gaussian blur kernel $k_{e}$ from $I_{LR}$, $k_{t}$ is the target (ground-truth) kernel, and $f(\cdot)$ is the function that restores $I_{HR}$ from $I_{LR}$. ${\mathcal {L}}_{sr}$ and ${\mathcal {L}}_{k}$ are the optimization objectives for the SR result and the estimated degradation kernel, respectively. The current approach first estimates the unknown blur kernel from the LR image and then uses the estimated kernel to restore the HR image. However, relying on a single blur kernel estimate for the entire image restoration process raises several inherent issues. First, the degradation process is unstable: it involves different blur kernels and a random ordering of degradations, and the LR image depends not only on the blur kernels themselves but also on the HR image. Second, the input LR images are blurred by a combination of several Gaussian blur kernels and downsampling, which makes it challenging to estimate the Gaussian blur kernels accurately.
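As an illustration of the classical degradation model in Eq. (1), the following minimal sketch blurs a grayscale image, subsamples it, and adds noise. It is not the authors' data pipeline: the function names, the border replication, the direct subsampling used for ${\downarrow }_{s}$, and the Gaussian noise model are all assumptions made for the example, and the code uses plain Python lists so that it needs only the standard library.

```python
import random

def blur(img, kernel):
    """Convolve a 2-D grayscale image (list of rows) with an odd-sized kernel.

    Border pixels are replicated (an assumption; the paper does not state
    how borders are handled).
    """
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    # Flipped kernel index: true convolution, not correlation.
                    acc += kernel[r - dy][r - dx] * img[yy][xx]
            out[y][x] = acc
    return out

def degrade(hr, kernel, scale, noise_std=0.0, rng=None):
    """Eq. (1): I_LR = (k conv I_HR) downsampled by `scale`, plus noise n."""
    rng = rng or random.Random(0)
    blurred = blur(hr, kernel)
    # Assumed downsampler: keep every `scale`-th pixel in both directions.
    lr = [row[::scale] for row in blurred[::scale]]
    # Assumed noise model: zero-mean Gaussian with standard deviation noise_std.
    return [[v + rng.gauss(0.0, noise_std) for v in row] for row in lr]

if __name__ == "__main__":
    box = [[1.0 / 9.0] * 3 for _ in range(3)]   # simple normalized blur kernel
    hr = [[float((x + y) % 4) for x in range(8)] for y in range(8)]
    lr = degrade(hr, box, scale=2, noise_std=0.01)
    print(len(lr), "x", len(lr[0]))
```

Any normalized kernel can be passed as `kernel`, so the same function covers both the isotropic and anisotropic Gaussian settings used in the experiments.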

In this paper, we propose a different formulation, which divides the degraded LR image into super-resolution features and irrelevant degradation information. The degradation is then modeled as follows:\begin{equation*} I_{LR} = I_{HR}{\downarrow } + I_{d} \tag{5}\end{equation*} where $I_{d}$ indicates the invalid degradation information formed by different blur kernels and noise. Consequently, the optimization process can be formulated as follows:\begin{align*} \mathcal {L}_{1} &= \arg\min ||I_{HR} - f(I_{LR})||_{1} \tag{6}\\ \mathcal {L}_{2} &= \arg\min ||I_{LR} - FDM(f(I_{LR}); I_{d})||_{1} \tag{7}\end{align*} where $f(\cdot)$ denotes the convolutional layers that reconstruct the HR image from its LR counterpart. $\mathcal {L}_{1}$ and $\mathcal {L}_{2}$ represent the loss of restoring the super-resolution image and the loss of re-degradation using the FDM, respectively.
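One way to read Eq. (5) against Eq. (1) is sketched below. It rests on an assumption the paper does not state, namely that the downsampling operator in Eq. (5) is the same ${\downarrow }_{s}$ as in Eq. (1). Adding and subtracting $I_{HR}{\downarrow }_{s}$ on the right-hand side of Eq. (1) gives:\begin{align*} I_{LR} &= (k \otimes I_{HR}){\downarrow }_{s} + n \\ &= I_{HR}{\downarrow }_{s} + \underbrace{\left [{(k \otimes I_{HR}){\downarrow }_{s} - I_{HR}{\downarrow }_{s} + n}\right ]}_{I_{d}}\end{align*} Under this reading, $I_{d}$ collects everything the blur kernel and the noise contribute beyond plain downsampling, which matches its description as degradation information formed by blur kernels and noise.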

The proposed idea is to learn the combined features of super-resolution and degradation information and to use their mutual relationships to distill the valuable features through decoupled representation learning. More specifically, we utilize several decoupled representation modules to construct the architecture.

B. Overall Framework

The architecture is illustrated in Fig. 1, and it consists of three main components: a shallow feature extraction module extracting coarse information for the decoupled components, cascaded residual groups made up of decoupled representation modules, and a feature re-degradation module.

Given a degraded LR image $I_{LR}$, it is first fed into a component decomposition module, which consists of a shallow feature extraction part that extracts low-dimensional features and a dual path that divides the shallow features into two branches. This process can be formulated as follows:\begin{equation*} F_{0,r}, F_{0,d} = F_{ini}(I_{LR}) \tag{8}\end{equation*} where $F_{ini}(\cdot)$ represents the component decomposition function that generates the dual-structure features $F_{0,r}$ and $F_{0,d}$.

Next, we optimize the relationship between the two components with mutual decoupled representation modules. These modules facilitate various forms of structural information interaction and generate more comprehensive representations by relying on each other. The network obtains a progressive decoupled representation for both components through a series of cascaded decoupled representation modules and long-range connections that aid information propagation. Assuming there are $N$ DRMs, the dual-component learning process for the $n$-th DRM can be expressed as follows:\begin{equation*} F_{n,r}, F_{n,d} = D_{n}(F_{n-1,r}, F_{n-1,d}) = D_{n}({\dots }(D_{1}(F_{0,r}, F_{0,d}))) \tag{9}\end{equation*} where $F_{n,r}, F_{n,d}$ denote the outputs of the $n$-th DRM and $D_{n}$ indicates the function of the $n$-th DRM.

Furthermore, the dual-branch network employs the feature re-degradation module to generate accurate predictions. Since the network only decouples the degradation information from the super-resolution content, we do not estimate the degradation kernel, and therefore no separate loss constraint on degradation kernels is designed. However, to exploit the degradation information fully, an extra feature re-degradation module is designed. This module takes the decoupled super-resolution features and degradation information and fuses the two parts, with the goal of reproducing the LR image that was fed to the network. This cycle implicitly stimulates the optimization of the network.

This involves utilizing two reconstruction paths to project the final features $F_{r}$ and $F_{d}$ from the feature space into the image space, resulting in the predicted super-resolution image $I_{SR}$ and the degradation image $I_{d}$. Additionally, we stimulate the optimization of the network by reproducing the degraded image with the feature re-degradation module, which fuses $I_{SR}$ and $I_{d}$. The entire process is described below:\begin{equation*} I_{LR}^{*} = F_{re}(I_{SR}, I_{d}) \tag{10}\end{equation*} where $F_{re}$ indicates the feature re-degradation function and $I_{LR}^{*}$ denotes the reconstructed degraded LR image.

C. Decoupled Representation Module

The Decoupled Representation Module (DRM) jointly represents super-resolution information and degradation content through decoupled learning, as illustrated in Fig. 2. The DRM consists of two parts: a feature extraction component and a decoupled cross-attention module (DCAM). The feature extraction component generates coarse representations of the super-resolution and degradation information, which are fed into the DCAM. The DCAM encodes the self-relations and mutual relationships and obtains a refined, combined representation of the two contents.

FIGURE 2. The details of the proposed Decoupled Representation Module.

The basic feature extraction part is constructed with several convolution layers. It extracts the features $g_{i, d}$ and $g_{i, r}$ from the degradation information $F_{i-1,d}$ and the super-resolution features $F_{i-1,r}$ of the previous module, respectively, where $i$ indicates the $i$-th DRM.

The DCAM takes $g_{i, d}, g_{i, r}$ as inputs to learn the mutual relationships for the decoupled features. More specifically, the DCAM first concatenates $g_{i, d}$ and $g_{i, r}$. Then, several convolution layers are utilized to extract the coupled information. Finally, a channel attention network decouples the combined information into a super-resolution confidence map and a bias confidence map, from which the attention masks $M_{r}$ and $M_{d}$ are learned. In this way, the DCAM can extract the mutual information for a refined combined representation of super-resolution features and degradation content. The process can be formulated as follows:\begin{align*} F_{i,d} &= F_{i-1,d} + M_{d} \otimes g_{i, d}, \\ F_{i,r} &= F_{i-1,r} + M_{r} \otimes g_{i, r} \tag{11}\end{align*} where $M_{d} \otimes g_{i, d}$ denotes the degradation component decoupled from the coupled features, and $M_{r} \otimes g_{i, r}$ denotes the super-resolution information decoupled from the coupled features. These two components are used to further refine the representation of the super-resolution features and the degradation content in the current DRM.
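The residual update of Eq. (11) and the cascade of Eq. (9) can be illustrated with a toy sketch. This is not the authors' network: features are reduced to one number per channel, the convolutional extractors are replaced by identity functions, and `toy_attention` is a hypothetical stand-in for the DCAM (including its choice of complementary masks). The sketch also assumes that $\otimes$ in Eq. (11) denotes element-wise multiplication by the mask, which the paper does not state explicitly.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def drm_step(f_r, f_d, extract_r, extract_d, attention):
    """One DRM update following Eq. (11).

    f_r, f_d  : per-channel features (lists of floats) from the previous DRM
    extract_* : callables standing in for the convolutional feature extractors
    attention : callable standing in for the DCAM; maps (g_r, g_d) to (M_r, M_d)
    """
    g_r, g_d = extract_r(f_r), extract_d(f_d)
    m_r, m_d = attention(g_r, g_d)
    # Residual update: F_i = F_{i-1} + M * g_i for each branch.
    new_r = [f + m * g for f, m, g in zip(f_r, m_r, g_r)]
    new_d = [f + m * g for f, m, g in zip(f_d, m_d, g_d)]
    return new_r, new_d

def toy_attention(g_r, g_d):
    """Hypothetical DCAM stand-in: gates computed from the coupled features."""
    coupled = [a + b for a, b in zip(g_r, g_d)]   # stands in for concat + conv
    m_r = [sigmoid(c) for c in coupled]
    m_d = [1.0 - m for m in m_r]                  # illustrative choice only
    return m_r, m_d

def cascade(f_r, f_d, n_modules):
    """Apply n_modules DRM steps in sequence, as in Eq. (9)."""
    identity = lambda f: list(f)
    for _ in range(n_modules):
        f_r, f_d = drm_step(f_r, f_d, identity, identity, toy_attention)
    return f_r, f_d

if __name__ == "__main__":
    print(cascade([0.5, -0.2], [0.1, 0.3], n_modules=5))
```

Passing the extractors and the attention function as arguments keeps the update rule itself separate from whatever learned layers would fill those roles in a real implementation.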

SECTION III.

Experiments

A. Datasets and Implementation Details

Following previous works [19], [20], both DIV2K [34] and Flickr2K [35] are used for training, which together include 3450 2K images. To improve I/O speed, we crop the images into sub-images. During training, anisotropic and isotropic Gaussian degradation kernels are generated and applied to produce the corresponding LR images. For testing, the experiments utilize five widely recognized benchmarks: Set5 [36], Set14 [37], BSD100 [38], Urban100 [39], and Manga109 [40]. For evaluation, the RGB images are converted to the YCbCr color space, and only the Y channel is used to calculate the PSNR and SSIM metrics [41].
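The Y-channel PSNR evaluation described above can be sketched as follows. The paper does not say which RGB-to-YCbCr conversion is used; the sketch assumes the ITU-R BT.601 "studio range" luma that is common in super-resolution evaluation code, and the function names and the image representation (rows of 8-bit RGB tuples) are chosen for the example only.

```python
import math

def rgb_to_y(r, g, b):
    """BT.601 luma of one RGB pixel with components in [0, 255].

    The result lies in [16, 235] (assumed convention, see the lead-in).
    """
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(img_a, img_b):
    """PSNR in dB between the Y channels of two equally sized RGB images.

    Each image is a list of rows, each row a list of (r, g, b) tuples.
    """
    squared_error, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            squared_error += (rgb_to_y(*pixel_a) - rgb_to_y(*pixel_b)) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(255.0 ** 2 / mse)

if __name__ == "__main__":
    reference = [[(120, 64, 200), (10, 250, 30)]]
    restored = [[(118, 66, 199), (12, 247, 33)]]
    print(round(psnr_y(reference, restored), 2), "dB")
```

Evaluation scripts often also crop a border of a few pixels before computing the metric; that step is omitted here because the paper does not mention it.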

1) Isotropic Gaussian Kernels

To begin with, the experiments are performed on isotropic Gaussian kernels following the settings in [19]. Specifically, we maintain a fixed kernel size of $21 \times 21$. During the training phase, kernel widths are sampled from the following ranges: [0.2, 2.0] for a scale factor of 2, [0.2, 3.0] for a scale factor of 3, and [0.2, 4.0] for a scale factor of 4. To generate the evaluation sets for the tested benchmarks, the Gaussian8 setting uses 8 kernels whose widths are chosen uniformly from the following ranges: [0.8, 1.6] for a scale factor of 2, [1.35, 2.40] for a scale factor of 3, and [1.80, 3.20] for a scale factor of 4.
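The kernel settings above can be sketched in a few lines. Two assumptions are made here that the text does not spell out: "kernel width" is taken to be the standard deviation of the Gaussian, and "uniformly" is taken to mean eight evenly spaced widths including both end points of each range. The function names are hypothetical.

```python
import math

def isotropic_gaussian_kernel(sigma, size=21):
    """size x size isotropic Gaussian kernel, normalized to sum to one."""
    center = (size - 1) / 2.0
    kernel = [
        [math.exp(-((x - center) ** 2 + (y - center) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

# Evaluation ranges of the Gaussian8 setting, taken from the text above.
GAUSSIAN8_RANGES = {2: (0.80, 1.60), 3: (1.35, 2.40), 4: (1.80, 3.20)}

def gaussian8_widths(scale):
    """Eight evenly spaced kernel widths for the given scale factor."""
    low, high = GAUSSIAN8_RANGES[scale]
    return [low + i * (high - low) / 7.0 for i in range(8)]

if __name__ == "__main__":
    for scale in (2, 3, 4):
        print(scale, [round(w, 3) for w in gaussian8_widths(scale)])
    kernels = [isotropic_gaussian_kernel(w) for w in gaussian8_widths(4)]
    print(len(kernels), "kernels of size", len(kernels[0]))
```

For training, the width would instead be drawn at random from the wider ranges given in the text, for example with `random.uniform(0.2, 4.0)` for a scale factor of 4.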

2) Anisotropic Gaussian Kernels

We also test anisotropic Gaussian kernels following the settings in [26]. For scale factors 2 and 4, we set the kernel size to $11 \times 11$ and $31 \times 31$, respectively. During training, we randomly select the kernel widths from the range $[{0.6, 5}]$ and rotate the kernel by an angle in $[-\pi, \pi]$ to generate anisotropic Gaussian kernels. Additionally, we apply uniform multiplicative noise to the kernel and normalize it to sum to one. To evaluate our model, the DIV2KRK [26] dataset is employed.
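A possible way to generate such kernels is sketched below. Several details are assumptions rather than facts from the paper: the two widths are treated as standard deviations along the principal axes (some implementations treat them as variances), the multiplicative noise level of 25% is a guess that the text does not specify, and the function names are hypothetical.

```python
import math
import random

def anisotropic_gaussian_kernel(size, lam1, lam2, theta, noise=0.0, rng=None):
    """Anisotropic Gaussian kernel with axis widths lam1, lam2 rotated by theta.

    Each tap is multiplied by a factor drawn uniformly from
    [1 - noise, 1 + noise]; the kernel is then normalized to sum to one.
    """
    rng = rng or random.Random(0)
    c, s = math.cos(theta), math.sin(theta)
    # Entries of the inverse covariance R diag(1/lam1^2, 1/lam2^2) R^T.
    a = c * c / lam1 ** 2 + s * s / lam2 ** 2
    b = c * s * (1.0 / lam1 ** 2 - 1.0 / lam2 ** 2)
    d = s * s / lam1 ** 2 + c * c / lam2 ** 2
    center = (size - 1) / 2.0
    kernel = []
    for y in range(size):
        row = []
        for x in range(size):
            u, v = x - center, y - center
            value = math.exp(-0.5 * (a * u * u + 2.0 * b * u * v + d * v * v))
            row.append(value * (1.0 + rng.uniform(-noise, noise)))
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]

def sample_training_kernel(size, rng, noise=0.25):
    """Draw widths from [0.6, 5] and an angle from [-pi, pi], as in the text.

    The default noise level is an assumed value, not taken from the paper.
    """
    lam1, lam2 = rng.uniform(0.6, 5.0), rng.uniform(0.6, 5.0)
    theta = rng.uniform(-math.pi, math.pi)
    return anisotropic_gaussian_kernel(size, lam1, lam2, theta, noise, rng)

if __name__ == "__main__":
    kernel = sample_training_kernel(11, random.Random(42))
    print(len(kernel), "x", len(kernel[0]), "sum =", round(sum(map(sum, kernel)), 6))
```

Setting `lam1 == lam2` and `noise=0.0` recovers an isotropic kernel, which gives a simple sanity check for the rotation code.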

3) Implementation Details

In each experiment, the model maintains a batch size of 64 and utilizes LR patch sizes of $64 \times 64$ . The choice of optimizer is Adam [42], with $\beta _{1} = 0.9$ and $\beta _{2} = 0.99$ . All models are trained for $6 \times 10^{5}$ iterations, with an initial learning rate of $2 \times 10^{-4}$ that is halved every $2 \times 10^{5}$ iterations. Additionally, random horizontal flips and 90-degree rotations are applied to augment the training data.
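The step schedule described above (initial rate $2 \times 10^{-4}$, halved every $2 \times 10^{5}$ iterations over $6 \times 10^{5}$ iterations) can be written as a small function. The function name and its parameter names are hypothetical; only the numbers come from the text.

```python
def learning_rate(iteration, base_lr=2e-4, step=200_000, gamma=0.5):
    """Learning rate at a given training iteration under a step decay schedule."""
    return base_lr * gamma ** (iteration // step)

if __name__ == "__main__":
    # The rate takes three values over the 600,000 training iterations.
    for it in (0, 199_999, 200_000, 400_000, 599_999):
        print(it, learning_rate(it))
```

With these settings the rate is $2 \times 10^{-4}$ for the first third of training, $1 \times 10^{-4}$ for the second, and $5 \times 10^{-5}$ for the last.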

B. Comparison With State-of-the-Arts

1) Evaluation With Isotropic Gaussian Kernels

Following [19], we assess the effectiveness of our approach on the datasets generated with Gaussian8 kernels. To evaluate its performance, we compare our method against a number of state-of-the-art blind SR techniques, including ZSSR [31] (utilizing the bicubic kernel), RCAN [43], DRSR [12], IKC [19], DANv1 [20], DANv2 [32], DSSR [22], MRDA [47], DASR [21], and AdaTarget [46]. Additionally, we follow the same evaluation protocol as [19] and compare the proposed method against CARN [44]. In most cases, we use the official implementations and pre-trained models of the respective methods.

Table 1 presents the quantitative results, which demonstrate the outstanding performance of our method across all benchmarks. Notably, the performance of CARN decreases dramatically on Gaussian8 because the degradation deviates from its predefined bicubic kernel. While ZSSR shows superior performance compared to non-blind SR methods, its image-specific network design restricts its ability to leverage abundant training data. Although AdaTarget can enhance image quality, its results are still lower than those of the other algorithms. IKC and DAN employ a two-stage strategy that yields significant improvements. However, because they feed the inaccurately predicted degradation kernels directly into the network, they are less effective than our proposed algorithm.

TABLE 1 PSNR (dB) / SSIM of Different Models on Testing Datasets With the Isotropic Gaussian Kernels. Red and Blue, Respectively Indicate the Best and Second-Best Results

The visualization results are shown in Fig. 3. RCAN and ZSSR cannot reconstruct the detailed texture in the image, and the deblurring performance is not good. Although the blind SR methods like IKC, DANv1, DANv2, and DSSR generate better results, our algorithm achieves better performance in terms of texture, edges, brightness, etc.

FIGURE 3. The visual results on img_33 in Urban100 of different methods.

2) Evaluation With Anisotropic Gaussian Kernels

Compared to isotropic Gaussian kernels, degradation with anisotropic Gaussian kernels is more challenging. We compare our method with blind SR methods such as ZSSR [31], IKC [19], DASR [21], DANv1 [20], DANv2 [32], AdaTarget [46], and KOALAnet [24]. We also compare the proposed method with methods designed for bicubic degradation, such as EDSR [7], RCAN [43], and DBPN [48].

Table 2 displays the quantitative results on DIV2KRK, where the proposed method shows a significant improvement over other blind SR approaches. It is worth noting that ZSSR performs better when combined with KernelGAN, which indicates the significant role of kernel estimation. SOTA blind SR methods like IKC, DAN, and DASR have achieved remarkable accuracy in PSNR and SSIM. AdaTarget can perform comparably with SOTA blind methods by applying an adaptive target to finetune the network. However, the proposed method outperforms all these methods. The visual results on DIV2KRK are provided in Fig. 5 and Fig. 6, which show that the results obtained by the proposed method are much cleaner and brighter and contain more texture.

TABLE 2 Quantitative Comparison of Different Models on DIV2KRK With the Anisotropic Gaussian Kernels. Red and Blue, Respectively, Indicate the Best and Second-Best Results

FIGURE 4. The visual results on img_19 in Urban100 of different methods.

FIGURE 5. The visual results on img_28 in DIV2KRK of different methods.

FIGURE 6. The visual results on img_30 in DIV2KRK of different methods.

C. Ablation Experiments

This paper proposes a progressive architecture to decouple the super-resolution content from the degradation features without estimating the unknown blur kernels. To verify the effectiveness of this design, we add an extra loss function that explicitly constrains the learning of the degradation kernel (DL). In addition, we discuss the influence of the feature re-degradation module on the performance of the network.

1) Effectiveness of Different Parts

The results of the ablation experiments are reported in Table 3. When DL is used, the results are worse than those of our algorithm. This indicates that explicit constraints on degradation kernels increase the difficulty of network learning and thereby reduce the effectiveness of super-resolution. Likewise, when the FDM is replaced, the performance of the network drops considerably. This shows two points. First, in blind super-resolution, the degradation content is not insignificant: it affects the super-resolution performance, so merely decoupling the degradation content is not optimal. Second, the designed FDM effectively stimulates the optimization of the proposed network by recombining the degradation features and the super-resolution features, so that the network achieves better performance.

TABLE 3 Quantitative Comparison of Different Ablation Methods on Set5 With the Isotropic Gaussian Kernels. ✓ Indicates That the Part is Utilized. $\times$ Denotes That This Module is Replaced

2) Effectiveness of Different Numbers of DRM

Further experiments explore the influence of the number of modules on the algorithm. The proposed algorithm serves as the baseline; it groups several DRMs into a block, and information is exchanged between different blocks. We denote the number of DRMs as N and the number of blocks as M, and build networks with different values of N and M. The model employed in the paper is the baseline, where N and M are set to 5 and 8, respectively, denoted $D_{N5M8}$. In addition, three other models are designed: $D_{N3M8}$, $D_{N5M4}$, and $D_{N10M8}$.

Table 4 shows that the proposed method achieves the best super-resolution performance when N and M are set to 5 and 8, respectively. The performance declines with $D_{N3M8}$ and $D_{N5M4}$. When N is increased to 10 ($D_{N10M8}$), the results are also worse than the baseline, because the deeper network is more challenging to optimize. Considering the trade-off between efficiency and super-resolution performance, we finally set N and M to 5 and 8, respectively.

TABLE 4 Quantitative Comparison of Different Number of Modules on Set5 With the Isotropic Gaussian Kernels

3) Inference Efficiency

To demonstrate the efficiency of the proposed method, we compare the number of parameters, inference time, and PSNR with other blind methods on the Set5 dataset with Gaussian8 kernels for $2\times $ super-resolution. A TITAN RTX GPU is used as the platform. Table 5 shows that the proposed method achieves the best result. Although the proposed method has more parameters, it is the fastest of the compared methods. Specifically, compared with DANv2, the PSNR is improved by 0.11 dB and the inference time is reduced by 0.25 s. The main reason for the high speed of our proposed network is the use of parallel computing: the two parallel branches can be calculated independently, which greatly improves the computing efficiency.

TABLE 5 Quantitative Results of Different Complexities and Inference Speed of Different Methods. The Best Results Have Been Marked in Red

4) Real-World Degradations

In addition to the experiments on synthetic datasets generated with isotropic and anisotropic Gaussian kernels, we also compare the proposed method with other existing methods on real-world images. The visual results of this comparison are shown in Fig. 7. The other methods produce distorted images with noticeable artifacts and blurry edges. In contrast, the proposed method produces SR images with sharper edges, clearer content, and fewer artifacts. These results further demonstrate the promising performance of our proposed method.

FIGURE 7. The visual results on real-world images of different methods. RD indicates real-world.

SECTION IV.

Conclusion

In this paper, we design an architecture to tackle the blind super-resolution task from a new perspective. Instead of estimating the kernel, we propose a decoupled representation module to separate the super-resolution features and degradation information. Based on the decoupled representation module, we propose a progressive decoupled network to restore the super-resolution image while removing the degradation information progressively. Comprehensive experiments on different benchmarks demonstrate promising performance compared with state-of-the-art approaches to blind super-resolution. However, the network is burdened with a large number of parameters, so a more lightweight structure is needed. Additionally, more efficient attention mechanisms could be designed to help with feature decoupling and lead to better performance. Furthermore, better strategies can be developed to take advantage of degradation features in the future.
