I. Introduction
Single image super-resolution (SISR) aims to recover a high-resolution (HR) image with more natural and realistic textures from a given low-resolution (LR) image, which benefits high-level vision tasks such as image classification [1] and object detection [2]. Because SISR is an ill-posed inverse problem, SISR techniques have achieved great success by relying on a degradation model with prior knowledge, i.e., $y = x\downarrow_s$, where $y$ and $s$ represent an LR image and a scale factor, respectively [3], and $x$ denotes a predicted high-definition image. Accordingly, SISR methods can be broadly grouped into three paradigms: interpolation methods, optimization methods, and discriminative learning methods. Interpolation methods mainly rely on bilinear [4] or bicubic [5] interpolation to obtain a mapping from an LR image to an HR image. Although these methods are simple and efficient, they perform poorly in SISR. To address this issue, optimization methods use the characteristics of natural images as prior knowledge to guide an SR model [6]. For instance, a sparse prior can be used to obtain a linear combination that effectively predicts HR images [7]. However, such optimization methods gain flexibility at the cost of time-consuming processing. They may also require manually set parameters to achieve competitive SR performance. As an alternative, discriminative learning methods have been developed for their efficiency and flexibility [8]. Notably, owing to their flexible end-to-end architectures, convolutional neural networks (CNNs) have become highly popular in SISR [9]. This line of research can be divided into two categories: SR methods based on high-frequency information and SR methods based on low-frequency information.
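To make the degradation model and the interpolation baseline discussed above concrete, the following is a minimal sketch rather than any cited method. It assumes that degradation is plain decimation (keeping every s-th pixel, with no blur or noise) and that output pixel i maps back to LR coordinate i/s. The function names `downsample` and `bilinear_upsample` are hypothetical and chosen only for illustration.

```python
def downsample(hr, s):
    """Assumed degradation: keep every s-th pixel in each dimension."""
    return [row[::s] for row in hr[::s]]


def bilinear_upsample(lr, s):
    """Upscale a 2-D grid by an integer factor s with bilinear interpolation."""
    h, w = len(lr), len(lr[0])
    out = []
    for i in range(h * s):
        # Map the output row back to LR coordinates, clamped to the grid.
        y = min(i / s, h - 1)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(w * s):
            x = min(j / s, w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Blend the four neighbouring LR pixels.
            top = lr[y0][x0] * (1 - dx) + lr[y0][x1] * dx
            bottom = lr[y1][x0] * (1 - dx) + lr[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


# Toy 4x4 "HR image": a linear ramp with value i + j at position (i, j).
hr = [[i + j for j in range(4)] for i in range(4)]
lr = downsample(hr, 2)          # 2x2 LR image
sr = bilinear_upsample(lr, 2)   # 4x4 estimate of the HR image
```

On this toy ramp the interior pixel `sr[1][1]` matches `hr[1][1]`, while the corner `sr[3][3]` does not, because the decimated image no longer contains the information needed there. This is a small instance of why the inverse problem is ill-posed and why interpolation alone gives limited quality.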
SR methods based on high-frequency information require the input and output of a CNN to have the same size, so the given LR images must first be converted into high-frequency images through a bicubic operation before they serve as training images for an SR model [10]. Inspired by that, a very deep SR network was built with residual learning and stacked small filters to obtain good visual effects [11]. However, deep architectures make CNNs difficult to train. To overcome this problem, recursive learning and residual learning techniques were introduced to accelerate training [12], [13]. For instance, the deeply recursive convolutional network (DRCN) integrated hierarchical information via residual learning to obtain accurate features and prevent exploding and vanishing gradients [12]. In addition, fusing global and local information through skip connections can enhance the learning ability of a network for SISR [14]. Alternatively, new components (i.e., a recursive unit and a gate unit) can be exploited to obtain multilevel representations and improve the quality of the predicted image [15]. Although these approaches can outperform traditional methods in SISR, they may incur high complexity [16]. To overcome this challenge, SR methods based on low-frequency information were developed. That is, an SR model is trained by feeding the LR image directly into a CNN and applying an upsampling operation in a deep layer to amplify the obtained low-frequency features [17]. For example, a deformable and attentive mechanism was designed so that a CNN extracts salient low-frequency texture information, which improves visual effects [17]. Although these methods have achieved remarkable SR results, they fuse hierarchical features only roughly, via residual learning or concatenation operations that act on different layers.
As a result, the obtained features are overly simplified and cannot represent high-quality images well, which leads to poor robustness of SISR in complex scenes.
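The complexity gap between the two families of methods discussed above can be illustrated with a rough operation count. This is a minimal sketch under stated assumptions: stride-1, same-padded convolutions; a hypothetical three-layer body (`LAYERS`); a 32x32 LR input; a scale factor of 4; and the cost of the final upsampling step is ignored. None of these values comes from the cited works.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates of one stride-1, same-padded convolution."""
    return height * width * kernel * kernel * c_in * c_out


def body_macs(height, width, layers):
    """Total cost of a stack of convolutions run at a fixed resolution."""
    return sum(conv_macs(height, width, k, ci, co) for k, ci, co in layers)


# Hypothetical body: (kernel size, input channels, output channels).
LAYERS = [(3, 1, 64), (3, 64, 64), (3, 64, 64)]
LR_H, LR_W, SCALE = 32, 32, 4

# High-frequency-based pipeline: the LR image is enlarged first (e.g. by
# bicubic interpolation), so every layer processes HR-sized feature maps.
pre_upsampling = body_macs(LR_H * SCALE, LR_W * SCALE, LAYERS)

# Low-frequency-based pipeline: layers process LR-sized feature maps and
# upsampling happens in a deep layer (its own cost is not counted here).
post_upsampling = body_macs(LR_H, LR_W, LAYERS)

ratio = pre_upsampling // post_upsampling
```

Under these assumptions the convolutional body costs `SCALE ** 2` times more when it runs on the pre-upsampled input, since the number of pixels grows quadratically with the scale factor while the per-pixel cost stays the same.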