I. Introduction
The widespread use of high-resolution (HR) displays, high-definition televisions and hand-held portable devices in day-to-day life is the primary reason for the recent surge of attention to single image super-resolution (SISR) in both research and industrial applications [1], [2]. SISR is widely used in applications such as video surveillance [3], depth-map estimation [4], face hallucination [5], object recognition [6] and image restoration [7]–[9], where high-frequency details need to be estimated. SISR is the classical problem of reconstructing, from a low-resolution (LR) image, the high-frequency information that is naturally lost when the HR image is down-sampled. Further, because the super-resolution (SR) problem is ill-posed, minimizing the risk of the output prediction with respect to a single realization of the ground-truth HR image is of limited effectiveness. Although it is desirable to restore frequencies close to those of the target, the ill-posed nature of the problem limits the ability of an SR network to generate high-frequency components, since its training is more sensitive to deficient details.