1. Introduction
Recent years have witnessed the tremendous success of deep neural networks (DNNs) in generating a high-resolution (HR) image from a low-dimensional input [5], [8], [12], [31], [42], [51]. On the one hand, DNN-based single image super-resolution (SR) for bicubic degradation continues to improve in terms of both PSNR and perceptual quality [5], [9], [12], [16], [25], [40], [41], [42]. In particular, several fundamental conclusions have been drawn: (i) DNN-based SR with a pixel-wise loss (such as the L1 or L2 loss) tends to produce oversmoothed output for a large scale factor due to the pixel-wise averaging problem [22]; (ii) the perceptual quality of super-resolved images can be improved by using a VGG perceptual loss and a generative adversarial network (GAN) loss [11], [22], [45]; (iii) there is a trade-off between reconstruction accuracy and perceptual quality, which means that no DNN-based method can achieve its best PSNR and its best perceptual quality at the same time [5]. While perceptual SR for bicubic degradation at a moderate scale factor (e.g., ×4) has achieved significant progress [22], [45], [53], [54], [57], the case of an extremely large scale factor has received little attention [7], [12]. On the other hand, GAN-based synthesis of realistic HR images from a latent low-dimensional vector has shown great success for natural images [6] and face images [21]. However, how to effectively generate a perceptually pleasing HR image from a low-resolution (LR) image with a very large scale factor remains an open problem.
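The averaging behavior behind conclusion (i) can be illustrated with a minimal numpy sketch (a toy 1-D example with hypothetical signals, not any specific SR method): when several sharp HR images are equally plausible for one LR input, the prediction that minimizes the expected L2 loss is their pixel-wise mean, which is smoother than any individual candidate.

```python
import numpy as np

# Two equally plausible sharp HR candidates for the same LR input:
# identical low-frequency content, opposite high-frequency phase
# (hypothetical 1-D "images").
x = np.linspace(0.0, 2.0 * np.pi, 64)
low = np.sin(x)                   # low-frequency content shared by both
detail = 0.5 * np.sin(16.0 * x)   # high-frequency detail
hr_a = low + detail
hr_b = low - detail               # same coarse appearance, opposite phase

# Under a pixel-wise L2 loss, the optimal prediction for this
# 2-candidate posterior is the pixel-wise mean of the candidates.
pred = 0.5 * (hr_a + hr_b)

# The mean cancels the detail: pred equals the low-frequency part only,
# i.e. an oversmoothed output.
assert np.allclose(pred, low)

# Its expected L2 loss is lower than that of either sharp candidate,
# which is why L1/L2-trained networks favor blurry solutions.
def exp_l2(p):
    return 0.5 * (np.mean((p - hr_a) ** 2) + np.mean((p - hr_b) ** 2))

print(exp_l2(pred) < exp_l2(hr_a))  # True
```

The same argument holds per pixel for real images: the larger the scale factor, the more sharp solutions are consistent with the LR input, and the blurrier their pixel-wise mean becomes.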