I. Introduction
Single-image Super-Resolution (SR) aims to upsample a Low-Resolution (LR) image and reconstruct the missing high-frequency details. SR has been widely studied for decades because of its many applications in fields such as medical imaging, remote sensing, and surveillance. In the latter, SR is often used to improve the performance of downstream vision tasks, such as object detection and tracking, by enhancing the visibility of images that often suffer from low resolution due to a wide field of view and a large object-to-camera distance. Traditionally, most work has focused on improving image fidelity by minimizing the Mean Squared Error (MSE). More recently, however, the focus has shifted toward generating High-Resolution (HR) images that look realistic to human observers [20]. Current State-of-the-Art (SoTA) deep learning-based SR methods most often require paired LR/HR images for supervised training. Researchers commonly use artificial LR images created by down-sampling HR images, typically with bicubic interpolation. However, this strategy changes the natural image characteristics, such as sensor noise and other corruptions, which limits how well an SR model trained on such data performs on real LR images. Blind SR tries to address this problem by assuming an unknown down-sampling kernel, but it still relies on Ground-Truth (GT) reference images for supervised learning.
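The setting described above is commonly formalized with a generic degradation model and a fidelity-oriented loss. The sketch below uses assumed notation that does not come from this paper: x is the HR image, y the LR image, k a blur/down-sampling kernel, s the scale factor, n additive noise, and f_θ the SR model. The loss is written up to a normalization constant.

```latex
% Assumed notation (illustrative only, not taken from the source):
% x = HR image, y = LR image, k = blur/down-sampling kernel,
% s = scale factor, n = additive noise, f_\theta = SR model.
\begin{align}
  y &= (x \ast k)\downarrow_{s} + \, n
     && \text{(generic degradation model)} \\
  \mathcal{L}_{\mathrm{MSE}}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}
     \bigl\lVert f_{\theta}(y_i) - x_i \bigr\rVert_2^2
     && \text{(supervised loss over $N$ LR/HR pairs)}
\end{align}
```

Under this notation, the synthetic setup corresponds to fixing k to a bicubic kernel and setting n = 0, which is why sensor noise and other real corruptions are absent from the training data. Blind SR treats k as unknown, yet the loss still requires the reference images x_i.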