1. Introduction
Video super-resolution (VSR) is the task of reconstructing a high-resolution (HR) video from its low-resolution (LR) representation. Super-resolution is an ill-posed problem: high-frequency information is inherently lost when an image or video is downscaled, because the Nyquist frequency is lower in LR space. Single-image super-resolution (SISR) methods usually restore this information by learning image priors from paired examples. For VSR, additional information is available in the temporal domain, which can significantly improve restoration quality over SISR methods. SISR has been an active research area for a long time [6], [18], [23], [25], [11], [34], [37], [32], while VSR has gained traction in recent years [30], [15], [33], [16], [7], [2], [40], [36], [31], [17], in part due to the availability of more and faster computing resources. While there is a lot of prior work on super-resolution factors ×2, ×3, and ×4 with impressive results, attempts at higher factors are less common [22]. Restoring such a large number of pixels from severely limited information is very challenging. The aim of this challenge is therefore to find out whether super-resolution at such high downscaling ratios is still possible with acceptable performance.

Two tracks are provided in this challenge. Track 1 is set up for fully supervised example-based VSR. Restoration quality is evaluated with the most prominent metrics in the field, Peak Signal-to-Noise Ratio (PSNR) and the structural similarity index (SSIM). Because PSNR and SSIM do not always correlate well with human perception of quality, Track 2 judges the outputs according to how humans perceive quality. Track 2 is also example-based; however, the final scores are determined by a mean opinion score (MOS).
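To make the Track 1 fidelity metric concrete, the following is a minimal sketch of the standard PSNR definition, 10 log10(MAX^2 / MSE), applied to a single frame. It is an illustration, not the challenge's evaluation code: the function name `psnr`, the flat-list frame representation, and the default 8-bit peak value of 255 are assumptions made for this example, and the exact protocol used in the challenge (color space, border cropping, averaging over frames) is not specified here.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return the PSNR in dB between two frames.

    Both frames are flat sequences of pixel intensities of equal length
    (an assumed representation chosen to keep the sketch dependency-free).
    """
    if len(reference) != len(restored):
        raise ValueError("frames must contain the same number of pixels")
    if len(reference) == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error between the reference and the restored frame.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    # Standard definition: ratio of peak signal power to error power, in dB.
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    hr_frame = [100, 100, 100, 100]   # hypothetical ground-truth pixels
    sr_frame = [110, 90, 110, 90]     # hypothetical restored pixels
    print(f"PSNR: {psnr(hr_frame, sr_frame):.2f} dB")
```

Because PSNR depends only on the per-pixel squared error, two restorations with the same error energy receive the same score even if one looks much sharper to a viewer, which is the motivation for a separate perceptual track.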