I. Introduction
With the development of convolutional neural networks (CNNs), space-time video super-resolution (STVSR) has seen considerable performance gains. Unlike video super-resolution (VSR), which increases only spatial resolution, STVSR aims to reconstruct high-frame-rate (HFR), high-resolution (HR) videos from their low-frame-rate (LFR), low-resolution (LR) counterparts. STVSR has a wide range of potential practical applications and has therefore attracted increasing interest in both academia and industry.