I. Introduction
Video Super-Resolution (VSR) is a challenging task that seeks to learn the complementary information across video frames. Compared with Single Image Super-Resolution (SISR), VSR must handle a sequence of frames that are highly correlated in time but misaligned with one another. In several previous works [1], [2], VSR was regarded as an extension of SISR, in which the frames of a video were super-resolved one at a time by an image super-resolution method [3]. Unsurprisingly, such approaches yield unsatisfactory performance, because they do not effectively exploit the temporal information shared across frames.