I. Introduction
There has been an increasing demand for accurately predicting the quality of videos, coinciding with the exponential growth of video data. In the era of video big data, it is prohibitively difficult and costly to rely solely on human subjective evaluation for timely quality assessment. As such, objective video quality assessment (VQA), whose goal is to design computational models that automatically and accurately predict the perceived quality of videos, has become increasingly prominent. Depending on the availability of the pristine reference video in the application scenario, VQA methods can be categorized into full-reference VQA (FR-VQA), reduced-reference VQA (RR-VQA), and no-reference VQA (NR-VQA). Despite remarkable progress, NR-VQA of real-world videos, which has attracted great interest owing to its high practical utility, remains very challenging, especially when the videos are acquired, processed, and compressed with diverse devices, environments, and algorithms.