1 Introduction
As an alternative to traditional 3D scene representations based on scene geometry (or depth) and texture (or reflectance), the light field (LF) achieves photorealistic view synthesis in real time via LF rendering [1], [2]. Such high-quality rendering requires the disparity between adjacent views to be less than one pixel, i.e., a so-called densely-sampled LF (a back-of-the-envelope form of this condition is sketched below). Unfortunately, practical constraints, such as dynamic scenes [3] or limited acquisition time [4], often lead to insufficient sampling in the angular dimension. The quality of the rendered novel views is then inevitably degraded by the large disparity range of the sampled LF. Moreover, non-Lambertian effects in the scene, e.g., on jewellery, fur, glass, and faces, further aggravate this degradation [5], [6].
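To make the sub-pixel disparity condition concrete, consider the following back-of-the-envelope bound. It is a sketch under a rectified pinhole-camera assumption, and the symbols $f$, $\Delta t$, $z_{\min}$, and $z_{\max}$ are introduced here purely for illustration. A scene point at depth $z$, observed from two adjacent views separated by baseline $\Delta t$ with focal length $f$ expressed in pixels, exhibits disparity $d(z) = f\,\Delta t / z$, so the disparity spread across a scene depth range $[z_{\min}, z_{\max}]$ is
$$
\Delta d \;=\; f\,\Delta t\left(\frac{1}{z_{\min}} - \frac{1}{z_{\max}}\right).
$$
Requiring $\Delta d \le 1$ pixel bounds the admissible baseline by $\Delta t \le \bigl(f\,(1/z_{\min} - 1/z_{\max})\bigr)^{-1}$. For example, with $f = 1000$ pixels, $z_{\min} = 1$ m, and $z_{\max} = 2$ m, the adjacent-view baseline must stay below 2 mm, which illustrates why densely-sampled LFs are expensive to acquire in practice.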