1. Introduction
Synthesizing a novel view from a sparse set of images is a long-standing challenge in computer vision and graphics [10], [41], [42]. Recent advances in 3D neural rendering for view synthesis, in particular NeRF [31] and its successors [15], [16], [27], [33], [36], [58], have brought us tantalizingly close to photo-realistic image synthesis in complex environments. One reason for NeRF's success is its implicit 5D scene representation, which maps a 3D scene point and a 2D viewing direction to opacity and color. In principle, such a representation is well suited to modeling view-dependent effects such as the non-Lambertian reflectance of specular and translucent surfaces. Without regularization, however, this formulation permits degenerate solutions due to the inherent ambiguity between 3D surface and radiance: an incorrect shape (opacity) can be coupled with a high-frequency radiance function to minimize the optimization objective [61]. In practice, NeRF avoids such degenerate solutions through its neural architecture design, in which the viewing direction is introduced only in the last layers of the MLP. This limits the expressivity of the radiance function and effectively acts as a smooth-BRDF prior [61]. NeRF thus avoids degenerate solutions at the expense of fidelity in non-Lambertian effects (Fig. 1 highlights this particular limitation of the NeRF model). Photo-realistic synthesis of non-Lambertian effects remains one of the few outstanding hurdles for neural rendering techniques.
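To make the architectural point above concrete, the following is a minimal PyTorch sketch of a NeRF-style MLP in which the viewing direction enters only before the final layers, so density is view-independent and radiance can vary only smoothly with view. The layer widths, depths, and class name are illustrative assumptions, not the exact configuration of [31].

```python
import torch
import torch.nn as nn

class NeRFStyleMLP(nn.Module):
    """Sketch of the design described above: the 3D position drives
    most of the network, while the 2D viewing direction is concatenated
    only before the final layers, limiting how quickly radiance can
    vary with view (an implicit smooth-BRDF prior)."""

    def __init__(self, pos_dim=3, dir_dim=3, width=256):
        super().__init__()
        # Position-only trunk: predicts density (opacity) and a feature.
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(width, 1)       # view-independent density
        self.feature = nn.Linear(width, width)
        # View-dependent head: the direction enters only here, so the
        # radiance is a shallow (hence smooth) function of the view.
        self.color_head = nn.Sequential(
            nn.Linear(width + dir_dim, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(), # RGB in [0, 1]
        )

    def forward(self, x, d):
        h = self.trunk(x)
        sigma = torch.relu(self.sigma_head(h))      # opacity >= 0
        rgb = self.color_head(torch.cat([self.feature(h), d], dim=-1))
        return sigma, rgb


# Usage: query density and color for a batch of 3D samples and view directions.
pts = torch.randn(1024, 3)
dirs = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
sigma, rgb = NeRFStyleMLP()(pts, dirs)
print(sigma.shape, rgb.shape)                       # (1024, 1), (1024, 3)
```

Because the direction passes through so few layers, the network cannot fit high-frequency view-dependent radiance, which is precisely why sharp refractions and reflections degrade.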
Fig. 1: Novel view synthesis. Top: the target image to be rendered, from the lab scene of the shiny dataset [55]. Bottom row: crops of novel views generated by our proposed model, NeX [55], and NeRF [31]. Unlike NeX and NeRF, which fail to synthesize the refractions on the test tube, our model reconstructs these complex view-dependent effects almost perfectly. The PSNR of each rendered image is given in parentheses (higher is better). Zoom in for detail.