I. Introduction
Depth image-based rendering (DIBR) techniques have recently become popular for generating additional views in the multiview-video-plus-depth (MVD) representation. The MVD format consists of video and depth sequences for a limited number of original camera views of the same scene. To support autostereoscopic displays and to adapt the depth impression to individual viewing preferences, additional virtual views have to be calculated. Computing these requires the original image points to be reprojected into 3-D space using the corresponding depth values. These space points are then projected onto the image plane of a virtual camera placed at the required viewing position. This process is called "3-D image warping" in the computer graphics literature [1]. In the case of a horizontally rectified camera setup, the epipolar lines are horizontally aligned, so pixel values merely need to be shifted along the image rows (a minimal sketch of this shift is given below). The virtual camera can be positioned between (interpolation) or beside (extrapolation) the viewing range of the original cameras (cf. Fig. 1).

One of the most significant problems in DIBR is how to deal with uncovered areas (holes) in the virtual views (cf. Fig. 1), especially in extrapolated views beyond the viewing range of the original cameras. In extrapolated views, textures that are invisible in all original cameras may become visible. Because extrapolation enhances the depth experience in 3-D video, the Moving Picture Experts Group (MPEG) started experiments to explore the extrapolation capabilities of DIBR algorithms [2].

In the literature, three general methods have been proposed to tackle such holes. First, the depth maps are preprocessed so that no disocclusions occur. Usually, the depth map is smoothed with a symmetric [3] or asymmetric filter [4] to lower its gradients. This method gives good results when only small baselines have to be compensated; nevertheless, geometrical distortions can be observed in both foreground and background texture regions. The second way to compensate disocclusions is to cover them with plausible, known image information. Appropriate filling techniques are line-wise filling [5] (sketched below), inpainting methods [6], [7], bilateral filtering [8], and texture synthesis [9]. Third, image domain warping can be utilized to determine virtual views and fill disocclusions by distorting non-salient image regions [10].

A further challenge in DIBR is to maintain temporal consistency in the uncovered areas. The first approaches that handle this problem were published in [9], [11], and [12]. In [9] and [11], a mosaic/sprite is used to store background information from neighboring frames for reuse during the filling process; however, these approaches are restricted to sequences with static backgrounds. Chen et al. [12] assume that the original views are encoded with H.264/AVC and use the motion vectors from the bit stream to find appropriate information in temporally shifted frames. However, the motion vectors in H.264/AVC are sparse and encoder-optimized and can therefore differ from the real motion. Hence, only small objective and subjective gains are achieved.

Fig. 1. Extrapolated virtual camera views from original views.
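For a horizontally rectified setup, the row-wise shift mentioned above can be illustrated by a minimal sketch. It assumes a depth map that stores metric depth, a focal length given in pixels, and a known camera baseline; the function and variable names (e.g., warp_rectified) are illustrative and not taken from the referenced works. A z-buffer keeps the closest source pixel wherever several pixels map to the same target position, and the positions that remain unfilled form the disocclusion holes.

```python
import numpy as np

def warp_rectified(texture, depth, focal, baseline):
    """Minimal 3-D warping sketch for a horizontally rectified camera pair.

    texture : (H, W, 3) uint8 color image of the original view
    depth   : (H, W) float array with metric depth per pixel
    focal   : focal length in pixels; baseline : camera distance in meters
    Returns the warped virtual view and a boolean hole mask.
    """
    h, w = depth.shape
    warped = np.zeros_like(texture)
    zbuf = np.full((h, w), np.inf)     # keep the closest pixel per target position
    hole = np.ones((h, w), dtype=bool)

    # Disparity in pixels: d = f * B / Z; epipolar lines coincide with image rows.
    disparity = focal * baseline / np.maximum(depth, 1e-6)

    for y in range(h):
        for x in range(w):
            # Sign of the shift depends on whether the virtual view lies
            # left or right of the original view; here it is shifted left.
            xt = int(round(x - disparity[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                warped[y, xt] = texture[y, x]
                hole[y, xt] = False
    return warped, hole
```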
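Line-wise filling can likewise be sketched with a simple rule: each hole run along an image row is closed with the texture of its background-side neighbor, since disoccluded areas should show background rather than foreground content. The names and the concrete rule used here (propagating the neighbor with the larger depth in the virtual view) are assumptions for illustration and do not reproduce the exact method of [5].

```python
def fill_lines(warped, hole, zbuf):
    """Line-wise hole filling sketch.

    warped : (H, W, 3) warped view with holes
    hole   : (H, W) boolean mask of unfilled pixels
    zbuf   : (H, W) per-pixel depth of the virtual view (inf at holes)
    """
    h, w = hole.shape
    out = warped.copy()
    for y in range(h):
        x = 0
        while x < w:
            if hole[y, x]:
                start = x
                while x < w and hole[y, x]:   # find the end of the hole run
                    x += 1
                left, right = start - 1, x    # valid neighbors of the run
                # Choose the neighbor with the larger depth (farther = background).
                if left < 0:
                    src = right if right < w else None
                elif right >= w:
                    src = left
                else:
                    src = left if zbuf[y, left] >= zbuf[y, right] else right
                if src is not None:
                    out[y, start:x] = out[y, src]
            else:
                x += 1
    return out
```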