I. Introduction
Depth image-based rendering (DIBR) techniques have recently become popular for generating additional views in the multiview-video-plus-depth (MVD) representation. The MVD format consists of video and depth sequences for a limited number of original camera views of the same scene. To support autostereoscopic displays and to adapt the depth impression to individual viewing preferences, additional virtual views have to be calculated. Computing these requires the original image points to be reprojected into 3-D space using the corresponding depth values. These space points are then projected onto the image plane of a virtual camera placed at the required viewing position. This process is called "3-D image warping" in the computer graphics literature [1]. In the case of a horizontally rectified camera setup, the epipolar lines are horizontally aligned, so pixel values merely need to be shifted along the image rows (a minimal sketch of this shift is given below). The virtual camera can be positioned between (interpolation) or beside (extrapolation) the viewing range of the original cameras (cf. Fig. 1).

One of the most significant problems in DIBR is how to deal with uncovered areas (holes) in the virtual views (cf. Fig. 1), especially in extrapolated views beyond the viewing range of the original cameras. In extrapolated views, textures that are invisible in all original cameras may become visible. Because extrapolation enhances the depth experience in 3-D video, the Moving Picture Experts Group (MPEG) started experiments to explore the extrapolation capabilities of DIBR algorithms [2].

In the literature, three general methods have been proposed to tackle such holes. First, the depth maps are preprocessed so that no disocclusions occur. Usually, the depth map is smoothed with a symmetric [3] or asymmetric filter [4] to lower its gradients. This method gives good results when only small baselines have to be compensated; nevertheless, geometrical distortions can be observed in both foreground and background texture regions. The second way to compensate disocclusions is to cover them with plausible, known image information. Appropriate filling techniques are line-wise filling [5] (sketched below), inpainting methods [6], [7], bilateral filtering [8], and texture synthesis [9]. Third, image domain warping can be utilized to determine virtual views and fill disocclusions by distorting non-salient image regions [10].

A further challenge in DIBR is to maintain temporal consistency in the uncovered areas. The first approaches that handle this problem were published in [9], [11], and [12]. In [9] and [11], a mosaic/sprite is used to store background information from neighboring frames for reuse during the filling process; however, these approaches are restricted to sequences with static backgrounds. Chen et al. [12] assume that the original views are encoded with H.264/AVC and use the motion vectors from the bit stream to find appropriate information in temporally shifted frames. However, the motion vectors in H.264/AVC are sparse and encoder-optimized and can therefore differ from the real motion. Hence, only small objective and subjective gains are achieved.

Fig. 1. Extrapolated virtual camera views from original views.
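For a horizontally rectified setup, the row-wise shift mentioned above can be illustrated by a minimal sketch. It assumes a depth map that stores metric depth, a focal length given in pixels, and a known camera baseline; the function and variable names (e.g., warp_rectified) are illustrative and not taken from the referenced works. A z-buffer keeps the closest source pixel wherever several pixels map to the same target position, and the positions that remain unfilled form the disocclusion holes.

```python
import numpy as np

def warp_rectified(texture, depth, focal, baseline):
    """Minimal 3-D warping sketch for a horizontally rectified camera pair.

    texture : (H, W, 3) uint8 color image of the original view
    depth   : (H, W) float array with metric depth per pixel
    focal   : focal length in pixels; baseline : camera distance in meters
    Returns the warped virtual view and a boolean hole mask.
    """
    h, w = depth.shape
    warped = np.zeros_like(texture)
    zbuf = np.full((h, w), np.inf)     # keep the closest pixel per target position
    hole = np.ones((h, w), dtype=bool)

    # Disparity in pixels: d = f * B / Z; epipolar lines coincide with image rows.
    disparity = focal * baseline / np.maximum(depth, 1e-6)

    for y in range(h):
        for x in range(w):
            # Sign of the shift depends on whether the virtual view lies
            # left or right of the original view; here it is shifted left.
            xt = int(round(x - disparity[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                warped[y, xt] = texture[y, x]
                hole[y, xt] = False
    return warped, hole
```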
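Line-wise filling can likewise be sketched with a simple rule: each hole run along an image row is closed with the texture of its background-side neighbor, since disoccluded areas should show background rather than foreground content. The names and the concrete rule used here (propagating the neighbor with the larger depth in the virtual view) are assumptions for illustration and do not reproduce the exact method of [5].

```python
def fill_lines(warped, hole, zbuf):
    """Line-wise hole filling sketch.

    warped : (H, W, 3) warped view with holes
    hole   : (H, W) boolean mask of unfilled pixels
    zbuf   : (H, W) per-pixel depth of the virtual view (inf at holes)
    """
    h, w = hole.shape
    out = warped.copy()
    for y in range(h):
        x = 0
        while x < w:
            if hole[y, x]:
                start = x
                while x < w and hole[y, x]:   # find the end of the hole run
                    x += 1
                left, right = start - 1, x    # valid neighbors of the run
                # Choose the neighbor with the larger depth (farther = background).
                if left < 0:
                    src = right if right < w else None
                elif right >= w:
                    src = left
                else:
                    src = left if zbuf[y, left] >= zbuf[y, right] else right
                if src is not None:
                    out[y, start:x] = out[y, src]
            else:
                x += 1
    return out
```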