Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations


Abstract:

We address the challenging task of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane. We infer these planes from a single input image with a novel neural network architecture. It consists of a two-branch category-specific module that aims to predict layout and objects of the scene separately so that different types of planes can be handled better. We also introduce a novel loss function based on plane warping that can leverage multiple views at training time for improved occlusion-aware reasoning. In order to train and evaluate our occlusion-reasoning model, we use the ScanNet dataset and propose (i) a strategy to automatically extract ground truth for both visible and hidden regions and (ii) a new evaluation metric that specifically focuses on the prediction in hidden regions. We empirically demonstrate that our proposed approach can achieve higher accuracy for occlusion reasoning compared to competitive baselines on the ScanNet dataset, e.g. 42.65% relative improvement on hidden regions.
Date of Conference: 13-19 June 2020
Date Added to IEEE Xplore: 05 August 2020
Conference Location: Seattle, WA, USA

1. Introduction

Reasoning about occlusions in the 3D world is an ability at which human visual perception excels. Humans develop an understanding of object permanence as toddlers, for instance by playing peek-a-boo, but the skill is very challenging for machine intelligence to acquire because it requires strong contextual and prior knowledge about objects and scenes. This is particularly true for indoor scenes, where the composition of objects and scenes is highly complex and leads to numerous strong occlusions. While several works investigate this problem for outdoor scenes [5], [13], [24], there has been comparatively little work on indoor scenes. Yet many indoor applications, such as robot navigation and augmented reality, could benefit from occlusion reasoning.

Given a single image as input, our model predicts planes to describe both visible and occluded areas of the scene with separate branches for objects and layout (top). This model can be used for occlusion reasoning and novel view synthesis (bottom).
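
To make the plane representation concrete, the following is a minimal Python sketch of one plane carrying a normal, an offset, a visible mask and a full mask, with the hidden region derived as the part of the full mask that is not visible. The class and field names, the plane convention (normal . X = offset in camera coordinates) and the pinhole intrinsics (fx, fy, cx, cy) are assumptions made for illustration; they are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Mask = List[List[bool]]  # row-major binary mask, indexed as mask[v][u]


@dataclass
class Plane:
    """One scene plane; field names are illustrative, not the paper's."""
    normal: Tuple[float, float, float]  # unit normal in camera coordinates
    offset: float                       # assumed convention: normal . X = offset
    visible_mask: Mask                  # pixels where the plane is actually seen
    full_mask: Mask                     # visible plus occluded extent

    def hidden_mask(self) -> Mask:
        """Pixels covered by the full extent but not by the visible part."""
        return [
            [full and not vis for full, vis in zip(full_row, vis_row)]
            for full_row, vis_row in zip(self.full_mask, self.visible_mask)
        ]

    def depth_at(self, u: float, v: float, fx: float, fy: float,
                 cx: float, cy: float) -> float:
        """Depth at which the pinhole ray through pixel (u, v) meets the plane."""
        ray = ((u - cx) / fx, (v - cy) / fy, 1.0)  # back-projected ray at depth 1
        denom = sum(n * r for n, r in zip(self.normal, ray))
        if abs(denom) < 1e-9:
            raise ValueError("ray is parallel to the plane")
        return self.offset / denom


# Toy example: a fronto-parallel wall 3 units away, partly hidden by an object.
wall = Plane(
    normal=(0.0, 0.0, 1.0),
    offset=3.0,
    visible_mask=[[True, False, False], [True, True, False]],
    full_mask=[[True, True, True], [True, True, True]],
)
hidden = wall.hidden_mask()
depth = wall.depth_at(u=1.0, v=0.0, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
```

Because the plane parameters define depth at every pixel of the full mask, the same plane can supply geometry in the hidden region, where no direct depth observation exists in the input view.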
