1. Introduction
Reasoning about occlusions in the 3D world is an ability at which human visual perception excels. We develop an understanding of object permanence as toddlers, for instance by playing peek-a-boo, yet it remains a very challenging skill for machine intelligence to acquire, since it requires strong contextual and prior knowledge about objects and scenes. This is particularly true for indoor scenes, where the composition of objects and scenes is highly complex and leads to numerous strong occlusions. While several works investigate this problem for outdoor scenes [5], [13], [24], there has been comparatively little work on indoor scenes, even though many indoor applications, such as robot navigation and augmented reality, could benefit from occlusion reasoning.
Figure: Given a single image as input, our model predicts planes to describe both visible and occluded areas of the scene, with separate branches for objects and layout (top). This model can be used for occlusion reasoning and novel view synthesis (bottom).