1. Introduction
While there has been significant progress in solving tasks such as image labeling [14], object detection [5] and scene classification [26], existing approaches could benefit from solving these problems jointly [9]. For example, segmentation should be easier if we know where the object of interest is. Similarly, if we know the type of the scene, we can narrow down the classes we expect to see; e.g., if we are looking at the sea, we are more likely to see a boat than a cow. Conversely, if we know which semantic regions (e.g., sky, road) and which objects are present in the scene, we can more accurately infer the scene type. Holistic scene understanding aims at recovering multiple related aspects of a scene so as to provide a deeper understanding of the scene as a whole.