Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation


Abstract:

In this paper we propose an approach to holistic scene understanding that reasons jointly about regions, location, class and spatial extent of objects, presence of a class in the image, as well as the scene type. Learning and inference in our model are efficient as we reason at the segment level, and introduce auxiliary variables that allow us to decompose the inherent high-order potentials into pairwise potentials between a few variables with a small number of states (at most the number of classes). Inference is done via a convergent message-passing algorithm, which, unlike graph-cuts inference, has no submodularity restrictions and does not require potential-specific moves. We believe this is very important, as it allows us to encode our ideas and prior knowledge about the problem without the need to change the inference engine every time we introduce a new potential. Our approach outperforms the state-of-the-art on the MSRC-21 benchmark, while being much faster. Importantly, our holistic model is able to improve performance in all tasks.
Date of Conference: 16-21 June 2012
Date Added to IEEE Xplore: 26 July 2012
Conference Location: Providence, RI, USA

1. Introduction

While there has been significant progress in solving tasks such as image labeling [14], object detection [5] and scene classification [26], existing approaches could benefit from solving these problems jointly [9]. For example, segmentation should be easier if we know where the object of interest is. Similarly, if we know the type of the scene, we can narrow down the classes we expect to see, e.g., if we are looking at the sea, we are more likely to see a boat than a cow. Conversely, if we know which semantic regions (e.g., sky, road) and which objects are present in the scene, we can more accurately infer the scene type. Holistic scene understanding aims at recovering multiple related aspects of a scene so as to provide a deeper understanding of the scene as a whole.
