Decomposing a scene into geometric and semantically consistent regions


Abstract:

High-level, or holistic, scene understanding involves reasoning about objects, regions, and the 3D relationships between them. This requires a representation above the level of pixels that can be endowed with high-level attributes such as class of object/region, its orientation, and (rough 3D) location within the scene. Towards this goal, we propose a region-based model which combines appearance and scene geometry to automatically decompose a scene into semantically meaningful regions. Our model is defined in terms of a unified energy function over scene appearance and structure. We show how this energy function can be learned from data and present an efficient inference technique that makes use of multiple over-segmentations of the image to propose moves in the energy-space. We show, experimentally, that our method achieves state-of-the-art performance on the tasks of both multi-class image segmentation and geometric reasoning. Finally, by understanding region classes and geometry, we show how our model can be used as the basis for 3D reconstruction of the scene.
Date of Conference: 29 September 2009 - 02 October 2009
Date Added to IEEE Xplore: 06 May 2010
Conference Location: Kyoto, Japan
1. Introduction

With recent success on many vision subtasks (object detection [21], [18], [3], multi-class image segmentation [17], [7], [13], and 3D reconstruction [10], [16]), holistic scene understanding has emerged as one of the next great challenges for computer vision [11], [9], [19]. Here the aim is to reason jointly about the objects, regions, and geometry of a scene, with the hope of avoiding the many errors induced by modeling these tasks in isolation.

References

[1] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis", PAMI, 2002.
[2] A. Criminisi, Microsoft Research Cambridge object recognition image database, 2004.
[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", CVPR, 2005.
[4] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results, 2007.
[5] S. Gould, J. Rodgers, D. Cohen, G. Elidan and D. Koller, "Multi-class segmentation with relative location prior", IJCV, 2008.
[6] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004.
[7] X. He, R. Zemel and M. Carreira-Perpinan, "Multiscale CRFs for image labeling", CVPR, 2004.
[8] G. Heitz, G. Elidan, B. Packer and D. Koller, "Shape-based object localization for descriptive classification", NIPS, 2008.
[9] G. Heitz, S. Gould, A. Saxena and D. Koller, "Cascaded classification models: Combining models for holistic scene understanding", NIPS, 2008.
[10] D. Hoiem, A. A. Efros and M. Hebert, "Recovering surface layout from an image", IJCV, 2007.
[11] D. Hoiem, A. A. Efros and M. Hebert, "Closing the loop on scene interpretation", CVPR, 2008.
[12] D. Hoiem, A. N. Stein, A. A. Efros and M. Hebert, "Recovering occlusion boundaries", ICCV, 2007.
[13] P. Kohli, L. Ladicky and P. Torr, "Robust higher order potentials for enforcing label consistency", CVPR, 2008.
[14] B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman and A. Zisserman, "Using multiple segmentations to discover objects and their extent in image collections", CVPR, 2006.
[15] B. C. Russell, A. B. Torralba, K. P. Murphy and W. T. Freeman, "LabelMe: A database and web-based tool for image annotation", IJCV, 2008.
[16] A. Saxena, M. Sun and A. Y. Ng, "Learning 3-D scene structure from a single still image", PAMI, 2008.
[17] J. Shotton, J. Winn, C. Rother and A. Criminisi, "TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation", ECCV, 2006.
[18] A. Torralba, K. P. Murphy and W. T. Freeman, "Contextual models for object detection using BRFs", NIPS, 2005.
[19] Z. Tu, "Auto-context and its application to high-level vision tasks", CVPR, 2008.
[20] Z. Tu, X. Chen, A. L. Yuille and S.-C. Zhu, "Image parsing: Unifying segmentation, detection and recognition", ICCV, 2003.
[21] P. Viola and M. J. Jones, "Robust real-time face detection", IJCV, 2004.
[22] J. Winn and N. Jojic, "LOCUS: Learning object classes with unsupervised segmentation", ICCV, 2005.
[23] J. Winn and J. Shotton, "The layout consistent random field for recognizing and segmenting partially occluded objects", CVPR, 2006.
[24] L. Yang, P. Meer and D. J. Foran, "Multiple class segmentation using a unified framework over mean-shift patches", CVPR, 2007.