Decomposing a scene into geometric and semantically consistent regions

Abstract:

High-level, or holistic, scene understanding involves reasoning about objects, regions, and the 3D relationships between them. This requires a representation above the level of pixels that can be endowed with high-level attributes such as class of object/region, its orientation, and (rough 3D) location within the scene. Towards this goal, we propose a region-based model which combines appearance and scene geometry to automatically decompose a scene into semantically meaningful regions. Our model is defined in terms of a unified energy function over scene appearance and structure. We show how this energy function can be learned from data and present an efficient inference technique that makes use of multiple over-segmentations of the image to propose moves in the energy-space. We show, experimentally, that our method achieves state-of-the-art performance on the tasks of both multi-class image segmentation and geometric reasoning. Finally, by understanding region classes and geometry, we show how our model can be used as the basis for 3D reconstruction of the scene.

Published in: 2009 IEEE 12th International Conference on Computer Vision

Date of Conference: 29 September 2009 - 02 October 2009

Date Added to IEEE Xplore: 06 May 2010

DOI: 10.1109/ICCV.2009.5459211

Conference Location: Kyoto, Japan

1. Introduction

With recent success on many vision subtasks (object detection [21], [18], [3], multi-class image segmentation [17], [7], [13], and 3D reconstruction [10], [16]), holistic scene understanding has emerged as one of the next great challenges for computer vision [11], [9], [19]. Here the aim is to reason jointly about the objects, regions, and geometry of a scene, with the hope of avoiding the many errors induced by modeling these tasks in isolation.

References

1. D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis", PAMI, 2002.

2. A. Criminisi, "Microsoft Research Cambridge object recognition image database", 2004.

3. N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", CVPR, 2005.

4. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results", 2007.

5. S. Gould, J. Rodgers, D. Cohen, G. Elidan and D. Koller, "Multi-class segmentation with relative location prior", IJCV, 2008.

6. R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004.

7. X. He, R. Zemel and M. Carreira-Perpinan, "Multiscale CRFs for image labeling", CVPR, 2004.

8. G. Heitz, G. Elidan, B. Packer and D. Koller, "Shape-based object localization for descriptive classification", NIPS, 2008.

9. G. Heitz, S. Gould, A. Saxena and D. Koller, "Cascaded classification models: Combining models for holistic scene understanding", NIPS, 2008.

10. D. Hoiem, A. A. Efros and M. Hebert, "Recovering surface layout from an image", IJCV, 2007.

11. D. Hoiem, A. A. Efros and M. Hebert, "Closing the loop on scene interpretation", CVPR, 2008.

12. D. Hoiem, A. N. Stein, A. A. Efros and M. Hebert, "Recovering occlusion boundaries", ICCV, 2007.

13. P. Kohli, L. Ladicky and P. Torr, "Robust higher order potentials for enforcing label consistency", CVPR, 2008.

14. B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman and A. Zisserman, "Using multiple segmentations to discover objects and their extent in image collections", CVPR, 2006.

15. B. C. Russell, A. B. Torralba, K. P. Murphy and W. T. Freeman, "LabelMe: A database and web-based tool for image annotation", IJCV, 2008.

16. A. Saxena, M. Sun and A. Y. Ng, "Learning 3-D scene structure from a single still image", PAMI, 2008.

17. J. Shotton, J. Winn, C. Rother and A. Criminisi, "TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation", ECCV, 2006.

18. A. Torralba, K. P. Murphy and W. T. Freeman, "Contextual models for object detection using BRFs", NIPS, 2005.

19. Z. Tu, "Auto-context and its application to high-level vision tasks", CVPR, 2008.

20. Z. Tu, X. Chen, A. L. Yuille and S.-C. Zhu, "Image parsing: Unifying segmentation, detection and recognition", ICCV, 2003.

21. P. Viola and M. J. Jones, "Robust real-time face detection", IJCV, 2004.