Conferences >2006 IEEE Computer Society Co...

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Download PDF
Download References
Request Permissions
Save to
Alerts

Abstract:

This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into in...Show More

Metadata

Abstract:

This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Specifically, our proposed method exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insights into the success of several recently proposed image descriptions, including Torralba’s "gist" and Lowe’s SIFT descriptors.

Published in: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)

Date of Conference: 17-22 June 2006

Date Added to IEEE Xplore: 09 October 2006

Print ISBN:0-7695-2597-0

Print ISSN: 1063-6919

DOI: 10.1109/CVPR.2006.68

Conference Location: New York, NY, USA

Contents

1. Introduction

In this paper, we consider the problem of recognizing the semantic category of an image. For example, we may want to classify a photograph as depicting a scene (forest, street, office, etc.) or as containing a certain object of interest. For such whole-image categorization tasks, bag-of-features methods, which represent an image as an orderless collection of local features, have recently demonstrated impressive levels of performance [7], [22], [23], [25]. However, because these methods disregard all information about the spatial layout of the features, they have severely limited descriptive ability. In particular, they are incapable of capturing shape or of segmenting an object from its background. Unfortunately, overcoming these limitations to build effective structural object descriptions has proven to be quite challenging, especially when the recognition system must be made to work in the presence of heavy clutter, occlusion, or large viewpoint changes. Approaches based on generative part models [3], [5] and geometric correspondence search [1], [11] achieve robustness at significant computational expense. A more efficient approach is to augment a basic bag-of-features representation with pairwise relations between neighboring local features, but existing implementations of this idea [11], [17] have yielded inconclusive results. One other strategy for increasing robustness to geometric deformations is to increase the level of invariance of local features (e.g., by using affine-invariant detectors), but a recent large-scale evaluation [25] suggests that this strategy usually does not pay off.

References is not available for this document.

MIT Libraries

MIT Libraries

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Abstract:

Metadata

Abstract:

1. Introduction

References

IEEE Account

Purchase Details

Profile Information

Need Help?

MIT Libraries

MIT Libraries

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

Alerts

Abstract:

Metadata

Abstract:

1. Introduction

References