Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Specifically, our proposed method exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insights into the success of several recently proposed image descriptions, including Torralba’s "gist" and Lowe’s SIFT descriptors.
Date of Conference: 17-22 June 2006
Date Added to IEEE Xplore: 09 October 2006
Print ISBN: 0-7695-2597-0
Print ISSN: 1063-6919
Conference Location: New York, NY, USA
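
The partitioning scheme summarized in the abstract (increasingly fine sub-regions, with a histogram of local features computed inside each one) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `spatial_pyramid_histogram`, the normalized-coordinate input format, and the pre-quantized visual-word indices are assumptions made for illustration only, and the per-level weighting used when pyramids are matched is omitted here.

```python
def spatial_pyramid_histogram(features, vocab_size, num_levels):
    """Concatenate per-cell visual-word histograms over a spatial pyramid.

    features   -- list of (x, y, word) tuples; x and y are assumed to be
                  normalized to [0, 1], and word is an integer index in
                  range(vocab_size) (hypothetical input format)
    vocab_size -- number of visual words in the vocabulary
    num_levels -- finest pyramid level; level l splits each axis into 2**l cells
    """
    pyramid = []
    for level in range(num_levels + 1):
        cells = 2 ** level  # cells per axis at this level
        hist = [0] * (cells * cells * vocab_size)
        for x, y, word in features:
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hist[(row * cells + col) * vocab_size + word] += 1
        pyramid.extend(hist)  # unweighted concatenation (assumption)
    return pyramid


# Three features, a two-word vocabulary, and pyramid levels 0 and 1.
example = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 0)]
vector = spatial_pyramid_histogram(example, vocab_size=2, num_levels=1)
```

The resulting vector has `vocab_size * (1 + 4 + ... + 4**num_levels)` entries. Level 0 alone reproduces the orderless bag-of-features histogram, so the pyramid extends that representation and does not replace it.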

1. Introduction

In this paper, we consider the problem of recognizing the semantic category of an image. For example, we may want to classify a photograph as depicting a scene (forest, street, office, etc.) or as containing a certain object of interest. For such whole-image categorization tasks, bag-of-features methods, which represent an image as an orderless collection of local features, have recently demonstrated impressive levels of performance [7], [22], [23], [25]. However, because these methods disregard all information about the spatial layout of the features, they have severely limited descriptive ability. In particular, they are incapable of capturing shape or of segmenting an object from its background. Unfortunately, overcoming these limitations to build effective structural object descriptions has proven to be quite challenging, especially when the recognition system must be made to work in the presence of heavy clutter, occlusion, or large viewpoint changes. Approaches based on generative part models [3], [5] and geometric correspondence search [1], [11] achieve robustness at significant computational expense. A more efficient approach is to augment a basic bag-of-features representation with pairwise relations between neighboring local features, but existing implementations of this idea [11], [17] have yielded inconclusive results. One other strategy for increasing robustness to geometric deformations is to increase the level of invariance of local features (e.g., by using affine-invariant detectors), but a recent large-scale evaluation [25] suggests that this strategy usually does not pay off.
