Loading [MathJax]/extensions/MathMenu.js
Nonparametric image parsing using adaptive neighbor sets | IEEE Conference Publication | IEEE Xplore

Nonparametric image parsing using adaptive neighbor sets


Abstract:

This paper proposes a non-parametric approach to scene parsing inspired by the work of Tighe and Lazebnik [22]. In their approach, a simple kNN scheme with multiple descr...Show More

Abstract:

This paper proposes a non-parametric approach to scene parsing inspired by the work of Tighe and Lazebnik [22]. In their approach, a simple kNN scheme with multiple descriptor types is used to classify super-pixels. We add two novel mechanisms: (i) a principled and efficient method for learning per-descriptor weights that minimizes classification error, and (ii) a context-driven adaptation of the training set used for each query, which conditions on common classes (which are relatively easy to classify) to improve performance on rare ones. The first technique helps to remove extraneous descriptors that result from the imperfect distance metrics/representations of each super-pixel. The second contribution re-balances the class frequencies, away from the highly-skewed distribution found in real-world scenes. Both methods give a significant performance boost over [22] and the overall system achieves state-of-the-art performance on the SIFT-Flow dataset.
Date of Conference: 16-21 June 2012
Date Added to IEEE Xplore: 26 July 2012
ISBN Information:

ISSN Information:

Conference Location: Providence, RI, USA
Department of Computer Science, Courant Institute, New York University, USA
Department of Computer Science, Courant Institute, New York University, USA

1. Introduction

Densely labeling a scene is a challenging recognition task which is the focus of much recent work [7], [13], [20], [26], [27]. The difficulty stems from several factors. First, the incredible diversity of the visual world means that each region can potentially take on one of hundreds of different labels. Second, the distribution of classes in a typical scene is far from uniform, following a power-law (as illustrated in Fig. 8). Consequently, many classes will have a small number of instances even in a large dataset, making it hard to train good classifiers. Third, as noted by Frome et al. [3], the use of single global distance metric for all descriptors is insufficient to handle the large degree of variation found in a given class. For example, the position within the image may sometimes be an important cue for finding people (e.g. when they are walking on a street), but on other occasions position may be irrelevant and color a much better feature (e.g. the person is close and facing the camera).

Department of Computer Science, Courant Institute, New York University, USA
Department of Computer Science, Courant Institute, New York University, USA
Contact IEEE to Subscribe

References

References is not available for this document.