Ensemble of exemplar-SVMs for object detection and beyond


Abstract:

This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.
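The idea summarized in the abstract (one linear SVM per exemplar, each trained on a single positive against many negatives, with every detection tied back to the exemplar that fired) can be illustrated with a minimal sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the 2-D feature vectors, the function names `train_exemplar_svm` and `detect`, the class weights `c_pos` and `c_neg`, the learning rate, and the detection threshold are all hypothetical choices made for the example. Raw scores from independently trained classifiers are compared directly here, which is a simplification.

```python
# Toy sketch: one linear SVM per exemplar, trained by subgradient descent
# on a weighted hinge loss. All names and hyperparameters are illustrative
# assumptions, not values taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_exemplar_svm(positive, negatives, c_pos=1.0, c_neg=0.1,
                       lr=0.1, reg=1e-3, epochs=100):
    """Fit (w, b) separating a single positive from a set of negatives."""
    w = [0.0] * len(positive)
    b = 0.0
    # The lone positive gets a larger weight than each negative.
    samples = [(positive, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        for x, y, c in samples:
            violated = y * (dot(w, x) + b) < 1.0      # inside the margin?
            w = [wi * (1.0 - lr * reg) for wi in w]   # L2 shrinkage step
            if violated:                              # hinge subgradient step
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y
    return w, b

def detect(query, ensemble, threshold=0.0):
    """Return (score, metadata) of the best-scoring exemplar, or None."""
    scored = ((dot(w, query) + b, meta) for w, b, meta in ensemble)
    best = max(scored, key=lambda t: t[0])
    return best if best[0] > threshold else None

# Two exemplars, each carrying a piece of meta-data (hypothetical labels).
exemplars = [([1.0, 0.0], "side view"), ([0.0, 1.0], "front view")]
# Shared pool of negatives: a small grid in the lower-left quadrant.
negatives = [[-x, -y] for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]

# The "ensemble" is simply the list of per-exemplar classifiers.
ensemble = [(*train_exemplar_svm(x, negatives), meta) for x, meta in exemplars]

hit = detect([0.9, 0.1], ensemble)
if hit is not None:
    score, meta = hit
    print(f"matched exemplar meta-data: {meta} (score {score:.2f})")
```

Because each classifier belongs to exactly one exemplar, the winning classifier identifies which training instance the detection corresponds to, and whatever meta-data that instance carries can be copied onto the detection.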
Date of Conference: 06-13 November 2011
Date Added to IEEE Xplore: 12 January 2012
Conference Location: Barcelona, Spain

1. Motivation

A mere decade ago, automatically recognizing everyday objects in images (such as the bus in Figure 1) was thought to be an almost unsolvable task. Yet today, a number of methods can do just that with reasonable accuracy. But let us consider the output of a typical object detector: a rough bounding box around the object and a category label (Figure 1 left). While this might be sufficient for a retrieval task (“find all buses in the database”), it seems rather lacking for any sort of deeper reasoning about the scene. How is the bus oriented? Is it a mini-bus or a double-decker? Which pixels actually belong to the bus? What is its rough geometry? These are all very hard questions for a typical object detector. But what if, in addition to the bounding box, we are able to obtain an association with a very similar exemplar from the training set (Figure 1 right), which can provide a high degree of correspondence? Suddenly, any kind of meta-data provided with the training sample (a pixel-wise annotation or label such as viewpoint, segmentation, coarse geometry, a 3D model, attributes, etc.) can be simply transferred to the new instance.
