Finding Things: Image Parsing with Regions and Per-Exemplar Detectors


Abstract:

This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. The system combines region-level features with per-exemplar sliding window detectors. Per-exemplar detectors are better suited for our parsing task than traditional bounding box detectors: they perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. The proposed system achieves state-of-the-art accuracy on three challenging datasets, the largest of which contains 45,676 images and 232 labels.
Date of Conference: 23-28 June 2013
Date Added to IEEE Xplore: 03 October 2013
Electronic ISBN: 978-1-5386-5672-3

Conference Location: Portland, OR, USA
1. Introduction

This paper addresses the problem of image parsing, or labeling each pixel in an image with its semantic category. Our goal is to achieve broad coverage: the ability to recognize hundreds or thousands of object classes that commonly occur in everyday street scenes and indoor environments. A major challenge is the non-uniform statistics of these classes in realistic scene images. A small number of classes, mainly ones associated with large regions or “stuff” such as road, sky, trees, and buildings, constitute the majority of all image pixels and object instances in the dataset. But a much larger number of “thing” classes (people, cars, dogs, mailboxes, vases, stop signs) occupy a small percentage of image pixels and have relatively few instances each.
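
To make the imbalance between “stuff” and “thing” classes concrete, the following is a minimal sketch of how per-class pixel share could be measured over a set of ground-truth label maps. The function name `pixel_share_per_class` and the toy label map are hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
from collections import Counter


def pixel_share_per_class(label_maps):
    """Return {class_label: fraction of all labeled pixels}.

    Each label map is a 2-D list of class names, one entry per pixel.
    This representation is an assumption made for the sketch.
    """
    counts = Counter()
    for label_map in label_maps:
        for row in label_map:
            counts.update(row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy, made-up 4x4 label map (illustration only, not real data).
toy_maps = [
    [
        ["sky", "sky", "sky", "sky"],
        ["building", "building", "sky", "sky"],
        ["road", "road", "road", "car"],
        ["road", "road", "road", "road"],
    ],
]

shares = pixel_share_per_class(toy_maps)

# Print classes from most to least frequent.
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:10s} {share:.3f}")
```

In this toy example the three “stuff” classes cover 15 of the 16 pixels, while the single “thing” class (car) covers one. Real scene datasets are far larger, but the same per-class tally is one way to expose the long tail described above.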
