Proposal-Free Network for Instance-Level Object Segmentation


Abstract:

Instance-level object segmentation is an important yet under-explored task. Most state-of-the-art methods rely on region proposal methods to extract candidate segments and then use object classification to produce the final results. However, generating reliable region proposals is itself a challenging and unsolved task. In this work, we propose a Proposal-Free Network (PFN) to address the instance-level object segmentation problem. Based on a pixel-to-pixel deep convolutional neural network, PFN outputs the number of instances of each category together with pixel-level information on i) the coordinates of the bounding box of the instance each pixel belongs to, and ii) the confidences of the different categories for each pixel. Taken together, these outputs can be turned into the final instance-level object segmentation results with simple post-processing by any off-the-shelf clustering method. The whole PFN can be easily trained without requiring a proposal generation stage. Extensive evaluations on the challenging PASCAL VOC 2012 semantic segmentation benchmark demonstrate the effectiveness of the proposed PFN solution without relying on any proposal generation method.
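
The post-processing step described in the abstract can be illustrated with a minimal sketch. It assumes the network outputs have already been reduced to a per-pixel category map (the argmax of the category confidences), a per-pixel map of predicted box coordinates, and a predicted instance count per category. The abstract only says that any off-the-shelf clustering method can be used; the small k-means routine below, the choice to cluster on box coordinates alone, and all function and argument names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.

    Illustrative stand-in for "any off-the-shelf clustering method".
    """
    centers = [points[0]]
    for _ in range(1, k):
        # Pick the point farthest from every center chosen so far.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Squared distance from every point to every center, then reassign.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def group_instances(category_map, box_map, instance_counts):
    """Group foreground pixels into instances (hypothetical helper).

    category_map: (H, W) int array, 0 = background.
    box_map: (H, W, 4) float array holding, for each pixel, the predicted
        box (x1, y1, x2, y2) of the instance it belongs to.
    instance_counts: dict mapping category -> predicted number of instances.
    Returns an (H, W) int array: 0 = background, 1..N = instance ids.
    """
    instance_map = np.zeros(category_map.shape, dtype=int)
    next_id = 1
    for category, count in instance_counts.items():
        mask = category_map == category
        if count <= 0 or not mask.any():
            continue
        feats = box_map[mask]          # (num_pixels, 4) box predictions
        k = min(count, len(feats))     # never ask for more clusters than pixels
        labels = kmeans(feats, k)
        instance_map[mask] = labels + next_id
        next_id += k
    return instance_map
```

Pixels of the same instance should predict nearly the same box, so they fall into one cluster, while the predicted instance count fixes the number of clusters per category.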
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 40, Issue: 12, 01 December 2018)
Page(s): 2978 - 2991
Date of Publication: 22 November 2017

PubMed ID: 29990248

1 Introduction

Over the past few decades, two of the most popular object recognition tasks, object detection and semantic segmentation, have received considerable attention. Object detection aims to predict the semantic category and the bounding-box location of each object instance, which gives only a coarse localization. Semantic segmentation, in contrast, assigns a category label to every pixel of an image but gives no indication of the object instances, such as the number of instances or the precise region occupied by any particular instance. In this work, we follow recent works [1], [2], [3] and address a more challenging task, instance-level object segmentation, which predicts a segmentation mask for each instance of each category. We suggest that the next generation of object recognition should provide a richer and more detailed parsing of each image by labeling each object instance with an accurate pixel-wise segmentation mask. This is particularly important for real-world applications such as image captioning, image retrieval, 3-D navigation and driver assistance, where describing a scene with detailed individual instance regions is potentially more informative than describing it roughly with localized object detections. However, instance-level object segmentation is very challenging due to heavy occlusion, diverse shape deformations and appearance patterns, obscured boundaries with respect to other instances, and background clutter in real-world scenes. In addition, the number of instances of each category varies dramatically from image to image.

References

[1] B. Hariharan, P. Arbeláez, R. Girshick and J. Malik, "Simultaneous detection and segmentation", Proc. Eur. Conf. Comput. Vis., pp. 297-312, 2014.
[2] Y.-T. Chen, X. Liu and M.-H. Yang, "Multi-instance object segmentation with occlusion handling", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3470-3478, 2015.
[3] Z. Zhang, A. G. Schwing, S. Fidler and R. Urtasun, "Monocular object instance segmentation and depth ordering with CNNs", Proc. IEEE Int. Conf. Comput. Vis., pp. 2614-2622, 2015.
[4] J. Dai, K. He and J. Sun, "BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation", Proc. IEEE Int. Conf. Comput. Vis., pp. 1635-1643, 2015.
[5] S. Zheng et al., "Conditional random fields as recurrent neural networks", Proc. IEEE Int. Conf. Comput. Vis., pp. 1529-1537, 2015.
[6] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks", Proc. Int. Conf. Neural Inf. Process. Syst., pp. 91-99, 2015.
[7] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified real-time object detection", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 779-788, 2016.
[8] R. Stewart and M. Andriluka, "End-to-end people detection in crowded scenes", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2325-2333, 2016.
[9] C. Szegedy et al., "Going deeper with convolutions", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1-9, 2015.
[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", Proc. Int. Conf. Learn. Representations, 2015.
[11] J. Pont-Tuset, P. Arbeláez, J. Barron, F. Marques and J. Malik, "Multiscale combinatorial grouping for image segmentation and object proposal generation", arXiv:1503.00848, Mar. 2015.
[12] J. R. Uijlings, K. E. van de Sande, T. Gevers and A. W. Smeulders, "Selective search for object recognition", Int. J. Comput. Vis., vol. 104, no. 2, pp. 154-171, 2013.
[13] C. L. Zitnick and P. Dollár, "Edge boxes: Locating object proposals from edges", Proc. Eur. Conf. Comput. Vis., pp. 391-405, 2014.
[14] P. O. Pinheiro, R. Collobert and P. Dollár, "Learning to segment object candidates", Proc. Int. Conf. Neural Inf. Process. Syst., pp. 1990-1998, 2015.
[15] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Proc. Int. Conf. Neural Inf. Process. Syst., pp. 1097-1105, 2012.
[16] Y. Wei et al., "HCP: A flexible CNN framework for multi-label image classification", IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1901-1907, Sep. 2016.
[17] S. Ren, K. He, R. Girshick, X. Zhang and J. Sun, "Object detection networks on convolutional feature maps", IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 7, pp. 1476-1481, Jul. 2017.
[18] G. Papandreou, L.-C. Chen, K. Murphy and A. L. Yuille, "Weakly- and semi-supervised learning of a DCNN for semantic image segmentation", Proc. IEEE Int. Conf. Comput. Vis., pp. 1742-1750, 2015.
[19] H. Noh, S. Hong and B. Han, "Learning deconvolution network for semantic segmentation", Proc. IEEE Int. Conf. Comput. Vis., pp. 1520-1528, 2015.
[20] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 580-587, 2014.
[21] S. Gidaris and N. Komodakis, "Object detection via a multi-region semantic segmentation-aware CNN model", Proc. IEEE Int. Conf. Comput. Vis., pp. 1134-1142, 2015.
[22] D. Erhan, C. Szegedy, A. Toshev and D. Anguelov, "Scalable object detection using deep neural networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2155-2162, 2014.
[23] R. Girshick, "Fast R-CNN", Proc. IEEE Int. Conf. Comput. Vis., pp. 1440-1448, 2015.
[24] N. Silberman, D. Sontag and R. Fergus, "Instance segmentation of indoor scenes using a coverage loss", Proc. Eur. Conf. Comput. Vis., pp. 616-631, 2014.
[25] J. Tighe, M. Niethammer and S. Lazebnik, "Scene parsing with object instances and occlusion ordering", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3748-3755, 2014.
[26] B. Hariharan, P. Arbeláez, R. Girshick and J. Malik, "Hypercolumns for object segmentation and fine-grained localization", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 447-456, 2015.
[27] J. Dai, K. He and J. Sun, "Instance-aware semantic segmentation via multi-task network cascades", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3150-3158, 2016.
[28] J. Dai, K. He and J. Sun, "Convolutional feature masking for joint object and stuff segmentation", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3992-4000, 2015.
[29] M. Ren and R. S. Zemel, "End-to-end instance segmentation with recurrent attention", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 293-301, 2017.
[30] Y. Yang, S. Hallman, D. Ramanan and C. C. Fowlkes, "Layered object models for image segmentation", IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1731-1743, Sep. 2012.
