1. Introduction
In recent years, deep neural networks (DNNs) have achieved great success in a wide range of computer vision applications. The advance of novel neural architecture designs and training schemes, however, often comes with a greater demand for computational resources in terms of both memory and time.

Consider the stereo matching task as an example. It has been empirically shown that, compared to traditional 2D convolution, 3D convolution on a 4D volume (height × width × disparity × feature channels) [17] can better capture context information and learn representations for each disparity level, resulting in superior disparity estimation results. Due to the extra feature dimension, however, 3D convolution typically operates at spatial resolutions lower than the original input image size because of time and memory constraints. For example, CSPN [8], the top-1 method on the KITTI 2015 benchmark, conducts 3D convolution at 1/4 of the input size and uses bilinear interpolation to upsample the predicted disparity volume for final disparity regression. To handle high-resolution images (e.g., 2000 × 3000), HSM [42], the top-1 method on the Middlebury-v3 benchmark, uses a multi-scale approach to compute disparity volumes at 1/8, 1/16, and 1/32 of the input size. Bilinear upsampling is again applied to generate disparity maps at the full resolution. In both cases, object boundaries and fine details are often not well preserved in the final disparity maps due to the upsampling operation.
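To make the memory pressure concrete, the following sketch counts the elements of a 4D cost volume at full versus 1/4 resolution. The disparity range (192) and channel count (32) are illustrative assumptions, not values taken from the cited methods; note that downsampling by 1/4 typically shrinks the disparity dimension as well, so the volume shrinks cubically.

```python
def cost_volume_elems(h, w, d, c):
    # Elements in a 4D cost volume of shape (height, width, disparity, channels).
    return h * w * d * c

# Hypothetical 2000 x 3000 input with an assumed disparity range of 192
# and 32 feature channels.
full = cost_volume_elems(2000, 3000, 192, 32)

# At 1/4 resolution, height, width, AND disparity all shrink by 4x.
quarter = cost_volume_elems(500, 750, 48, 32)

print(full // quarter)  # 64
```

Since the 4D volume scales with H × W × D, reducing the spatial resolution by 4× cuts memory by 64×, which is why methods like CSPN and HSM perform 3D convolution on downsampled volumes and rely on upsampling afterwards.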