Loading [MathJax]/extensions/MathMenu.js
Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation | IEEE Conference Publication | IEEE Xplore

Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation


Abstract:

Panoptic segmentation brings together two separate tasks: instance and semantic segmentation. Although they are related, unifying them faces an apparent paradox: how to l...Show More

Abstract:

Panoptic segmentation brings together two separate tasks: instance and semantic segmentation. Although they are related, unifying them faces an apparent paradox: how to learn simultaneously instance-specific and category-specific (i.e. instance-agnostic) representations jointly. Hence, state-of-the-art panoptic segmentation methods use complex models with a distinct stream for each task. In contrast, we propose Hierarchical Lovász Embeddings, per pixel feature vectors that simultaneously encode instance- and categorylevel discriminative information. We use a hierarchical Lovász hinge loss to learn a low-dimensional embedding space structured into a unified semantic and instance hierarchy without requiring separate network branches or object proposals. Besides modeling instances precisely in a proposal-free manner, our Hierarchical Lovász Embeddings generalize to categories by using a simple Nearest-Class-Mean classifier, including for non-instance "stuff" classes where instance segmentation methods are not applicable. Our simple model achieves state-of-the-art results compared to existing proposal-free panoptic segmentation methods on Cityscapes, COCO, and Mapillary Vistas. Furthermore, our model demonstrates temporal stability between video frames.
Date of Conference: 20-25 June 2021
Date Added to IEEE Xplore: 02 November 2021
ISBN Information:

ISSN Information:

Conference Location: Nashville, TN, USA

1. Introduction

Holistic scene understanding is an important task in computer vision, where a model is trained to explain each pixel in an image, whether that pixel describes stuff – uncountable regions of similar texture such as grass, road or sky – or thing – a countable object with individually identifying characteristics, such as people or cars. While holistic scene understanding received some early attention [49], [55], [48], modern deep learning-based methods have mainly tackled the tasks of modeling stuff and things independently under the task names semantic segmentation and instance segmentation. Recently, Kirillov et al. proposed the panoptic quality (PQ) metric for unifying these two parallel tracks into the holistic task of panoptic segmentation [24]. Panoptic segmentation is a key step for visual understanding, with applications in fields such as autonomous driving or robotics, where it is crucial to know both the locations of dynamically trackable things, as well as static stuff classes. For example, an autonomous car needs to be able to both avoid other cars with high precision, as well as understand the location of the road and sidewalk to stay on a desired path.

Contact IEEE to Subscribe

References

References is not available for this document.