GRES: Generalized Referring Expression Segmentation

Abstract:

Referring Expression Segmentation (RES) aims to generate a segmentation mask for the object described by a given language expression. Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object. Multi-target and no-target expressions are not considered. This limits the usage of RES in practice. In this paper, we introduce a new benchmark called Generalized Referring Expression Segmentation (GRES), which extends the classic RES to allow expressions to refer to an arbitrary number of target objects. Towards this, we construct the first large-scale GRES dataset called gRefCOCO that contains multi-target, no-target, and single-target expressions. GRES and gRefCOCO are designed to be well-compatible with RES, facilitating extensive experiments to study the performance gap of the existing RES methods on the GRES task. In the experimental study, we find that one of the big challenges of GRES is complex relationship modeling. Based on this, we propose a region-based GRES baseline ReLA that adaptively divides the image into regions with sub-instance clues, and explicitly models the region-region and region-language dependencies. The proposed approach ReLA achieves new state-of-the-art performance on both the newly proposed GRES task and the classic RES task. The proposed gRefCOCO dataset and method are available at https://henghuiding.github.io/GRES.

Published in: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Date of Conference: 17-24 June 2023

Date Added to IEEE Xplore: 22 August 2023

DOI: 10.1109/CVPR52729.2023.02259

Conference Location: Vancouver, BC, Canada

1. Introduction

Referring Expression Segmentation (RES) is one of the most important tasks of multi-modal information processing. Given an image and a natural language expression that describes an object in the image, RES aims to find this target object and generate a segmentation mask for it. It has great potential in many applications, such as video production, human-machine interaction, and robotics. Currently, most existing methods follow the RES rules defined in the popular datasets ReferIt [20] and RefCOCO [34], [47], and have achieved great progress in recent years.
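To make the task definition concrete, the following is a minimal sketch of how a referring sample could be represented, contrasting the classic single-target setting with the generalized setting described in the abstract (an arbitrary number of targets, including none). The class `ReferringSample`, its fields, and the list-of-lists mask layout are hypothetical names chosen for illustration and are not taken from the gRefCOCO release. Merging the per-object masks into one ground-truth mask by union is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mask layout: a height x width grid of 0/1 values, 1 = target pixel.
Mask = List[List[int]]


@dataclass
class ReferringSample:
    """Illustrative container for one (image, expression) pair."""
    expression: str
    height: int
    width: int
    # One binary mask per object the expression refers to.
    # Classic RES: exactly one entry. Generalized setting: zero or more.
    target_masks: List[Mask] = field(default_factory=list)

    def is_classic_res(self) -> bool:
        # Classic RES assumes one expression refers to one target object.
        return len(self.target_masks) == 1

    def ground_truth(self) -> Mask:
        # Assumption of this sketch: the ground truth is the union of all
        # target masks, so a no-target expression yields an all-zero mask.
        merged = [[0] * self.width for _ in range(self.height)]
        for mask in self.target_masks:
            for y in range(self.height):
                for x in range(self.width):
                    if mask[y][x]:
                        merged[y][x] = 1
        return merged


# Toy 2x3 image with one object on the left and one on the right.
left = [[1, 0, 0], [1, 0, 0]]
right = [[0, 0, 1], [0, 0, 1]]

single = ReferringSample("the person on the left", 2, 3, [left])
multi = ReferringSample("the two people", 2, 3, [left, right])
no_target = ReferringSample("the giraffe", 2, 3, [])
```

Under these assumptions, only `single` satisfies the classic RES constraint, while `multi` and `no_target` are the cases the generalized benchmark is meant to cover.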
