Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding

Abstract:

Object proposal generation serves as a standard preprocessing step in Vision-Language (VL) tasks (image captioning, visual question answering, etc.). The performance of object proposals generated for VL tasks is currently evaluated across all available annotations, a protocol that we show is "misaligned" - higher scores do not necessarily correspond to improved performance on downstream VL tasks. Our work serves as a study of this phenomenon and explores the effectiveness of semantic grounding to mitigate its effects. To this end, we propose evaluating object proposals against only a subset of available annotations, selected by thresholding an annotation importance score. Importance of object annotations to VL tasks is quantified by extracting relevant semantic information from text describing the image. We show that our method is consistent and demonstrates greatly improved alignment with annotations selected by image captioning metrics and human annotation when compared against existing techniques. Lastly, we compare current detectors used in the Scene Graph Generation (SGG) benchmark as a use case, which serves as an example of when traditional object proposal evaluation techniques are misaligned¹.

Published in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Date of Conference: 03-08 January 2024

Date Added to IEEE Xplore: 09 April 2024

DOI: 10.1109/WACV57701.2024.00434

Conference Location: Waikoloa, HI, USA

1. Introduction

Vision-Language (VL) tasks are a growing topic in both the Natural Language Processing (NLP) and Computer Vision communities with the majority of techniques relying on object proposal generation for pre-processing [2], [51]. Object proposals are a set of regions or bounding boxes deemed likely to contain the object specified by a detector. Object proposal generation offers an explainable, efficient, and highly effective bridge between raw images and VL tasks.
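The evaluation protocol outlined in the abstract (scoring proposals against only those annotations whose importance score passes a threshold) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the function names, the use of recall at an IoU threshold of 0.5 as the proposal metric, and the importance scores, which are simply given as inputs rather than derived from text describing the image as the paper proposes.

```python
from typing import List, Sequence, Tuple

# Boxes are (x1, y1, x2, y2) in pixel coordinates; this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(proposals: Sequence[Box], annotations: Sequence[Box],
           iou_thresh: float = 0.5) -> float:
    """Fraction of annotations matched by at least one proposal."""
    if not annotations:
        return 0.0
    hits = sum(
        1 for gt in annotations
        if any(iou(p, gt) >= iou_thresh for p in proposals)
    )
    return hits / len(annotations)


def select_important(annotations: Sequence[Box],
                     importance: Sequence[float],
                     score_thresh: float) -> List[Box]:
    """Keep only annotations whose (hypothetical) importance score passes the threshold."""
    return [box for box, s in zip(annotations, importance) if s >= score_thresh]


# Toy data: three annotated objects, two of which the detector proposes.
annotations = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
importance = [0.9, 0.2, 0.7]  # placeholder scores, not computed from any caption
proposals = [(1, 1, 10, 10), (21, 21, 30, 30)]

recall_all = recall(proposals, annotations)
recall_important = recall(proposals, select_important(annotations, importance, 0.5))
print(recall_all, recall_important)
```

In this toy case the detector finds two of the three annotated objects, but one of its hits is a low-importance object and its miss is a high-importance one, so the score over all annotations is higher than the score over the important subset. That gap is the kind of disagreement between protocols that the abstract calls misalignment.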
