ArtQuest: Countering Hidden Language Biases in ArtVQA

Abstract:

The task of Visual Question Answering (VQA) has been studied extensively on general-domain real-world images. Transferring insights from general-domain VQA to the art domain (ArtVQA) is non-trivial, as the latter requires models both to identify abstract concepts, brushstroke details, and painting styles in the visual data and to possess background knowledge about art. The problem is exacerbated by the lack of high-quality datasets. In this work, we shed light on hidden linguistic biases in the AQUA dataset, which is the only publicly available benchmark dataset for ArtVQA. Because of these biases, the majority of questions can be answered without consulting the visual information, making the “V” in ArtVQA rather insignificant. To counter this problem, we create a simple yet practical dataset, ArtQuest, using structured information from the SemArt collection. Our dataset and the pipeline to reproduce our results are publicly available at https://github.com/bletib/artquest.
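The claim that most questions can be answered without the image is commonly probed with a "blind" baseline that sees only the question text. The following is a generic, minimal illustration of that idea, not the authors' pipeline. It assumes a toy list of (question, answer) pairs, and the prefix-bucketing heuristic, the `n_words` parameter, and the function names `question_key`, `fit_blind_baseline`, and `blind_accuracy` are all hypothetical. The baseline predicts the most frequent training answer for each question prefix and never looks at an image, so accuracy well above chance on held-out questions suggests the answers are predictable from language alone.

```python
from collections import Counter, defaultdict


def question_key(question: str, n_words: int = 3) -> str:
    """Hypothetical bucketing rule: the first n lowercase words of a question."""
    return " ".join(question.lower().split()[:n_words])


def fit_blind_baseline(train_pairs, n_words: int = 3):
    """Learn the most frequent answer per question prefix; images are never used."""
    by_key = defaultdict(Counter)
    overall = Counter()
    for question, answer in train_pairs:
        by_key[question_key(question, n_words)][answer] += 1
        overall[answer] += 1
    # Fallback for unseen prefixes: the globally most frequent answer.
    fallback = overall.most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in by_key.items()}
    return table, fallback


def blind_accuracy(model, test_pairs, n_words: int = 3) -> float:
    """Fraction of test questions answered correctly from the text alone."""
    table, fallback = model
    correct = sum(
        table.get(question_key(q, n_words), fallback) == a for q, a in test_pairs
    )
    return correct / len(test_pairs)


# Hypothetical toy data for illustration only; not taken from AQUA or ArtQuest.
train = [
    ("Is this painting a portrait?", "yes"),
    ("Is this painting a landscape?", "yes"),
    ("Who painted this work?", "Rembrandt"),
    ("Who painted the portrait?", "Rembrandt"),
    ("What is the man holding?", "a book"),
]
test = [
    ("Is this painting by Vermeer?", "yes"),
    ("Who painted this canvas?", "Rembrandt"),
    ("What is the woman holding?", "a letter"),
]

model = fit_blind_baseline(train)
print(round(blind_accuracy(model, test), 3))  # prints 0.667
```

Prefix bucketing is deliberately crude; a stronger question-only probe would train a text classifier or language model on the questions. The comparison stays the same either way: if the blind score approaches that of an image-aware model, the visual input contributes little.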

Published in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Date of Conference: 03-08 January 2024

Date Added to IEEE Xplore: 09 April 2024

DOI: 10.1109/WACV57701.2024.00716

Conference Location: Waikoloa, HI, USA

1. Introduction

The emergence of large foundation models has led to notable improvements in multimodal vision-language understanding tasks such as visual question answering (VQA; [8], [20], [32]). While these models have been extensively studied for general-domain tasks on generic real-world images, their capabilities in understanding specific domains such as art remain unclear. Art is a fundamental aspect of human culture, and art museums are visited by many millions of people every year. Thus, achieving visual question answering in the art domain (ArtVQA) is an important step towards conversational systems that can guide and assist people by addressing their information needs. Imagine encountering an interesting artwork and wondering who created it or in which period it was created. Given a photo of the artwork and the question in natural language, an ArtVQA system can provide the answer. Furthermore, such systems may facilitate art education by acting as a study assistant.
