I. Introduction
Visual question answering (VQA) is a prevalent and challenging multi-modal task that demands a strong grasp of visual context understanding [27] and linguistically-aware reasoning [5], [21], [43]. With the advancement of deep neural networks, VQA models have made significant strides in applications such as human-robot interaction [42] and visual dialog [44]. Typically, existing VQA models excel when they have access to large-scale training data and the training and testing sets share similar data distributions. However, these well-trained VQA models often struggle in the out-of-distribution scenarios that arise in real-world applications [40], where the answer distributions of the training and testing datasets diverge. For instance, as depicted in Figure 1 (a), when we query a well-trained VQA model with “What color is her shirt?”, it may return a biased answer such as “black”, reflecting the answer distribution of the training dataset while neglecting the actual visual context. It is therefore crucial to adapt a deployed VQA model to each distinct scene so that it maintains strong performance under distribution shifts in the test samples.
Figure 1. Test-time Adaptation for Visual Question Answering with Biased Dataset Distributions. (a) Illustration of the biased training and testing subsets for the question type “what color is” in the VQA-CP v2 [1] dataset. In the training subset, “black” constitutes a significant portion of the answers, whereas in the testing subset it accounts for only a small proportion. Current methods tend to predict the biased answer because they capture language biases in the training dataset. (b) Given the biased testing dataset and a pre-trained, biased VQA model (e.g., UpDn [2]), test-time adaptation aims to improve the VQA model’s out-of-distribution performance by leveraging unlabeled, sequentially arriving test samples.
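To make the test-time adaptation setting in Figure 1 (b) concrete, the sketch below shows one generic way to adapt a pre-trained classifier on unlabeled, sequentially arriving test batches via entropy minimization (in the spirit of Tent-style fully test-time adaptation); it is an illustrative sketch, not the method proposed in this work, and the toy answer head, feature dimension, and batch stream are placeholder assumptions standing in for a real VQA model such as UpDn.

```python
# Minimal, hypothetical sketch: entropy-minimization test-time adaptation
# on unlabeled, sequential test batches. The toy classifier is a stand-in
# for a pre-trained VQA model; only normalization parameters are updated.
import torch
import torch.nn as nn


class ToyAnswerHead(nn.Module):
    """Placeholder classifier over fused image-question features."""

    def __init__(self, feat_dim=512, num_answers=100):
        super().__init__()
        self.norm = nn.BatchNorm1d(feat_dim)
        self.fc = nn.Linear(feat_dim, num_answers)

    def forward(self, fused_feat):
        return self.fc(self.norm(fused_feat))


def prediction_entropy(logits):
    # Mean entropy of the batch predictions; lower means more confident.
    probs = logits.softmax(dim=-1)
    return -(probs * logits.log_softmax(dim=-1)).sum(dim=-1).mean()


def test_time_adapt(model, test_stream, lr=1e-4):
    # Adapt only the normalization affine parameters, keeping the rest
    # frozen -- a common choice for fully test-time adaptation.
    model.train()  # BatchNorm uses statistics of the current test batch
    params = [p for m in model.modules() if isinstance(m, nn.BatchNorm1d)
              for p in m.parameters()]
    optimizer = torch.optim.Adam(params, lr=lr)
    for fused_feat in test_stream:       # unlabeled test batches, in order
        logits = model(fused_feat)
        loss = prediction_entropy(logits)  # no ground-truth answers used
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        yield logits.argmax(dim=-1)      # predictions from the adapted model


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyAnswerHead()
    fake_stream = (torch.randn(32, 512) for _ in range(5))  # stand-in batches
    for preds in test_time_adapt(model, fake_stream):
        print(preds[:5])
```

In practice the stream would yield fused image-question features (or raw inputs) from the biased testing set, and the adapted predictions would be compared against the frozen pre-trained model to measure the out-of-distribution gain.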