I. Introduction
Visual Question Answering (VQA) is an image-text multimodal question answering task proposed by Agrawal et al. in 2015 [1]. The task requires multimodal models to read, understand, fuse, and reason over visual and textual information. In natural scenes, text appears in many places, such as license plates, store signs, and clothing logos. These OCR texts are indispensable supplementary information for visual question answering. The variant of the task that makes use of OCR text is called Text-based Visual Question Answering [2]. An example is shown in Fig. 1.