I. Introduction
Visual Question Answering (VQA) is a challenging task at the intersection of natural language understanding and computer vision: given an image and a corresponding question, a model must predict the correct answer. Compared with traditional computer vision tasks such as object detection [1], VQA therefore requires more sophisticated joint reasoning over images and text. Because the task demands the joint analysis of multimodal features from both vision and language, it has attracted increasing attention.