
UAV-VQG: Visual Question Generation Framework on UAV Images


Abstract:

Visual Question Generation (VQG) is one of the most challenging problems since it aims to produce relevant and meaningful questions from images. As the VQG process is able to generate a diverse set of questions that do not exist in the training set, various models (e.g., visual question answering) can benefit from this question generation task by having their performance evaluated in unseen settings. In this paper, we explore the visual question generation task on images collected by an unmanned aerial vehicle (UAV). We highlight the significant role of the question generation task and present a variational attention-based model that focuses on creating diversified and meaningful questions from images. Compared to baseline approaches, our method demonstrates the ability to create a broad and meaningful set of questions.
Date of Conference: 15-18 December 2021
Date Added to IEEE Xplore: 13 January 2022
Conference Location: Orlando, FL, USA
Department of Information Systems, Bina Lab, University of Maryland, Maryland, USA

I. Introduction

Multi-modal approaches have been gaining attention in recent years because of their capability to represent information distributed across multiple modalities. A multi-modal representation is essential where a single modality cannot capture the objective of the task. Providing image information through an answer (e.g., visual question answering) [1]–[3], describing a scene in natural language (e.g., image captioning) [4]–[6], generating diverse, meaningful, and goal-oriented questions (e.g., visual question generation) [7]–[9], and various real-life applications [10], [11] are examples of multi-modal tasks. Visual question generation (VQG) is a generative task that focuses on producing questions from images, and considerable effort has recently been devoted to incorporating question generation into many applications [7], [12]. The main purpose of the VQG process is to generate task-oriented, relevant, and diverse questions from images.

Question generation is not a one-to-one function: for a given image, the objective is not to predict one specific question; rather, a set of possible questions can be generated from a single image. Generated questions should be task-specific, in the sense that the model should be flexible enough to generate questions conditioned on answer types (see the sketch after this paragraph). For instance, if we want the model to generate questions whose answers are binary, the questions will follow a specific structure (e.g., starting with "is" or "do/did"); if we want questions whose answers are counts, the questions should start with "how many". The generated questions should also be relevant to the objective of the task. Questions like "how many objects are there in the image?" and "what is the color of the object?" are very generic; such uninformative questions add little value to the question generation process. To convey a high-level understanding of a scene through an answer, the model should focus on generating relevant questions. Finally, diversity in the VQG task enables the model to generate questions that do not exist in the training data; these unseen settings allow many models [1], [7] to be examined for robustness in terms of performance.
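To make the conditioning and diversity mechanisms concrete, the following is a minimal PyTorch sketch, not the authors' released code, of a variational, answer-type-conditioned question decoder. All names, dimensions, and the toy inputs are illustrative assumptions, and the attention module used in the paper's model is omitted for brevity.

# Minimal sketch (illustrative, not the paper's implementation) of
# answer-type-conditioned, variational question generation.
import torch
import torch.nn as nn

class VariationalQuestionDecoder(nn.Module):
    """Decodes a question from image features, an answer-type embedding,
    and a latent code z sampled per question (one image -> many questions)."""

    def __init__(self, vocab_size, img_dim=512, embed_dim=128,
                 hidden_dim=256, latent_dim=32, num_answer_types=3):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.type_embed = nn.Embedding(num_answer_types, embed_dim)
        # Variational head: image features -> Gaussian over latent z.
        self.to_mu = nn.Linear(img_dim, latent_dim)
        self.to_logvar = nn.Linear(img_dim, latent_dim)
        # Decoder conditioned on [image, answer type, z] via its initial state.
        self.init_state = nn.Linear(img_dim + embed_dim + latent_dim, hidden_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, answer_type, tokens):
        mu, logvar = self.to_mu(img_feat), self.to_logvar(img_feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        cond = torch.cat([img_feat, self.type_embed(answer_type), z], dim=-1)
        h0 = torch.tanh(self.init_state(cond)).unsqueeze(0)
        hidden, _ = self.gru(self.token_embed(tokens), h0)
        return self.out(hidden), mu, logvar  # token logits plus KL-loss terms

# Toy usage with made-up inputs: sampling z twice can yield two different
# questions for the same image and the same requested answer type.
vocab_size = 100
model = VariationalQuestionDecoder(vocab_size)
img_feat = torch.randn(1, 512)                 # e.g., pooled CNN features of a UAV image
answer_type = torch.tensor([1])                # hypothetical id for "count" questions
tokens = torch.randint(0, vocab_size, (1, 8))  # teacher-forced question tokens
logits, mu, logvar = model(img_feat, answer_type, tokens)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # standard VAE KL term

Under these assumptions, training would combine the token-level cross-entropy with the KL term above, and decoding with freshly sampled z values is what lets a single image yield a diverse set of questions rather than one fixed prediction.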

