I. Introduction
VQA[1] is a multi-discipline research problem that gained popularity from academic communities like natural language processing and computer vision because humans when looked at images, tend to see objects and understand how they interact with their properties and states. Visual question answering (VQA) is fascinating as it permits humans to recognize models they originally see[3]. The current approaches in VQA or grounding rely on concatenating vectors or applying elementwise sum or product. Researchers have utilized many techniques to build a VQA model. Deep learning methods can attain advanced results on challenging face recognition, object detection and object classification which are parts of computer vision. Deep learning algorithms have shown propitious results for VQA rather than other handcrafted algorithms and the sample example for the VQA system is shown in Fig. 1. For extracting the image features, the CNN algorithm in deep learning is used.
Example for vqa system