I. Introduction
Deep learning is a field in which a machine can be trained to simulate human activity. Given data, the machine learns the patterns and relations in that data and predicts outputs for new inputs similar to those it has seen. Image caption generation is one such deep learning problem: the machine learns from input data to generate the caption that best describes a given image. An image captioning system can address many practical problems; for example, it can benefit blind people, since an image can be described by a deep learning model and the description then converted to audio.