I. Introduction
Image generation, also referred to as computer graphics (CG), is a group of methods used to create the most realistic image possible within the constraints of time, money, skill sets, and available computer hardware. These images can be static, like a digital painting or a photograph, or dynamic, like an animated movie.

This project integrates the generator of StyleGAN with the discriminator of AttnGAN. The generator creates a picture that corresponds to a text description given as input; the discriminator then evaluates the generated image and provides feedback that the generator uses to improve its output. StyleGAN uses a technique called "style-based synthesis" to separate the generation of image content from the generation of image styles such as the color, texture, and lighting of a face. Its ability to produce high-quality, controllable outputs has made it a popular tool.

AttnGAN (Attentional Generative Adversarial Network) is a type of Generative Adversarial Network (GAN) used for text-to-image generation, a task at the intersection of computer vision and natural language processing (NLP). AttnGAN uses a multi-level attention mechanism that attends to different parts of the textual description at different scales. This allows the network to capture all the features of the text and generate images that are more faithful to the description (a minimal sketch of this word-level attention is given below). Another important feature of AttnGAN is its hierarchical structure for generating images: the generator first produces a low-resolution image and then refines it step by step into a higher-resolution output (also sketched below). This enables the network to produce images that are both high in quality and coherent with the textual description. AttnGAN has been applied in several settings, including the generation of realistic images from textual descriptions of birds, flowers, and other natural objects. Its capacity to produce pictures from written descriptions has important applications in fields such as e-commerce, where it can be used to generate product images from product descriptions.

This project uses the COCO dataset and a flower dataset. COCO, the Common Objects in Context dataset, contains over 330,000 images and is one of the largest and most complete datasets for object comprehension and verification, covering 80 object categories. Each image in the COCO dataset is annotated with object instance masks, object bounding boxes, and object category labels (a short example of reading these annotations follows below). A further motivation for this project is to avoid copyright issues that arise from reusing existing images: by inputting a specific caption or description, unique and personalized images can be generated. Image generation from captions is also important because it can help us better understand how language and vision are related.
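To make the attention mechanism concrete, the following is a minimal sketch in PyTorch of word-level attention between caption words and image sub-regions. It is not AttnGAN's exact implementation; the tensor shapes and the names word_emb and region_feat are illustrative assumptions.

```python
# Minimal sketch (not AttnGAN's exact code) of word-level attention between
# word embeddings and image sub-region features. Shapes are illustrative.
import torch
import torch.nn.functional as F

def word_region_attention(word_emb, region_feat):
    """word_emb:    (batch, n_words, dim)   -- one vector per caption word
       region_feat: (batch, n_regions, dim) -- one vector per image sub-region
       Returns a word-specific image context built from attended regions."""
    # Similarity between every word and every image region.
    scores = torch.bmm(word_emb, region_feat.transpose(1, 2))  # (b, n_words, n_regions)
    # Normalize over regions so each word distributes attention across the image.
    attn = F.softmax(scores, dim=-1)
    # Weighted sum of region features gives an image context per word.
    context = torch.bmm(attn, region_feat)                     # (b, n_words, dim)
    return context, attn

# Example: 4 captions of 12 words attending over 64 (8x8) image regions.
words = torch.randn(4, 12, 256)
regions = torch.randn(4, 64, 256)
ctx, attn = word_region_attention(words, regions)
print(ctx.shape, attn.shape)  # torch.Size([4, 12, 256]) torch.Size([4, 12, 64])
```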
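The hierarchical, coarse-to-fine generation can be sketched in the same spirit: a noise vector (in the full model, combined with a text embedding) is projected to a coarse 8x8 image, which successive upsampling stages then refine. All layer sizes and module names here are illustrative assumptions, not AttnGAN's actual architecture.

```python
# Sketch of hierarchical (coarse-to-fine) generation: a low-resolution image
# is produced first, then refined stage by stage. Sizes are illustrative.
import torch
import torch.nn as nn

class CoarseToFineGenerator(nn.Module):
    def __init__(self, z_dim=100, base_ch=64):
        super().__init__()
        # Stage 0: project the noise vector to an 8x8 feature map.
        # (In the full model the text embedding would be concatenated here.)
        self.fc = nn.Linear(z_dim, base_ch * 8 * 8)
        self.to_rgb = nn.Conv2d(base_ch, 3, 3, padding=1)  # shared across scales for brevity
        # Refinement stages: each doubles the spatial resolution.
        self.refine = nn.ModuleList([
            nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(base_ch, base_ch, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for _ in range(3)  # 8 -> 16 -> 32 -> 64
        ])

    def forward(self, z):
        h = self.fc(z).view(z.size(0), -1, 8, 8)
        images = [torch.tanh(self.to_rgb(h))]      # coarse 8x8 output
        for stage in self.refine:
            h = stage(h)
            images.append(torch.tanh(self.to_rgb(h)))  # progressively finer outputs
        return images                              # list of images: 8x8 ... 64x64

gen = CoarseToFineGenerator()
outs = gen(torch.randn(2, 100))
print([o.shape[-1] for o in outs])  # [8, 16, 32, 64]
```

Returning every intermediate image, as AttnGAN does, lets a discriminator supervise each resolution rather than only the final one.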
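COCO annotations can be read with the pycocotools package; the short sketch below shows how the instance masks, bounding boxes, and category labels mentioned above are accessed. The annotation file path is a placeholder for a local copy of the dataset.

```python
# Sketch of reading COCO instance annotations with pycocotools.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")  # placeholder local path
img_ids = coco.getImgIds()
print(f"{len(img_ids)} images, {len(coco.getCatIds())} object categories")

# Annotations for one image: each entry carries a bbox, a segmentation mask,
# and a category id that maps to one of the 80 object categories.
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
for ann in anns:
    name = coco.loadCats(ann["category_id"])[0]["name"]
    print(name, ann["bbox"])  # bbox format: [x, y, width, height]
```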