GR-GAN: Gradual Refinement Text-To-Image Generation


Abstract:

A good text-to-image model should not only generate high-quality images but also ensure consistency between the text and the generated image. Previous models have failed to handle both aspects well at the same time. This paper proposes a Gradual Refinement Generative Adversarial Network (GR-GAN) to alleviate the problem efficiently. A GRG module is designed to generate images from low to high resolution, stage by stage, under text constraints that move from coarse granularity (sentence) to fine granularity (word); an ITM module is designed to provide image-text matching losses at both the sentence-image level and the word-region level for the corresponding stages. We also introduce a new metric, Cross-Model Distance (CMD), for simultaneously evaluating image quality and image-text consistency. Experimental results show that GR-GAN significantly outperforms previous models and achieves a new state of the art on both FID and CMD. A detailed analysis demonstrates the efficiency of the different generation stages in GR-GAN.
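The full definition of CMD appears in the body of the paper, which is not reproduced on this page. Given that the paper cites CLIP [18] and the Fréchet distance [19], one plausible reading is a Fréchet distance computed between text features and generated-image features in a shared embedding space. The sketch below implements only that reading; the function name, the CLIP-based pairing, and the Gaussian fit are assumptions, not the paper's exact formulation.

# Sketch of a Frechet-style cross-modal distance. This assumes CMD
# follows the cited Frechet distance [19], applied to text features and
# generated-image features embedded in a shared space such as CLIP's [18].
# It is an illustrative reading, not GR-GAN's published definition.
import numpy as np
from scipy import linalg

def cross_modal_frechet_distance(text_feats: np.ndarray,
                                 image_feats: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    text_feats, image_feats: (N, D) arrays, e.g. CLIP embeddings of the
    captions and of the images generated from them (hypothetical pairing).
    """
    mu_t, mu_i = text_feats.mean(axis=0), image_feats.mean(axis=0)
    sigma_t = np.cov(text_feats, rowvar=False)
    sigma_i = np.cov(image_feats, rowvar=False)
    # Matrix square root of the covariance product; keep the real part
    # to discard small imaginary numerical noise.
    covmean = linalg.sqrtm(sigma_t @ sigma_i).real
    diff = mu_t - mu_i
    return float(diff @ diff + np.trace(sigma_t + sigma_i - 2.0 * covmean))

Under this reading, a lower value means the generated images, as a population, lie close to their descriptions in the shared embedding space, which is consistent with the abstract's claim that CMD evaluates image quality and image-text consistency together.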
Date of Conference: 18-22 July 2022
Date Added to IEEE Xplore: 26 August 2022
Conference Location: Taipei, Taiwan

1. Introduction

Text-to-image synthesis aims to automatically generate images conditioned on text descriptions, and is one of the most popular and challenging multi-modal tasks. The task requires the generator not only to produce high-quality images, but also to preserve semantic consistency between the text and the generated image. Generative Adversarial Networks (GANs) [1] have shown promising results on text-to-image generation by using the sentence vector as conditional information. Zhang et al. [2] propose StackGAN++, which employs a multi-stage structure to improve image resolution stage by stage, with both an unconditional and a conditional loss at each stage. Xu et al. [3] propose AttnGAN with a DAMSM module to strengthen the consistency constraint on the generator. These models have achieved great improvements on the task, but their performance is still not satisfactory, especially on complex scenes.
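To make the multi-stage pattern concrete, below is a minimal PyTorch-style sketch of a stacked generator in the spirit of StackGAN++ [2]: a sentence vector conditions an initial low-resolution feature map, and each stage doubles the resolution and emits an image for its own discriminator. All module names, dimensions, and the stage count are illustrative assumptions, not GR-GAN's implementation.

# Illustrative skeleton of a multi-stage text-to-image generator with one
# output image per stage (after StackGAN++ [2]). All names and sizes are
# hypothetical, not GR-GAN's actual code.
import torch
import torch.nn as nn

class StageGenerator(nn.Module):
    """One refinement stage: upsample features 2x and emit an image."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.to_image = nn.Sequential(
            nn.Conv2d(channels, 3, 3, padding=1), nn.Tanh())

    def forward(self, h):
        h = self.refine(h)
        return h, self.to_image(h)

class MultiStageGenerator(nn.Module):
    """Maps (noise, sentence vector) to images of growing resolution."""
    def __init__(self, z_dim=100, sent_dim=256, channels=64, stages=3):
        super().__init__()
        self.channels = channels
        self.fc = nn.Linear(z_dim + sent_dim, channels * 4 * 4)
        self.stages = nn.ModuleList(
            [StageGenerator(channels) for _ in range(stages)])

    def forward(self, z, sent):
        # Fuse noise with the sentence embedding into a 4x4 feature map.
        h = self.fc(torch.cat([z, sent], dim=1))
        h = h.view(-1, self.channels, 4, 4)
        images = []
        for stage in self.stages:
            h, img = stage(h)   # 8x8, then 16x16, then 32x32, ...
            images.append(img)
        return images           # one image per stage

In training, each returned image would receive both an unconditional real/fake loss and a conditional loss against the sentence embedding from its stage's discriminator; GR-GAN's gradual refinement additionally shifts the text constraint from the whole sentence to individual words as resolution grows.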

References
1. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele and Honglak Lee, "Generative adversarial text to image synthesis", International Conference on Machine Learning, PMLR, pp. 1060-1069, 2016.
2. Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, et al., "StackGAN++: Realistic image synthesis with stacked generative adversarial networks", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 8, pp. 1947-1962, 2018.
3. Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, et al., "AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1316-1324, 2018.
4. Tobias Hinz, Stefan Heinrich and Stefan Wermter, "Semantic object accuracy for generative text-to-image synthesis", arXiv preprint, 2019.
5. Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, et al., "Object-driven text-to-image synthesis via adversarial training", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12174-12182, 2019.
6. Ming Tao, Hao Tang, Songsong Wu, Nicu Sebe, Xiao-Yuan Jing, Fei Wu, et al., "DF-GAN: Deep fusion generative adversarial networks for text-to-image synthesis", arXiv preprint, 2020.
7. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford and Xi Chen, "Improved techniques for training GANs", Advances in Neural Information Processing Systems, vol. 29, pp. 2234-2242, 2016.
8. Minfeng Zhu, Pingbo Pan, Wei Chen and Yi Yang, "DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802-5810, 2019.
9. Tobias Hinz, Stefan Heinrich and Stefan Wermter, "Generating multiple objects at spatially distinct locations", arXiv preprint, 2019.
10. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, et al., "Microsoft COCO: Common objects in context", European Conference on Computer Vision, pp. 740-755, 2014.
11. Tingting Qiao, Jing Zhang, Duanqing Xu and Dacheng Tao, "MirrorGAN: Learning text-to-image generation by redescription", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1505-1514, 2019.
12. Jiadong Liang, Wenjie Pei and Feng Lu, "CP-GAN: Content-parsing generative adversarial networks for text-to-image synthesis", European Conference on Computer Vision, pp. 491-508, 2020.
13. Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler and Sepp Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium", Advances in Neural Information Processing Systems, vol. 30, 2017.
14. Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees and Andreas Dengel, "Adversarial text-to-image synthesis: A review", arXiv preprint, 2021.
15. Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee and Yinfei Yang, "Cross-modal contrastive learning for text-to-image generation", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 833-842, 2021.
16. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al., "Attention is all you need", Advances in Neural Information Processing Systems, pp. 5998-6008, 2017.
17. Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
18. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., "Learning transferable visual models from natural language supervision", arXiv preprint, 2021.
19. Maurice Fréchet, "Sur la distance de deux lois de probabilité" [On the distance between two probability laws], Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, vol. 244, no. 6, pp. 689-692, 1957.
20. Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, et al., "Zero-shot text-to-image generation", arXiv preprint, 2021.