
GR-GAN: Gradual Refinement Text-To-Image Generation



Abstract:

A good Text-to-Image model should not only generate high-quality images but also ensure consistency between the text and the generated image. Previous models failed to address both aspects well simultaneously. This paper proposes a Gradual Refinement Generative Adversarial Network (GR-GAN) to alleviate the problem efficiently. A GRG module is designed to generate images from low resolution to high resolution, with the corresponding text constraints tightening from coarse granularity (sentence) to fine granularity (word) stage by stage; an ITM module is designed to provide image-text matching losses at both the sentence-image level and the word-region level for the corresponding stages. We also introduce a new metric, Cross-Model Distance (CMD), for simultaneously evaluating image quality and image-text consistency. Experimental results show that GR-GAN significantly outperforms previous models and achieves new state-of-the-art results on both FID and CMD. A detailed analysis demonstrates the efficiency of the different generation stages in GR-GAN.
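The sentence-image level of the ITM losses described above can be sketched as a batch-wise contrastive matching objective, in the spirit of DAMSM: cosine similarities between image and sentence embeddings, with matched pairs on the diagonal. This is a minimal illustrative sketch, not the paper's implementation; the function name, the temperature `gamma`, and the symmetric cross-entropy form are assumptions.

```python
import numpy as np

def sentence_image_match_loss(img_emb, txt_emb, gamma=10.0):
    """Hypothetical sketch of a sentence-image matching loss: cosine
    similarities over a batch, where row i of img_emb matches row i of
    txt_emb, scored by a symmetric softmax cross-entropy."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = gamma * img @ txt.T  # (B, B) similarity matrix, matches on the diagonal

    def xent(logits):
        # softmax cross-entropy where the correct class is the diagonal entry
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        return -np.mean(np.log(np.diag(p)))

    # score both retrieval directions: image-to-text and text-to-image
    return xent(sim) + xent(sim.T)
```

With perfectly aligned embeddings the loss is near zero, and it grows when image and sentence embeddings are mismatched, which is the behaviour a matching loss needs to push the generator toward text-consistent images.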
Date of Conference: 18-22 July 2022
Date Added to IEEE Xplore: 26 August 2022
Conference Location: Taipei, Taiwan


1. Introduction

Text-to-Image synthesis aims to automatically generate images conditioned on text descriptions, and is one of the most popular and challenging multi-modal tasks. The task requires the generator not only to produce high-quality images, but also to preserve the semantic consistency between the text and the generated image. Generative Adversarial Networks (GANs) [1] have shown promising results on text-to-image generation by using the sentence vector as conditional information. Zhang et al. [2] propose StackGAN++, which employs a multi-stage structure to improve image resolution stage by stage, with an unconditional loss in addition to a conditional loss at each stage. Xu et al. [3] propose AttnGAN with a DAMSM module to strengthen the consistency constraint on the generator. These models have achieved great improvements on the task, but their performance is still not satisfactory, especially on complex scenes.
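The combination of an unconditional and a conditional loss at each stage, as in StackGAN++ [2], can be sketched as follows. This is an illustrative sketch only: the helper `bce` and the probability inputs are stand-ins for the discriminator's real outputs, not the paper's code.

```python
import math

def bce(p, target):
    """Binary cross-entropy for a single predicted probability
    (hypothetical helper; clamps p for numerical stability)."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stage_discriminator_loss(real_p, fake_p, real_match_p, fake_match_p):
    """Per-stage discriminator objective in the StackGAN++ style:
    an unconditional term (is the image realistic?) plus a conditional
    term (does the image match the sentence?)."""
    unconditional = bce(real_p, 1.0) + bce(fake_p, 0.0)
    conditional = bce(real_match_p, 1.0) + bce(fake_match_p, 0.0)
    return unconditional + conditional
```

The unconditional term lets each stage's discriminator push for image realism independently of the text, while the conditional term enforces text-image consistency; summing both at every resolution stage is what distinguishes this multi-stage design from a single final-stage loss.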

