Layout-Bridging Text-to-Image Synthesis

Abstract:

The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, can capture only the keywords in the text that indicate common objects or actions, and fail to learn their spatial distribution patterns. An effective way to circumvent this limitation is to generate an image layout as guidance, an approach that a few methods have attempted. Nevertheless, these methods fail to generate practically effective layouts due to the diversity of input text and object locations. In this paper, we push for effective modeling in both text-to-layout generation and layout-to-image synthesis. Specifically, we formulate text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon the Transformer to learn the spatial relationships between objects by modeling the sequential dependencies between them. In the layout-to-image synthesis stage, we focus on learning the textual-visual semantic alignment per object in the layout, so as to precisely incorporate the input text into the layout-to-image synthesis process. To evaluate the quality of the generated layout, we design a dedicated metric, dubbed Layout Quality Score, which considers both the absolute distribution errors of the bounding boxes in the layout and the mutual spatial relationships between them. Extensive experiments on three datasets demonstrate the superior performance of our method over state-of-the-art methods in both predicting the layout and synthesizing the image from the given text.
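
To make the sequence-to-sequence view of layout generation more concrete, the sketch below shows one possible way a layout could be flattened into a token sequence that a Transformer decoder might predict. It is an illustration only and not the paper's actual encoding: the token format, the bin count, the `<bos>`/`<eos>` markers, and the function names `quantize`, `layout_to_tokens`, and `tokens_to_layout` are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical layout element: (label, x, y, w, h), coordinates normalized to [0, 1].
Box = Tuple[str, float, float, float, float]


def quantize(value: float, bins: int) -> int:
    """Map a normalized coordinate to one of `bins` discrete indices."""
    value = min(max(value, 0.0), 1.0)  # clamp out-of-range inputs
    return min(int(value * bins), bins - 1)


def layout_to_tokens(boxes: List[Box], bins: int = 32) -> List[str]:
    """Flatten a layout into a token sequence (assumed format, not the paper's)."""
    tokens = ["<bos>"]
    for label, x, y, w, h in boxes:
        tokens.append(label)
        for name, value in zip("xywh", (x, y, w, h)):
            tokens.append(f"<{name}_{quantize(value, bins)}>")
    tokens.append("<eos>")
    return tokens


def tokens_to_layout(tokens: List[str], bins: int = 32) -> List[Box]:
    """Invert layout_to_tokens, recovering each coordinate as its bin center."""
    body = tokens[1:-1]  # drop <bos> and <eos>
    boxes = []
    for i in range(0, len(body), 5):
        label = body[i]
        # Each coordinate token looks like "<x_12>"; strip "<x_" and ">".
        coords = [(int(t[3:-1]) + 0.5) / bins for t in body[i + 1:i + 5]]
        boxes.append((label, coords[0], coords[1], coords[2], coords[3]))
    return boxes
```

Under this assumed encoding, every object contributes five tokens, so each predicted box is conditioned on all earlier ones, which is the kind of sequential dependency between objects that the abstract refers to. Quantization is lossy: decoding returns bin centers rather than the original coordinates.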

Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 33, Issue: 12, December 2023)

Page(s): 7438 - 7451

Date of Publication: 08 May 2023

DOI: 10.1109/TCSVT.2023.3274228

I. Introduction

Text-to-image synthesis aims to synthesize a realistic image that is consistent with the textual description. It has extensive applications ranging from artistic creation to computer-aided design. Text-to-image synthesis is quite challenging in that it demands not only high quality of the synthesized image, but also cross-modality semantic consistency between the given text and the synthesized image.
