I. Introduction
Text-to-image synthesis aims to generate a realistic image that is consistent with a given textual description. It has extensive applications, ranging from artistic creation to computer-aided design. The task is challenging because it demands not only high quality in the synthesized image, but also cross-modal semantic consistency between the given text and the synthesized image.