I. Introduction
In recent years, artificial intelligence-generated content (AIGC) has emerged as a novel approach to the production, manipulation, and modification of data [1]. By leveraging AI technologies, AIGC automates content generation alongside traditional professionally generated content (PGC) and user-generated content (UGC) [2], [3], [4]. With the marginal cost of data creation reduced to nearly zero, AIGC, e.g., ChatGPT [5], promises to supply vast amounts of synthetic data for AI development and the digital economy, offering significant productivity and economic value to society.

The rapid growth of AIGC capabilities is driven by continuous advances in AI technology, particularly in large-scale and multimodal models [6], [7]. A prime example of this progress is the transformer-based DALL-E [8], which generates images by autoregressively predicting successive image tokens. Its latest iteration, DALL-E 2 [9], instead employs a diffusion model, which synthesizes images by iteratively removing noise from a random input, leading to more refined and novel image generation. In text-to-image generation with such models, the language model serves as a guide, enforcing semantic coherence between the input prompt and the resulting image, while the generative model recombines the visual attributes and components it has learned from existing datasets to synthesize a virtually unlimited number of new images.
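The coupling between the language side and the image side can be made concrete with a short sketch. The code below is a minimal, self-contained illustration of classifier-free guidance, a technique commonly used in diffusion-based text-to-image systems: an unconditional and a prompt-conditioned noise estimate are combined so that each denoising step is steered toward the prompt. Here encode_text and denoiser are hypothetical toy stand-ins for a trained text encoder and denoising network, and the update rule is deliberately simplified rather than an exact DDPM schedule.

```python
from typing import Optional

import torch

# Hypothetical stand-ins, for illustration only: a real system would use a
# trained text encoder (e.g., CLIP) and a trained denoising network (U-Net).
def encode_text(prompt: str) -> torch.Tensor:
    """Map a prompt to a conditioning vector (toy deterministic embedding)."""
    torch.manual_seed(abs(hash(prompt)) % (2**31))
    return torch.randn(1, 64)

def denoiser(x: torch.Tensor, t: int, cond: Optional[torch.Tensor]) -> torch.Tensor:
    """Toy noise estimator; a trained model would predict the noise in x."""
    bias = 0.0 if cond is None else 0.01 * cond.mean()
    return 0.1 * x + bias

def sample(prompt: str, steps: int = 50, guidance: float = 7.5) -> torch.Tensor:
    """Simplified reverse-diffusion loop with classifier-free guidance."""
    cond = encode_text(prompt)
    x = torch.randn(1, 3, 32, 32)            # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps_uncond = denoiser(x, t, None)     # unconditional noise estimate
        eps_cond = denoiser(x, t, cond)       # prompt-conditioned estimate
        # Guidance: extrapolate toward the conditioned estimate so the sample
        # stays semantically coherent with the input prompt.
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)
        x = x - eps / steps                   # crude denoising update
    return x

image = sample("an armchair in the shape of an avocado")
print(image.shape)  # torch.Size([1, 3, 32, 32])
```

The guidance weight trades prompt fidelity against sample diversity: larger values bind the generated image more tightly to the prompt's semantics, at the cost of variety.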