
Consistency Control in Text-to-Image Diffusion Models


Abstract:

Recent diffusion models are known for producing high-quality images. However, many applications require precise control over factors such as object appearance, size, position, and the integration of multiple objects. Current techniques often fall short of these requirements while maintaining consistent visual quality. To tackle this challenge, we propose a novel approach that connects object appearance with random features, allowing for enhanced control over appearance, position, and size. By fine-tuning a pre-trained text-to-image model, we incorporate control information through mask mapping and a local feature classification loss. Our experimental results demonstrate that our method can effectively manage individual objects, facilitating flexible combinations while maintaining a consistent appearance across all elements.
Date of Conference: 25-27 October 2024
Date Added to IEEE Xplore: 12 February 2025
Conference Location: Xi'an, China

I. Introduction

Recent developments in text-to-image models have significantly enhanced the ability to generate high-quality images from natural language prompts [1]–[4]. Having been trained on extensive datasets of images and their corresponding captions, these models possess a strong semantic understanding. Writing a text prompt directly, shown as the first method in Figure 1, is a reasonable way to use them. However, image quality often degrades when a prompt asks for multiple objects. Crafting precise prompts is essential, yet current models do not offer the flexibility needed to control the positioning and combination of specific objects without affecting other elements. For example, adjusting a single object in an image can inadvertently disrupt other well-formed parts of the scene, leading to consistency issues [5, 6].

