
PARASOL: Parametric Style Control for Diffusion Image Synthesis


Abstract:

We propose PARASOL, a multi-modal synthesis model that enables disentangled, parametric control over the visual style of an image by jointly conditioning synthesis on both content and a fine-grained visual style embedding. We train a latent diffusion model (LDM) using modality-specific losses and adapt classifier-free guidance to encourage disentangled control over the independent content and style modalities at inference time. We leverage auxiliary semantic and style-based search to create training triplets for supervising the LDM, ensuring the complementarity of content and style cues. PARASOL shows promise for enabling nuanced control over visual style in diffusion models, both for image creation and stylization and for generative search, where text-based search results may be adapted to more closely match user intent by interpolating both content and style descriptors.
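The abstract does not spell out how classifier-free guidance is adapted to two modalities. As a rough illustration only, the sketch below shows one common way to compose two independently conditioned noise predictions, each with its own guidance scale; the function name, weights, and the simple additive composition are assumptions, not the paper's exact formulation.

```python
import torch

def dual_cfg(eps_uncond, eps_content, eps_style, w_content=3.0, w_style=3.0):
    """Compose two independently conditioned noise predictions.

    eps_*: denoiser outputs at the current step for the unconditional,
    content-conditioned, and style-conditioned branches. w_content and
    w_style are separate guidance scales, so the influence of each modality
    can be tuned independently at inference time. (Hypothetical additive
    composition, in the spirit of composable guidance; not the paper's
    exact scheme.)
    """
    return (eps_uncond
            + w_content * (eps_content - eps_uncond)
            + w_style * (eps_style - eps_uncond))

# Toy usage with random tensors standing in for real noise predictions.
e_u, e_c, e_s = (torch.randn(1, 4, 64, 64) for _ in range(3))
guided = dual_cfg(e_u, e_c, e_s, w_content=5.0, w_style=2.0)
```

Raising one scale while lowering the other would let a user trade off fidelity to the content cue against fidelity to the style embedding, which is the kind of per-attribute test-time control the paper describes.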
Date of Conference: 17-18 June 2024
Date Added to IEEE Xplore: 27 September 2024
Conference Location: Seattle, WA, USA

1. Introduction

Deep generative models have immense potential for creative expression, yet their controllability remains limited. While diffusion models excel at synthesizing high-quality, diverse outputs, fine-grained attribute control, especially over visual style, is typically limited to coarse-grained inputs such as textual descriptions [35], structural visual cues [53], or style transfer [5], [55]. As shown in Fig. 2, these inputs present significant limitations: (i) they restrict the nuances that can be inherited from style inputs, and (ii) without explicitly disentangling the two attributes, they hinder the model's ability to distinguish content from style information. In contrast, visual search models often use parametric style embeddings to achieve this more nuanced control. Leveraging such embeddings to guide image synthesis, we propose Parametric Style Control (PARASOL) to bridge this gap. PARASOL is a novel synthesis model that enables disentangled parametric control over the fine-grained visual style and content of an image, conditioning synthesis on both a semantic cue and a fine-grained visual style embedding [37]. We show how parametric style embeddings also enable various applications, including (i) interpolation of multiple contents and/or styles (Fig. 1; sketched below) and (ii) refinement of generative search results. Additionally, we introduce test-time features in our pipeline that give users finer control over the influence of each attribute on the output. Our approach holds relevance in real-world contexts such as fashion design, architectural rendering, and personalized content creation, where precise control over image style and content is essential for creative expression and practical utility. Thus, our technical contributions are:
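As a minimal sketch of the interpolation application referenced above: if content and style cues are represented as embedding vectors, two styles (or contents) can be blended before conditioning the sampler. The helper below assumes unit-norm embeddings and a simple linear interpolation with renormalization; the paper's actual interpolation scheme may differ.

```python
import torch
import torch.nn.functional as F

def blend(emb_a, emb_b, alpha=0.5):
    """Linearly interpolate two conditioning embeddings (content or style),
    then L2-renormalize, assuming the encoder produces unit-norm vectors.
    alpha=0 recovers emb_a; alpha=1 recovers emb_b. (Hypothetical helper;
    not taken from the paper.)
    """
    return F.normalize(torch.lerp(emb_a, emb_b, alpha), dim=-1)

# e.g. a half-way blend of two style embeddings before conditioning the LDM
style_a = F.normalize(torch.randn(1, 512), dim=-1)
style_b = F.normalize(torch.randn(1, 512), dim=-1)
style_mix = blend(style_a, style_b, alpha=0.5)
```

Sweeping alpha from 0 to 1 would trace a path between the two styles, which is how search results could be nudged toward a user's intended look.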
