1. Introduction
Deep generative models hold immense potential for creative expression, yet their controllability remains limited. While diffusion models excel at synthesizing high-quality and diverse outputs, fine-grained control over attributes, especially visual style, is typically limited to coarse-grained inputs such as textual descriptions [35], structural visual cues [53], or style transfer [5], [55]. As shown in Fig. 2, these inputs present significant limitations: (i) they restrict the nuances that can be inherited from style inputs, and (ii) because they do not explicitly disentangle the two attributes, they hinder the model’s ability to separate content from style information. In contrast, visual search models often use parametric style embeddings to achieve more nuanced control.

To bridge this gap, we propose Parametric Style Control (PARASOL), a novel synthesis model that leverages such embeddings to guide image synthesis, enabling disentangled parametric control over the fine-grained visual style and content of an image by conditioning synthesis on both a semantic cue and a fine-grained visual style embedding [37]. We show how the use of parametric style embeddings also enables various applications, including (i) interpolation of multiple contents and/or styles (Fig. 1) and (ii) refinement of generative search. Additionally, we introduce test-time features in our pipeline that give users finer control over the influence of each attribute on the output. Our approach is relevant to real-world contexts such as fashion design, architectural rendering, and personalized content creation, where precise control over image style and content is essential for creative expression and practical utility. Thus, our technical contributions are: