1. Introduction
Recent progress in image synthesis using text-guided diffusion models has attracted much attention due to the exceptional realism and diversity of the generated images. Large-scale models [29], [32], [34] have ignited the imagination of multitudes of users, enabling image generation with unprecedented creative freedom. Naturally, this has spurred ongoing research efforts investigating how to harness these powerful models for image editing. Most recently, intuitive text-based editing of synthesized images was demonstrated, allowing the user to easily manipulate an image using text alone [18].