1. Introduction
Generating high-resolution images with Generative Artificial Intelligence (GenAI) models has demonstrated remarkable potential [1], [19], [23]. However, these capabilities are increasingly centralised. Training high-resolution image generation models requires substantial capital investment in hardware, data, and energy, beyond the reach of individual enthusiasts and academic institutions. For example, training Stable Diffusion 1.5 at a resolution of 512² entails over 20 days of training on 256 A100 GPUs [1]. Companies that make these investments understandably want to recoup their costs, and increasingly hide the resulting models behind paywalls. This trend toward centralisation and pay-per-use access is accelerating as GenAI image synthesis advances in quality, since the investment required to train image generators increases rapidly with image resolution.
Figure 1. Selected landscape samples of DemoFusion versus SDXL [24] (all images in the figure are presented at their actual sizes). SDXL can synthesize images up to a resolution of 1024², while DemoFusion extends SDXL to generate images at 4×, 16×, and even higher resolutions without any fine-tuning or prohibitive memory demands. All generated images are produced using a single RTX 3090 GPU. Best viewed ZOOMED-IN.