I. Introduction
Reaching exascale compute performance and beyond within an affordable budget requires increasingly heterogeneous high-performance computing (HPC) systems. Hence, in recent years, highly specialized processors suited to different tasks and use cases have been developed, including graphics processing units (GPUs), tensor processing units (TPUs) and quantum processing units (QPUs). The Modular Supercomputing Architecture (MSA) [1], [2] breaks with traditional HPC system design and exploits these heterogeneous resources by integrating them at the system level. An MSA system consists of a collection of modules with different architectures and/or performance characteristics and can supply any combination or ratio of resources across modules. It is not bound to fixed associations between, for instance, CPUs and accelerators, as found in clusters of heterogeneous nodes, and is therefore ideal for HPC centers running a heterogeneous workload mix. The goal is to provide cost-effective computing at extreme performance scales, fitting the needs of a wide range of computational sciences. Each application run can dynamically decide which kinds of nodes to use and how many, mapping its intrinsic requirements and concurrency patterns onto the hardware (as depicted in Figure 1), thereby improving parallel efficiency, time to solution and energy use.