1. Introduction
As processors become more complex and incorporate additional computational resources, aggressively optimizing compilers become critical. This dependence on compiler support is especially pronounced in non-uniform-resource, explicitly parallel platforms such as the Intel Itanium, Philips TriMedia, and Equator MAP/CA [1], [2], [3]. In these and other complex architectures, the compiler can no longer rely on simple metrics, such as instruction count, to guide optimization. Instead, the compiler must carefully balance execution resource utilization, register usage, and dependence height while minimizing stalls due to dynamic effects such as cache misses and branch mispredictions.