I. Introduction
High-performance and energy-efficient systems-on-chip (SoCs) with tens to even hundreds of building blocks have attracted considerable attention for emerging applications such as autonomous vehicles, drones, and robots. In such an architecture, the network-on-chip (NoC) plays a key role in enabling high-speed and energy-efficient data communication among blocks. As a result, the NoC may account for up to 36% of the SoC power consumption [1]. To improve energy efficiency, dynamic voltage and frequency scaling (DVFS), adaptive clocking, timing-resilient circuit design, and ultra-low-voltage (ULV) operation are widely adopted [2], [3]. However, when an NoC operates in the ULV regime, it exhibits severe delay variability across process, voltage, and temperature (PVT) variations, which requires a large timing margin as a guard band. This margin undermines a large portion of the energy benefit of ULV operation.