I. Introduction
We are entering the “killer microsecond” era in data center applications [1]. Due to advances in processor, memory, storage, and networking technologies, events that stall execution increasingly fall in the microsecond-scale latency range. Accesses to emerging storage-class memories [2]–[9], rack-scale memory disaggregation [10]–[14], 100+ gigabit network communication [15], and accelerator/GPU micro-offloads [16]–[18] are examples of program activities that incur microsecond-scale delays.