I. Introduction
From servers to the edge, and now to microcontroller-based devices, deep neural networks (DNNs) are in high demand. Microcontrollers dominate the market for computing engines in small, low-cost, or energy-efficient devices [1], [2]. They are found everywhere, from household appliances to cars, consumer electronics, and wearables [3]. An estimated 250 billion microcontrollers are already in use [4]. However, microcontrollers are severely resource-constrained. For example, the STM32 F469I provides only 324 KB of SRAM and 2048 KB of flash as on-chip memory. As a result, they have been widely regarded as viable only for simple applications (e.g., keyword spotting [5]), rather than for complex DNN models.