I. Introduction
Recent rapid advances in edge-accelerator hardware have made it possible to run cloud-scale machine learning (ML) inference locally, near sensory data sources such as cameras and microphones, without sending large volumes of data to remote data centers. This improves efficiency and throughput while reducing latency.