I. Introduction
Fueled by the massive influx of data and advanced algorithms, modern deep neural networks (DNNs) have substantially benefited Internet-of-Things (IoT) applications in a spectrum of domains [2], [3], including visual detection, smart security, audio analytics, health monitoring, and infrastructure inspection. In recent years, the efficient integration of DNNs and IoT has been receiving increasing attention from both academia and industry. DNN-driven applications typically follow a two-phase paradigm: 1) a training phase, in which a model is trained on a training data set, and 2) an inference phase, in which the trained model outputs results (e.g., predictions, decisions, and recognitions) for a piece of input data. With regard to deployment on IoT devices, the inference phase is mainly used to process data collected on the fly.

Given that complex DNN inference tasks can contain a large number of computational operations, their execution on resource-constrained IoT devices becomes challenging, especially when time-sensitive tasks are taken into consideration. For example, a single inference task using popular DNN architectures (e.g., AlexNet [4], FaceNet [5], and ResNet [6]) for visual detection can require billions of operations. Moreover, many IoT devices are battery powered, and executing such complex DNN inference tasks quickly drains their batteries. To relieve IoT devices of this heavy computation and energy consumption, outsourcing complex DNN inference tasks to public cloud computing platforms has become a popular choice in the literature. However, this type of "cloud-backed" system can raise privacy concerns when the data sent to remote cloud servers contain sensitive information [7], [8].
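To make the "billions of operations" scale concrete, the sketch below gives a rough back-of-the-envelope count of the multiply-accumulate (MAC) operations in a single convolutional layer. The layer shapes are illustrative assumptions loosely modeled on an AlexNet-style first convolution, not figures taken from the cited works.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """MACs for one conv layer: each of the h_out * w_out * c_out
    output activations accumulates over a k x k x c_in receptive field."""
    return h_out * w_out * c_out * k * k * c_in

# Hypothetical AlexNet-like conv1: 96 filters of size 11x11x3
# applied over a 55x55 output grid.
macs = conv_macs(55, 55, 3, 96, 11)
print(macs)  # ~1.05e8 MACs for this single layer alone
```

Summed over all layers of a deep architecture, per-layer counts of this magnitude readily add up to billions of operations per inference, which is what makes on-device execution costly for constrained hardware.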