I. Introduction
With the rapid development of artificial intelligence (AI), deep learning (DL) techniques are increasingly deployed in Internet-of-Things (IoT) domains such as commercial surveillance, autonomous driving, and robotics, where model prediction accuracy and real-time response are of crucial importance [1]–[3]. However, due to the limited computing and memory resources of AI IoT (AIoT) devices, neither requirement can be fully guaranteed [4]–[6]. To extend the processing capabilities of AIoT devices for higher prediction accuracy, more and more AIoT applications rely on cloud computing, offloading part of their computation-intensive tasks to remote cloud servers. The combination of powerful cloud computing platforms and resource-constrained AIoT devices is therefore becoming an emerging paradigm for large-scale AIoT design [7]–[9]. However, regardless of whether cloud servers are used for DL model training or for inference, AIoT devices need to send their private data to the cloud, raising concerns about user privacy and network latency that cannot be neglected.