I. Introduction
With the rapid development of IoT technology, massive amounts of data are generated by distributed sensors and widespread mobile devices [1]. Because IoT devices have limited computing power and resources, it is infeasible to process all of this data on the local devices. Instead, the traditional cloud-based architecture is widely used, since its strong computing power can handle large-scale data and complex tasks such as training deep neural networks. However, uploading data at such a scale to the cloud imposes a heavy transmission burden, which can result in unacceptable response delays and high energy costs.