I. Introduction
With the spread of user terminals such as wearable medical devices, intelligent industrial sensors, and smart vehicles, a new generation of intelligent applications, including smart healthcare [1], smart industry [2], and smart transportation [3], has become a focus of research and development in communications, computing, and artificial intelligence (AI). At the same time, user-side data has grown explosively, driving the rapid development of data-driven AI technologies such as deep learning (DL) [4]. Building such intelligent applications requires collecting large amounts of user data at a central location, where DL turns these data into neural network models for tasks such as image recognition, text processing, and voice recognition [5], [6], [7]. Although this centralized training paradigm offers performance advantages, it also poses risks to privacy and data property rights and introduces a single point of failure, all of which make it difficult to guarantee the reliability of the system. Beyond model performance, quality of service (QoS) metrics such as communication latency and energy consumption are also important indicators for evaluating an application [8].