I. Introduction
With the rapid development of 5G, Mobile Edge Computing (MEC) has become a promising computing paradigm for delay-sensitive and compute-intensive AI tasks [1]. In MEC scenarios, AI models such as Convolutional Neural Networks (CNNs) are deployed on edge nodes equipped with computing resources, while User Equipment (UE) offloads multimedia data to those nodes for model inference and retrieves the inference results [2]. Compared with cloud computing, MEC achieves lower latency because tasks are computed entirely at the edge, avoiding the delay of transmitting data to the cloud. However, the resources of edge nodes are usually limited, so when deploying CNNs, MEC networks face the problem of insufficient computing power and memory on individual nodes. As shown in Figure 1, an effective solution is to split an AI task into multiple phases that are computed on different edge nodes, compensating for the limited computing power of a single node through the collaboration of edge nodes. Nevertheless, this approach introduces extra transmission delay because the intermediate results of the computation (i.e., tensors) must be transferred between edge nodes. Moreover, when the tasks assigned to the nodes are unevenly distributed, heavily loaded nodes incur large queuing delays, which directly degrades users' Quality of Experience (QoE).
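To make the partitioned-inference idea concrete, the following minimal sketch (assuming PyTorch and a toy two-stage CNN, both illustrative rather than the actual system studied here) splits a model at a layer boundary and reports the size of the intermediate tensor that would have to travel between two edge nodes.

```python
import torch
import torch.nn as nn

# Hypothetical two-phase split of a small CNN: phase A would run on one
# edge node, phase B on another; the intermediate tensor is what must be
# transmitted between them.
stage_a = nn.Sequential(                     # phase 1, on edge node A
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
stage_b = nn.Sequential(                     # phase 2, on edge node B
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

x = torch.randn(1, 3, 224, 224)              # multimedia input offloaded by the UE
with torch.no_grad():
    t = stage_a(x)                           # phase 1 inference on node A
    # The intermediate tensor t would be serialized and sent to node B;
    # its size determines the extra inter-node transmission delay.
    print("intermediate tensor:", tuple(t.shape),
          f"{t.numel() * t.element_size() / 1024:.0f} KiB")
    y = stage_b(t)                           # phase 2 inference on node B
print("inference result shape:", tuple(y.shape))
```

The split point chosen here is arbitrary; in practice the partition layer trades off per-node computation against the volume of the intermediate tensor that must cross the network.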