Introduction
In recent years, with the continuous development of 5G networks, the number of multimedia services and smart terminals in mobile networks has increased rapidly, leading to a significant increase in mobile data volume [1]. According to Cisco's forecast report [2], much of the new traffic will originate from mobile multimedia services, whose share of total traffic will grow rapidly because of the sheer volume of data they generate. In 2017, mobile multimedia services accounted for 59% of all mobile data traffic; by 2023, this figure is expected to jump to 79%.
Ultra-low latency, intensive computation, and massive data transmission are distinctive features of most mobile multimedia services, for example webcasting, virtual reality (VR), augmented reality (AR), cloud computers, and online games. These services place high requirements on network latency, bandwidth, and computing power. Meanwhile, as the traffic and computation handled by mobile users and devices increase dramatically, multimedia devices need to handle many compute-intensive multimedia tasks such as video compression and transcoding [3], [4]. The heavy computation imposed by these tasks puts considerable pressure on users. However, because of their limited computing resources and storage capacity, multimedia devices cannot process such tasks locally with both low latency and low power consumption.
In view of these problems, Mobile Cloud Computing (MCC) has been proposed as a solution, in which large amounts of data are processed centrally in cloud servers to alleviate the burden on local devices [5]. However, traditional cloud computing suffers from high latency, high load, and core network congestion, since cloud servers are typically deployed far from multimedia devices. In contrast, Mobile Edge Computing (MEC) offers a promising alternative by deploying edge servers at edge nodes or base stations. Mobile terminals can offload their computing tasks over wireless channels to nearby edge nodes for processing, which reduces task processing delay, improves network utilization efficiency, and enhances Quality of Service [6]. Nevertheless, compared with cloud computing, edge computing is constrained by offloading decisions, wireless resources, and computing resources. Wireless resources mainly include bandwidth and transmit power, while computational resources generally refer to the CPU frequencies of local mobile devices and edge servers. To fully exploit the advantages of the MEC paradigm, offloading decisions, communication resources, and computational resources must be optimized jointly, which is a major problem in the wireless network between user devices and MEC servers.
To address this challenge, several studies have investigated the joint allocation of wireless and computational resources in MEC systems [7]–[11]. Lyapunov optimization, online dynamic task scheduling, and game theory have been proposed to jointly handle wireless resources, computational resources, and offloading decisions. The authors in Ref. [7] proposed a local compression offloading model to solve the resource allocation problem of multi-user MEC offloading systems. In Ref. [8], the authors proposed a Lyapunov-optimization-based approach to study task assignment and scheduling with respect to power consumption and execution delay in MEC systems with energy harvesting capability. The authors in Ref. [9] considered a heuristic algorithm for joint resource allocation decisions that minimizes delay. The authors in Ref. [10] investigated a computational resource allocation scheme based on potential game theory to reduce the energy consumption of MEC networks and improve the efficiency of computational resources. In Ref. [11], the authors proposed a suboptimal resource allocation algorithm that assigns priorities to users based on their channel gains and locally computed energy consumption, and applies different offloading schemes to different priorities to minimize the weighted sum of delay and energy consumption. However, these algorithms are usually time-consuming and computationally intensive in complex MEC networks, because the problem must be re-solved repeatedly in a time-varying MEC environment.
In recent years, deep reinforcement learning (DRL) has become a popular approach for solving optimization problems in MEC systems [12]–[17]. DRL can adjust its strategy in unstable environments and adapt to complex MEC scenarios through the actions taken by its agents, enabling adaptive offloading decisions and resource allocation. In Ref. [12], the authors proposed a distributed machine learning approach that enables DRL to perform online offloading in an MEC environment. The authors in Ref. [13] considered a DRL-based video offloading scheme to maximize long-term performance. The authors in Ref. [14] studied a temporal-attention deterministic policy gradient method built on the Deep Deterministic Policy Gradient (DDPG) algorithm to solve the joint optimization of computational offloading and resource allocation in MEC. In Ref. [15], the authors proposed a DRL-based offloading scheme to enhance the utility of multimedia devices in dynamic MEC; simulation results demonstrate that the scheme reduces energy consumption, computation overhead, and task failure rate. The authors in Ref. [16] proposed a DRL-based offloading framework that adapts to the common patterns behind various applications to infer the optimal offloading strategy for different scenarios. In Ref. [17], the authors proposed an advanced deep-learning-based computational offloading algorithm for multistage vehicle edge cloud computing networks to minimize the total time and energy cost of the whole system. Although DRL is resilient in complex MEC networks, most DRL methods learn in a centralized manner, so the required action space and parameter configuration explode as multimedia devices are added, which directly reduces training efficiency and increases the risk of privacy disclosure. To solve this problem, federated learning (FL) has been proposed to optimize MEC networks [18].
Federated learning is a distributed machine learning paradigm that enables multiple distributed device nodes to communicate and participate in the aggregation of a global model. Each device trains a local model separately, communicates through federated learning, and uploads its locally trained model parameters for global model aggregation. Federated learning thus allows model parameters to be exchanged without sharing raw data, which enhances collaboration among distributed devices while protecting their privacy and security.
Several studies have investigated the resource allocation and computational offloading problems in FL for two optimization objectives, minimizing system latency and energy consumption [19], [20]. The authors in Ref. [19] minimized the FL loss function by jointly optimizing resource allocation and UE selection while satisfying both the latency and energy consumption requirements of FL. The authors in Ref. [20] proposed an alternating direction algorithm that formulates the joint optimization of CPU frequency and power control as a nonlinear programming (NLP) problem, in order to minimize the energy consumption of all multimedia devices subject to the federated learning time requirement. References [21]–[23] focus on combining federated learning with deep reinforcement learning, i.e., training local DRL models and then integrating them into a comprehensive global DRL model. The authors in Ref. [21] proposed a joint optimization scheme for optimal path selection and power allocation based on a federated deep Q-network learning algorithm, which maximizes network throughput under power and mobility constraints and takes communication resources into account, but does not consider a reasonable allocation of computational resources. In Ref. [22], the authors considered a multimodal deep reinforcement learning framework based on hybrid policies and proposed an online joint collaboration algorithm combined with FL, validating its performance; however, the agent in that work does not perform resource allocation operations such as power allocation or computational task offloading. The authors in Ref. [23] proposed a federated cooperative caching framework based on deep reinforcement learning, but that work did not take task offloading into account.
We compare the objectives and resource optimization of our study with related work on MEC systems; the results are shown in Table 1. It is clear that our study overcomes the shortcomings of many previous works.
For mobile multimedia devices, limited computing resources and battery capacity may hinder efficient task completion. In such cases, offloading tasks to edge or cloud servers becomes necessary. The offloading decision made by the multimedia device plays a critical role in controlling the overall MEC system overhead and ensuring a good user experience. Additionally, task offloading consumes wireless channel resources, which must therefore be allocated reasonably in MEC systems. In this paper, we propose an adaptive offloading framework based on federated deep reinforcement learning to jointly optimize transmit power, computational resources, and offloading decisions, with the aim of minimizing the delay and energy consumption incurred by mobile multimedia devices in completing their tasks. The contributions of this paper can be summarized as follows:
We formulate the problem as a multi-objective optimization that minimizes the weighted sum of the delay and energy consumption required by the system to execute the tasks. To solve this complex problem, we jointly allocate computational and communication resources and transform the nonlinear programming problem into a federated deep reinforcement learning problem with multiple intelligent agents.
For multimedia devices, changing locations and different kinds of multimedia tasks cause non-IID data. To reduce the impact of non-IID data, we propose a mechanism for selecting the devices that participate in federated learning, which limits the communication overhead and ensures the convergence of FL.
We design an adaptive offloading algorithm based on FL and DRL, which jointly allocates computational resources and performs task offloading. This not only increases the overall scalability of the system but also accelerates the learning speed of deep reinforcement learning. The algorithm maintains relatively stable performance in complex MEC network environments and outperforms other DRL algorithms.
The rest of this article is organized as follows: Section 2 describes the system model, and Sect. 3 describes the problem formulation. In Sect. 4, we present the design of the FDRL-DDQN algorithm. Section 5 presents simulation results. Finally, Sect. 6 concludes this paper.
System Model
In this paper, we consider a MEC network configuration that consists of a MEC server, an MCC server, a MEC base station (BS), and a set of $N$ multimedia devices denoted by $\mathrm{N}$.
We divide time into consecutive time frames. In each frame, every multimedia device $i$ must choose exactly one execution mode for its task: local execution ($x_i=1$), offloading to the MEC server ($y_i=1$), or offloading to the MCC server ($z_i=1$), so that \begin{equation*}
x_{i}+y_{i}+z_{i}=1, \forall i\in \mathrm{N}.\tag{1}\end{equation*}
2.1 Computing Model
This section models the delay and energy consumption experienced by multimedia devices during the execution of multimedia tasks. When multimedia device $i$ offloads its task to the base station over the wireless channel with transmit power $p_i$ and channel gain $h_i$, the achievable uplink transmission rate is \begin{equation*}
r_{i}=B \log_{2}\left(1+\frac{p_{i}h_{i}}{\sigma^{2}}\right),\tag{2}\end{equation*}
where $B$ is the channel bandwidth and $\sigma^{2}$ is the noise power.
The communication delay and the energy consumption of mobile task offloading are respectively given by
\begin{align*}
T_i^{b s} & =\frac{L_i}{r_i}, \tag{3}\\
E_i^{b s} & =p_i T_i^{b s},\tag{4}\end{align*}
Due to limited computing power and battery capacity, multimedia devices offload tasks to edge servers or cloud servers to meet QoS requirements. The computation delays of the task of device $i$ (requiring $C_i$ CPU cycles) at the MEC server and the MCC server, whose CPU frequencies are $F^e$ and $F^c$, are given respectively as follows:
\begin{align*}
& T_i^e=\frac{C_i}{F^e}, \tag{5}\\
& T_i^c=\frac{C_i}{F^c},\tag{6}\end{align*}
The total delays of edge and cloud execution, including the transmission delay, are then
\begin{align*}
& T_i^E=T_i^e+T_i^{b s},\tag{7} \\
& T_i^C=T_i^c+T_i^{b s},\tag{8}\end{align*}
When a user-submitted multimedia task is selected for execution on the local multimedia device, it does not need to be offloaded to the edge server for processing; the processing delay and energy consumption of the task on device $i$ are then defined as
\begin{align*}
T_i^L & =\frac{C_i}{f_i^L}, \tag{9}\\
E_i^L & =\kappa_i\left(f_i^L\right)^2,\tag{10}\end{align*}
In this paper, we aim to optimize the computational resource allocation and the offloading policy so as to minimize the multimedia task execution cost. The long-term expected cost of each multimedia device is a weighted sum of execution delay and energy consumption. The cost of each multimedia device is given by
\begin{align*}
& T_i\left(p_i, f_i, x_i, y_i, z_i\right)=x_i T_i^L+y_i T_i^E+z_i T_i^C, \tag{11}\\
& E_i\left(p_i, f_i, x_i, y_i, z_i\right)=x_i E_i^L+y_i E_i^e+z_i E_i^c,\tag{12}\end{align*}
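To make the cost model concrete, the following Python sketch evaluates Eqs. (2)-(12) for a single device. All numerical values are illustrative placeholders rather than the settings of Table 3, and the offloading energies $E_i^e$ and $E_i^c$ are assumed equal to the transmission energy $E_i^{bs}$ in this sketch, since the server-side energy is not charged to the device.

import math

# Illustrative parameters (placeholders, not the values of Table 3).
B = 10e6                 # channel bandwidth (Hz)
sigma2 = 1e-10           # noise power (W)
F_e, F_c = 10e9, 20e9    # MEC / MCC server CPU frequencies (cycles/s)
kappa = 1e-27            # effective switched-capacitance coefficient

def device_cost(L, C, p, h, f_local, x, y, z, w=0.5, lam=0.5):
    """Weighted delay/energy cost of one device, following Eqs. (2)-(12).
    L: task size (bits), C: required CPU cycles, p: transmit power (W),
    h: channel gain, f_local: local CPU frequency, (x, y, z): offloading decision."""
    r = B * math.log2(1 + p * h / sigma2)    # uplink rate, Eq. (2)
    T_bs = L / r                             # transmission delay, Eq. (3)
    E_bs = p * T_bs                          # transmission energy, Eq. (4)
    T_E = C / F_e + T_bs                     # edge execution delay, Eqs. (5), (7)
    T_C = C / F_c + T_bs                     # cloud execution delay, Eqs. (6), (8)
    T_L = C / f_local                        # local execution delay, Eq. (9)
    E_L = kappa * f_local ** 2               # local energy, Eq. (10)
    T = x * T_L + y * T_E + z * T_C          # total delay, Eq. (11)
    E = x * E_L + y * E_bs + z * E_bs        # total energy, Eq. (12) with E^e = E^c = E^bs
    return w * T + lam * E

# Example: offload a 600-Kbit task requiring 1e9 CPU cycles to the edge server (y = 1).
print(device_cost(L=600e3, C=1e9, p=0.2, h=1e-6, f_local=1e9, x=0, y=1, z=0))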
Problem Formulation
In this paper, the problem is formulated as the joint minimization of the long-term delay and energy consumption of the multimedia devices over the considered time frames.
In solving the optimization problem of offloading decisions and computational resource allocation for MEC systems, the objective of this paper is to minimize the total cost, defined as a weighted combination of the execution delay and energy consumption of the devices in the MEC system. Based on the above analysis, the optimization problem can be described as follows:
\begin{align*}
& \min _{p_i, f_i, x_i, y_i, z_i} \omega T_i+\lambda E_i \\
& \text{subject to:} \\
& C 1: f_i^L \leq F_{\max}, \forall i \in N \\
& C 2: x_i E_i^L+y_i E_i^e+z_i E_i^c \leq E_{\max, i}, \forall i \in N\tag{13} \\
& C 3: T_i \leq T_{\max}, \forall i \in N \\
& C 4: x_i+y_i+z_i=1, \forall i \in N \\
& C 5: x_i, y_i, z_i \in\{0,1\}, \forall i \in N,\end{align*}
However, minimizing the total system cost while satisfying the multimedia task execution delay and energy consumption tolerances is challenging: with the binary offloading variables $x_i, y_i, z_i$ coupled to the continuous power and CPU-frequency variables, problem (13) is a non-convex mixed-integer optimization problem that is difficult to solve with conventional methods in a time-varying environment.
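As a point of reference, a brute-force baseline would enumerate every local/edge/cloud assignment and keep the cheapest one; the minimal sketch below (constraints C1-C3 omitted, per-device costs assumed precomputed, e.g., with the `device_cost` helper above) makes the exponential search space explicit.

from itertools import product

def brute_force_offloading(costs):
    """Enumerate all local/edge/cloud assignments for N devices and return the one
    minimizing the total weighted cost. costs[i] is a dict
    {'local': c_L, 'edge': c_E, 'cloud': c_C} for device i.
    The runtime grows as 3^N, which is why a learning-based method is used instead."""
    modes = ('local', 'edge', 'cloud')
    best_assignment, best_cost = None, float('inf')
    for assignment in product(modes, repeat=len(costs)):
        total = sum(costs[i][m] for i, m in enumerate(assignment))
        if total < best_cost:
            best_assignment, best_cost = assignment, total
    return best_assignment, best_cost

# Example with three devices (illustrative cost values).
costs = [{'local': 2.0, 'edge': 1.2, 'cloud': 1.5},
         {'local': 0.8, 'edge': 1.0, 'cloud': 1.4},
         {'local': 1.6, 'edge': 1.1, 'cloud': 1.0}]
print(brute_force_offloading(costs))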
The Proposed FDRL-DDQN Algorithm
In this section, we present our solution to this complex and non-convex optimization problem. We propose a deep reinforcement learning algorithm that combines federated learning and adopts the Dueling DQN algorithm for the offloading actions; we refer to it as FDRL-DDQN.
The framework of the FDRL-DDQN algorithm is illustrated in Fig. 3. The algorithm contains three main components: training of the offloading decision and resource allocation, federated aggregation, and the update of local model parameters. In the first step, the devices participating in federated learning are selected. Next, each local model is trained to learn multimedia task offloading decisions and resource allocation. Subsequently, the trained model parameters are aggregated in a federated manner. Finally, the updated parameters are distributed to each multimedia device involved in federated learning. Algorithm 1 gives a detailed description of the proposed FDRL-DDQN algorithm, and Fig. 4 shows a flow chart of the federated framework.
In a complex MEC network environment, each mobile multimedia device faces three options for computing a multimedia task (local, edge, or cloud execution). For $N$ devices this results in a total of $3^{N}$ possible offloading combinations, which quickly becomes intractable for centralized decision-making.
The FDRL-DDQN Algorithm
Input: wireless channel gains $h_i$ and multimedia task sizes $L_i$ of the devices
Output: offloading actions $\{x_i, y_i, z_i\}$ and resource allocation for each device
Set the total number of time frames $T$
Initialize the network parameters $\theta_i^{eval}$ and $\theta_i^{target}$ of each device
while the global model has not converged
    Set $t = 1$
    Select the set of participating training devices (Sect. 4.1)
    for $t = 1, \ldots, T$
        Observe the state $S_i = \{L_i, h_i\}$ of each selected device
        Compute $Q_i(s, a)$ with the evaluation network
        Select the optimal offloading action and resource allocation
        for each device $i$ in the selected set
            Interact with the environment and calculate the cost (reward $R_i$)
            Train the local model and set $\theta_i^{target} \leftarrow \theta_i^{eval}$ periodically
            Upload the network parameters $\theta_i^{eval}$ to the MEC server
        end for
        Perform the federated average by (20) in the MEC server and get the global model $\theta^{global}$
        Transmit $\theta^{global}$ back to the selected devices
    end for
end while
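The following Python skeleton mirrors the structure of Algorithm 1; the device-side learner and the environment are replaced by simple stubs (random placeholders), so it only illustrates the control flow between local training, federated averaging, and parameter redistribution, not the actual DDQN updates of Sects. 4.2-4.3.

import random

class DeviceStub:
    """Placeholder for a multimedia device running a local Dueling DQN learner."""
    def __init__(self, i):
        self.i, self.params = i, [random.random()]
    def observe(self):                 # state (L_i, h_i), Eq. (15)
        return (random.uniform(2e5, 1e6), random.uniform(1e-7, 1e-6))
    def act(self, state):              # offloading + resource decision (stub)
        return random.choice(['local', 'edge', 'cloud'])
    def step(self, action):            # environment feedback: reward (Eq. (16)) and next state
        return -random.random(), self.observe()
    def store(self, *transition):      # experience replay (Sect. 4.2)
        pass
    def train_local(self):             # DDQN update, Eqs. (17)-(19)
        pass
    def get_params(self):
        return self.params
    def set_params(self, p):
        self.params = list(p)

def fedavg(param_list):                # element-wise average, Eq. (20)
    return [sum(v) / len(param_list) for v in zip(*param_list)]

def fdrl_ddqn(devices, episodes=3, steps=5, k=2):
    for _ in range(episodes):
        selected = random.sample(devices, k)     # stand-in for the selection rule of Sect. 4.1
        for _ in range(steps):
            for dev in selected:
                s = dev.observe()
                a = dev.act(s)
                r, s_next = dev.step(a)
                dev.store(s, a, r, s_next)
                dev.train_local()
        global_params = fedavg([d.get_params() for d in selected])
        for dev in selected:
            dev.set_params(global_params)        # redistribute the global model

fdrl_ddqn([DeviceStub(i) for i in range(5)])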
4.1 Device Selection
When a large number of devices participate in joint learning, drop rates and unnecessary communication overhead increase. To address this issue, we introduce a device selection strategy. At the beginning of each iteration of the FDRL-DDQN algorithm, a specific set of multimedia device agents is chosen to participate in the learning process. The devices are selected according to the following criterion:
\begin{equation*}\arg \max _{i \in N} \text{MSE}\left(\frac{d_i P_{\max, i}}{F_{\max, i}}\right),\tag{14}\end{equation*}
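The definitions of $d_i$ and of the MSE operator in Eq. (14) are not spelled out above; one possible reading, used in the hypothetical sketch below, is to rank devices by how far the ratio $d_i P_{\max,i}/F_{\max,i}$ deviates (in squared-error terms) from its mean over all devices and to pick the top-$k$.

import numpy as np

def select_devices(d, p_max, f_max, k):
    """Pick the k devices whose metric d_i * P_max_i / F_max_i deviates most,
    in squared-error terms, from the population mean (one reading of Eq. (14)).
    d, p_max, f_max: 1-D arrays over devices; returns the selected device indices."""
    metric = np.asarray(d) * np.asarray(p_max) / np.asarray(f_max)
    deviation = (metric - metric.mean()) ** 2     # hypothetical MSE term
    return np.argsort(deviation)[-k:][::-1]

# Example with 6 devices (illustrative values).
print(select_devices(d=[0.4, 0.9, 0.2, 0.7, 0.5, 0.8],
                     p_max=[0.2] * 6, f_max=[1e9] * 6, k=3))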
4.2 Local Model Training
For the training of local agents in the FDRL-DDQN algorithm, each multimedia device employs the Dueling DQN algorithm to train its own local model and learn offloading and resource allocation strategies. The Dueling DQN algorithm uses an experience pool to store the transition observed at each time step $t$.
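A minimal sketch of such an experience pool (capacity and batch size are illustrative choices, not values from the paper):

import random
from collections import deque

class ReplayBuffer:
    """Experience pool storing (state, action, reward, next_state) transitions."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=30):
        """Uniformly sample a mini-batch for one Dueling DQN training step."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))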
1. State Space
The state observed by device $i$ consists of its multimedia task data size $L_i$ and the wireless channel gain $h_i$:
\begin{equation*}
S_{i}=\{L_{i},\ h_{i}\},\tag{15}\end{equation*}
2. Action Space
In the FDRL-DDQN model considered in this paper, each agent is responsible for making appropriate decisions for its computational multimedia tasks. The decisions include whether a multimedia task is offloaded to the edge server or the cloud server, and how many computational resources should be allocated when the task is executed locally. The action space therefore consists of two parts: the offloading decision $\{x_i, y_i, z_i\}$ of the multimedia device and the allocation of local computational resources $f_i^L$, as illustrated below.
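A possible discrete encoding of this two-part action space is sketched below; the number of CPU-frequency levels is an assumed discretization, not a value taken from the paper.

from itertools import product

OFFLOAD_MODES = ['local', 'edge', 'cloud']   # one-hot choice of (x_i, y_i, z_i)
CPU_LEVELS = [0.25, 0.5, 0.75, 1.0]          # assumed fractions of F_max used locally

# Enumerate the discrete action space: one offloading mode combined with one CPU level.
ACTIONS = list(product(OFFLOAD_MODES, CPU_LEVELS))

def decode_action(index, f_max=1e9):
    """Map a Dueling DQN output index to (x_i, y_i, z_i, f_i^L)."""
    mode, level = ACTIONS[index]
    x, y, z = mode == 'local', mode == 'edge', mode == 'cloud'
    f_local = level * f_max if mode == 'local' else 0.0
    return int(x), int(y), int(z), f_local

print(len(ACTIONS), decode_action(0))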
3. Reward Function
The cost of each agent is the weighted sum of the delay and energy consumption in the objective function. Since the optimization objective of this paper is to minimize this cost, the reward function should be negatively correlated with the cost; it is therefore defined as
\begin{equation*}
R_i=-\left(\frac{\omega\left(y_i T_i^E+z_i T_i^C\right)+\lambda\left(y_i E_i^e+z_i E_i^c\right)}{\omega x_i T_i^L+\lambda x_i E_i^L}\right),\tag{16}\end{equation*}
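A small helper in the spirit of Eq. (16) is sketched below; it normalizes the weighted offloading cost by the weighted cost of purely local execution. Treating the denominator as this local baseline is an interpretation on our part, since a literal reading of the $x_i$ factors in Eq. (16) would zero the denominator whenever a task is offloaded.

def reward(y, z, T_L, T_E, T_C, E_L, E_e, E_c, w=0.5, lam=0.5):
    """Reward in the spirit of Eq. (16): the negative weighted cost of the chosen
    offloading option, normalized by the weighted cost of purely local execution.
    y, z: edge/cloud indicators; T_*, E_*: delays and energies from Sect. 2.1."""
    offload_cost = w * (y * T_E + z * T_C) + lam * (y * E_e + z * E_c)
    local_baseline = w * T_L + lam * E_L      # interpretation of the denominator
    return -offload_cost / local_baseline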
The Dueling DQN algorithm is used to address complex decision-control challenges in real-world multimedia environments. It combines Q-learning, an experience replay mechanism, and target Q-values based on action value functions to approximate the Q-value of the optimal policy. Q-learning selects the action with the highest Q-value by consulting a Q-table, while Dueling DQN uses a neural network to obtain the corresponding Q-value from the input, resulting in improved speed and stability. As depicted in Fig. 5, the Dueling DQN architecture splits the fully connected layer of the network into two branches, each with its own output. The upper branch represents the state value function, which quantifies the value of the state itself, irrespective of the action taken. The lower branch represents the state-dependent action advantage function, which captures the payoff of each action relative to the average, indicating the additional value brought by the decision. The two branches are then combined to derive the Q-value of each action. This design allows for mutual supervision, eliminates redundant degrees of freedom, mitigates the risk of overestimated Q-values, and enhances the stability of the algorithm. In this paper, the Q-value of device $i$ is denoted by $Q_i(s,a)$ and satisfies the Bellman equation
\begin{equation*}
Q_i(s, a)=u_i(s, a)+\gamma \sum_{s^{\prime} \in S} P_{s s^{\prime}}(a) \max _{a^{\prime}} Q_i\left(s^{\prime}, a^{\prime}\right),\tag{17}\end{equation*}
where $u_i(s,a)$ is the immediate reward, $\gamma$ is the discount factor, and $P_{ss^{\prime}}(a)$ is the state transition probability. The target value used to update the evaluation network is
\begin{equation*}
y_i=u_i(s, a)+\gamma Q_i\left(s^{\prime}, \underset{a^{\prime} \in A}{\arg \max}\, Q_i\left(s^{\prime}, a^{\prime}; \theta_i^{eval}\right), \theta_i^{target}\right). \tag{18}\end{equation*}
Meanwhile, to obtain the optimal strategy and minimize the gap between the target value and the evaluated value, we set the loss function as
\begin{equation*}
L(\theta)=E[(y_{i}-Q_{i}(s,a))^{2}].\tag{19}\end{equation*}
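An illustrative tf.keras sketch of the dueling architecture and of the target and loss in Eqs. (17)-(19) is given below. The paper's experiments use TensorFlow 1.0, so this TF2-style code is only an approximation; the 32/16 hidden-layer sizes follow Sect. 5, while the action count, discount factor, and random batch are assumptions.

import numpy as np
import tensorflow as tf

def build_dueling_dqn(state_dim, num_actions):
    """Dueling DQN: shared hidden layers (32 and 16 ReLU units, as in Sect. 5),
    then separate value and advantage streams combined into Q(s, a)."""
    inputs = tf.keras.Input(shape=(state_dim,))
    h = tf.keras.layers.Dense(32, activation='relu')(inputs)
    h = tf.keras.layers.Dense(16, activation='relu')(h)
    value = tf.keras.layers.Dense(1)(h)                 # state value V(s)
    advantage = tf.keras.layers.Dense(num_actions)(h)   # advantage A(s, a)
    q = value + (advantage - tf.reduce_mean(advantage, axis=1, keepdims=True))
    return tf.keras.Model(inputs, q)

def ddqn_targets(eval_net, target_net, rewards, next_states, gamma=0.9):
    """Target of Eq. (18): the evaluation net picks a', the target net scores it."""
    best_a = tf.argmax(eval_net(next_states), axis=1)
    q_next = tf.gather(target_net(next_states), best_a, axis=1, batch_dims=1)
    return rewards + gamma * q_next

# Toy batch of states (L_i, h_i) and rewards; 12 actions as in the encoding above.
eval_net = build_dueling_dqn(state_dim=2, num_actions=12)
target_net = build_dueling_dqn(state_dim=2, num_actions=12)
states = np.random.rand(4, 2).astype('float32')
rewards = np.random.rand(4).astype('float32')
y = ddqn_targets(eval_net, target_net, rewards, states)
q_eval = tf.reduce_max(eval_net(states), axis=1)    # Q of the greedy actions (illustrative)
loss = tf.reduce_mean(tf.square(y - q_eval))        # L(theta) = E[(y - Q)^2], Eq. (19)
print(float(loss))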
4.3 Parameter Aggregation and Update
At the start of each learning round in FDRL-DDQN, the participating local devices upload their network parameter models to the MEC server for model aggregation. This aggregation combines the local models into a global model at the MEC server. Subsequently, the MEC server distributes the aggregated global model parameters to each multimedia device participating in FDRL-DDQN as the network parameters for the next round. In this paper, we employ FedAvg [24] as the model aggregation method:
\begin{equation*} \theta^{global}=\frac{\sum_{i\in\mathcal{W}}\theta_{i}^{eval}}{\vert \mathcal{W}\vert},\tag{20}\end{equation*}
where $\mathcal{W}$ denotes the set of devices selected to participate in the current round of federated learning.
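Since Eq. (20) is an element-wise average of the uploaded weight tensors, a minimal sketch operating on lists of NumPy arrays (for example, the output of Keras `get_weights()`) suffices:

import numpy as np

def fedavg(local_weight_lists):
    """Eq. (20): average each weight tensor over the |W| participating devices."""
    num_devices = len(local_weight_lists)
    return [sum(layer_weights) / num_devices
            for layer_weights in zip(*local_weight_lists)]

# Example: three devices upload the weights of a tiny two-tensor model.
uploads = [[np.ones((2, 2)) * i, np.ones(2) * i] for i in range(1, 4)]
global_weights = fedavg(uploads)
print(global_weights[0])   # every entry equals 2.0, the mean of 1, 2, and 3

The MEC server would then push `global_weights` back to the selected devices, e.g. via `set_weights()`, as the starting parameters for the next round.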
Simulation Results
In this section, we use the GPU version of TensorFlow 1.0 to implement the FDRL-DDQN framework in Python and perform simulations to evaluate its performance. The main simulation parameter settings are shown in Table 3.
To simulate the proposed FDRL-DDQN algorithm, we construct a network comprising 50 multimedia devices, of which only 10 are selected for each training round. Each device has a maximum computational capacity of 1 GHz and a maximum transmit power of 23 dBm. Because of the limited computing power of the multimedia devices in the MEC network, we use the smallest feasible neural network for our algorithm: it consists of an input layer, two hidden layers, and an output layer, where the first and second hidden layers contain 32 and 16 neurons, respectively, and ReLU activation functions are used throughout the network.
5.1 Convergence Performance
In this section, we evaluate the convergence performance of the FDRL-DDQN algorithm and compare it with the distributed DDQN. We examine the convergence of the two schemes when only selected devices and when all devices participate in the federation, in order to address the non-IID data issue of multimedia devices. Additionally, we analyze the impact of the learning rate and batch size on the convergence of the FDRL-DDQN algorithm.
Firstly, we assess the convergence speed of the training loss of the FDRL-DDQN algorithm; Fig. 6 plots the average training loss over the training iterations.
In Fig. 7, we address the non-IID problem among mobile multimedia devices by selecting only a fraction of devices to participate in each round of federated learning, ensuring the convergence speed of the overall algorithm. To evaluate the effect of the device selection method on the convergence of the FDRL-DDQN algorithm, we compare adding all devices to federated learning with adding only the selected subset. The convergence performance of the FDRL-DDQN algorithm is validated using 50,000 randomly generated multimedia tasks, which are offloaded according to the optimal resource allocation decisions. In each iteration round, the agent obtains a reward value based on its decision. The average reward of the FDRL-DDQN algorithm increases with the number of iterations as the agent improves its decision-making ability, and the algorithm converges after approximately 200 iterations. The figure shows that, with the device selection mechanism, the average reward of the overall algorithm is significantly higher and converges faster than when all devices are added to federated learning.
Furthermore, we examine the impact of the learning rate and batch size on the convergence of the FDRL-DDQN algorithm. Figure 8 illustrates the effect of the agent's learning rate on the convergence performance of the FDRL-DDQN framework. We experiment with learning rates of 0.0001, 0.001, and 0.002. Although a higher learning rate speeds up learning, the figure shows that a learning rate of 0.0001 results in slow convergence, whereas a learning rate of 0.002 increases learning efficiency but compromises stability, causing repeated oscillations that hinder convergence. Thus, we set the learning rate to 0.001 in the simulations.
Another parameter of interest is the batch size for multimedia task processing. Figure 9 demonstrates that increasing the batch size improves the convergence of the FDRL-DDQN algorithm. With a small batch size of 10, convergence takes around 380 iterations; when the batch size is increased to 20, convergence occurs after 310 iterations; and with a batch size of 30, the algorithm converges in only 180 iterations. Larger batch sizes allow training on more instances per step, so the agent accumulates experience faster, the executed actions reach optimal solutions more rapidly, and the algorithm converges faster.
5.2 Comparison of Total Cost
In this section, we first compare the proposed FDRL-DDQN algorithm with the distributed DDQN algorithm. To further evaluate its performance, we also compare it with the centralized DDQN algorithm, the centralized DQN algorithm, and two baseline computation offloading policies: mobile execution and edge node execution. Mobile execution computes all multimedia tasks on the local device, while edge node execution offloads all tasks to the edge node. We investigate the effect of the delay weight on the FDRL-DDQN algorithm in comparison with the four schemes mentioned above, and we also discuss the trade-off between delay and energy consumption.
In Fig. 10, we examine the effect of the delay and energy consumption weights on algorithm performance. To handle different types of mobile multimedia tasks, we set the delay weight $\omega$ and the energy weight $\lambda$ to different values.
Figure 11 illustrates the equilibrium trend of the network's average delay and average energy consumption as the delay weight varies. In this simulation, the number of users is set to N = 5. From Fig. 11, it is apparent that the network's average delay gradually decreases as the delay weight increases, while the average energy consumption increases. Both the average delay and the average energy consumption stabilize once the delay weight reaches a certain threshold.
Figure 12 compares the average total cost of the proposed FDRL-DDQN algorithm with that of the distributed DDQN algorithm. We select three multimedia devices and train them individually using the distributed DDQN algorithm, without any parameter exchange between the devices during training. When this training is finished, we add the three multimedia devices to the FDRL-DDQN framework and retrain until convergence. The experimental results show that the cost of each device is reduced by the FDRL-DDQN algorithm, with the average cost reduced by 20.3%. By combining federated learning with deep reinforcement learning, cooperative training between devices is achieved. This avoids the problems that a device training alone faces due to environmental instability and large action and state spaces, and it provides a relatively stable learning environment that intelligently combines different devices. Because the devices involved in training upload only the model parameters needed for learning, user privacy and security are effectively protected. In addition, federated learning enables knowledge sharing between devices without increased average delay or energy consumption.
To demonstrate the performance benefits of the FDRL-DDQN algorithm, Fig. 13 shows the impact of the multimedia task data size on the average total cost of the devices. We compare the FDRL-DDQN algorithm with the centralized DDQN algorithm, the centralized DQN algorithm, the mobile execution algorithm, and the edge node execution algorithm. FDRL-DDQN learns faster than the basic centralized DDQN, and its cost rises more slowly with the task size, with the smallest gain at a task size of 600 Kbit and the largest gain at 1000 Kbit. This is because, as the multimedia task data size increases, more and more information is exchanged between devices, so the advantage of federated learning is fully exploited and the agents learn faster and faster, whereas the performance of the centralized DDQN degrades because the growing task data size leads to exponential growth of the action space faced by the agent. Compared with the centralized DDQN, the FDRL-DDQN algorithm reduces the total cost by 7.1% when the multimedia task volume is 600 Kbit and by 31.3% when it is 1000 Kbit; compared with the mobile (local) execution algorithm and the edge node offloading algorithm, FDRL-DDQN reduces the cost by up to 35.3% and 34.8%, respectively, because federated deep reinforcement learning organically combines FL and DRL to obtain the optimal policy through intelligent and effective learning over the parameters of multiple devices. In addition, FDRL-DDQN reduces the cost by up to 30.1% compared with the centralized DQN algorithm, which has already been widely used for multimedia task offloading policies. The biggest advantage of the DDQN algorithm over the DQN algorithm is that it keeps the target network stable, which helps the whole system update its parameters and thus converge faster to the optimal policy. When FL is combined with DDQN, this effect becomes more pronounced: FL makes the DDQN algorithm more stable, yielding more accurate final results and faster convergence.
Conclusion
In this paper, we propose an adaptive offloading algorithm, FDRL-DDQN, that combines federated learning and deep reinforcement learning. For the computational offloading problem in MEC scenarios with dynamically arriving mobile multimedia tasks, we jointly allocate computational and communication resources with the goal of minimizing latency and energy consumption, and make reasonable offloading decisions. For the non-IID data problem across different multimedia devices, we design an adaptive device selection mechanism to ensure the convergence of FL. In addition, we compared FDRL-DDQN with the centralized Dueling DQN, distributed DDQN, mobile execution, and edge execution schemes, and obtained better results. Simulation results show that the algorithm achieves good latency and energy performance. In future work, we will consider resource coordination among multiple MEC servers and investigate more flexible and generalized resource allocation and computational offloading strategies.