Introduction
Since 2016, the virtual reality (VR) industry has grown rapidly, but the high demand for local computing and rendering equipment has largely restricted users to a small group of enthusiasts, making it difficult for VR businesses to serve ordinary users [1]. Cloud VR is a cloud-based real-time virtual reality technology that replaces the user's local computing devices with cloud servers [2]. However, because of the intensive data volume of VR video, once computing and rendering move to the cloud, the bandwidth and latency limits of network transmission become the new bottleneck of the whole system [3]. A basic 4K-resolution Cloud VR service requires at least 40 Mbit/s of bandwidth [4], while the round-trip time (RTT) should be less than 70 ms to provide a good user experience [5]. In current VR-oriented mobile network architectures, the transmission distance between the user side and the server side usually remains at the metropolitan level; even without counting the time consumed in forwarding and processing, the RTT of fiber-optic transmission alone reaches 20 to 40 ms [6]. Such conditions make it difficult to meet the requirements of instantaneous VR communication [7], [8]. With the development and deployment of 5G networks, the bandwidth of mobile networks is greatly improved. Mobile edge computing (MEC) deploys servers at the edge of the network, near the user's base station, and greatly reduces transmission delay by sinking the user-plane gateway, application edge, and other functions, making Cloud VR possible [9]. Moreover, in the MEC scenario the source side (edge server) and the channel side (base station) are more closely coupled, and the bandwidth is sufficient to support baseband data at the server [10].
Each frame of a VR video contains the full panoramic view, but the user sees only a small portion of the image within the field of view (FOV) at any time, so every VR frame carries a large amount of redundancy [11]. Ideally, only the valid image information within the user's FOV would be pushed, based on the user's viewpoint [12]. However, because of network latency and bandwidth limits and the particular nature of VR viewing, this approach causes severe stalling and delayed view switching (the view can change only when a new frame arrives) [13]. The current mainstream approach is therefore to transmit a non-uniform-quality stream driven by the user's viewpoint information [14]: it preserves the image quality within the FOV while reducing the quality of the redundant regions as much as possible [15]. When the user's viewing direction changes slightly, the user does not need to wait for new frame data and can complete the view switch locally, eliminating stalling and switching delay [16]. The server dynamically adjusts the FOV position of the transmitted video based on the viewpoint information uploaded by the user, matching the user's FOV as closely as possible and realizing dynamic viewpoint-aware pushing [17].
To construct non-uniform-quality panoramic images, the most common solution is to split the panoramic video into tiles and transmit tiles of different quality according to the user's FOV, which saves considerable network bandwidth [18]. These solutions build on HTTP adaptive streaming protocols such as DASH and extend temporal segmentation to spatial segmentation [19]. The panoramic video is first divided into multiple blocks in time and space, and multiple quality versions are then generated for each block [20]. According to the user's viewpoint, an appropriate quality version is selected for each block, with blocks closer to the FOV receiving higher quality, realizing dynamic adaptive streaming [21]; a minimal sketch of such tile-level quality selection is given below. In essence, these methods still perform block coding on the equirectangular projection (ERP) panorama, and the quality changes rather abruptly between blocks, which degrades the viewing experience. Another class of methods is based on projection transformation, such as tetrahedral and cube projection. They use the classical map-projection idea to divide the sphere into many spherical trapezoids and project them onto a polyhedron, offering small distortion and high compression efficiency [22].
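As a concrete illustration of tile-based quality selection, the following Python sketch assigns a quality level to each tile from its angular distance to the user's viewpoint. It is a minimal sketch of the general idea rather than any specific scheme from [18]-[21]; the tile layout, the number of quality levels, and the FOV radius are illustrative assumptions.

```python
import math

def select_tile_qualities(tile_centers, viewpoint, num_levels=3, fov_radius=math.pi / 4):
    """Assign each tile a quality level (0 = best) from its angular distance to the gaze.

    tile_centers: list of (yaw, pitch) tile centers in radians.
    viewpoint:    (yaw, pitch) of the user's current gaze in radians.
    num_levels and fov_radius are illustrative parameters, not values from the paper.
    """
    qualities = []
    for yaw, pitch in tile_centers:
        # Great-circle angular distance between the tile center and the gaze direction.
        cos_d = (math.sin(pitch) * math.sin(viewpoint[1])
                 + math.cos(pitch) * math.cos(viewpoint[1]) * math.cos(yaw - viewpoint[0]))
        d = math.acos(max(-1.0, min(1.0, cos_d)))
        # Tiles inside the FOV get the best version; quality degrades with distance.
        qualities.append(min(num_levels - 1, int(d / fov_radius)))
    return qualities
```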
While common polyhedral projections give every face equal area, a pyramid projection scheme proposed by Facebook projects the sphere onto a square pyramid and exploits the difference in the projected areas of the faces to generate a non-uniform-quality image with a sharp base face and blurred sides [23]. This method integrates the generation of non-uniform-quality images into the projection transformation itself, so the image quality varies more naturally. The studies above focus mainly on the source side, reducing redundant data through coding or projection methods. A VR transmission mechanism based on joint source-channel coding is proposed by Cheng et al. [24]: after chunking the VR video according to the user's FOV, different levels of error protection are applied to maximize the viewing quality within the FOV. In [25], a quality-of-experience (QoE) metric is defined to measure the user's viewing experience, and an efficient rate-control algorithm is presented to maximize the QoE [26].
The research in [27] investigates multi-path cooperative VR video transmission over an SDN-based microcell architecture in 5G, aiming to improve the user experience by combining microcells with edge data centers (EDCs). The main idea is that, based on content pre-cached in the EDC, millimeter-wave links between microcell sites deliver the cached content to users within their coverage, thus meeting the low-latency requirement. The research in [28] explores VR video multicast in 5G networks and discusses the challenges of multicasting VR video over 5G microcell networks, proposing a single-frequency-network (SFN)-based implementation and a millimeter-wave-band scheme, respectively [29]. In addition, [30] proposes a D2D-assisted VR video multicast mechanism in 5G, which aims to use spectrum resources efficiently by transmitting VR video in a multicast manner.
The research in [31] proposes a multi-path layered transmission mechanism for 5G-based heterogeneous networks, where the base-layer and enhancement-layer VR video content is transmitted mainly over Wi-Fi, while the 5G network is used mainly for retransmission and correction. The research in [32] uses the 5G MEC mode to study adaptive VR video transmission, minimizing the required transmission rate by jointly optimizing computation offloading and caching under latency and energy constraints. The research in [33] focuses on user-viewpoint-based caching in MEC and presents a viewpoint-aware caching strategy that increases the hit rate of cached content. For multicast from multiple micro base stations to multiple users, [34] proposes a deep-learning-assisted scheduling and content-quality-adaptive transmission mechanism for micro-base-station multicast of VR video. The research in [35] discusses the advantages of millimeter wave for VR video transmission and, taking highly interactive VR games as the research object, proposes a solution combining mobile edge computing and pre-caching to provide reliable VR video transmission. In [36], the problem of fusing millimeter-wave communication, mobile edge computing, and pre-caching is further modeled, a corresponding solution algorithm is given, and simulation verifies that the solution achieves lower latency than conventional schemes. Based on the MEC architecture, this paper proposes a resource optimization model for the 5G smart edge that guarantees the service demand of mobile VR by jointly optimizing communication, computation, and storage.
Given the above analysis, we summarize the main contributions of this paper as follows:
The latency model of mobile VR services based on 5G edge computing is formulated so as to fit the reinforcement learning framework.
We adopt reinforcement learning to tackle the offloading decision problem of mobile VR in edge computing. The proposed offloading algorithm is designed to improve the efficiency of edge resources.
We conduct a considerable number of simulation experiments to verify the performance of the proposed computation offloading scheme.
System Model
Figure 1 shows the mobile VR service architecture in which the 5G smart edge converges communication, computation, and storage. The architecture mainly includes the mobile VR terminal, the 5G access network, and MEC. This paper focuses on lightweight VR devices and mobile-terminal-based VR headsets. The 5G access network provides access services for users; this paper adopts the typical 5G network structure, where users transmit traffic through a remote radio head (RRH). The edge computing server is deployed near the baseband unit (BBU) pool and provides storage and computing resources.
The latency of mobile VR services based on 5G edge computing mainly consists of two parts: content transmission latency and computation offloading latency, as shown in Figure 2, which illustrates the composition of the offloading latency when computation tasks are offloaded to edge nodes. Assuming that the backhaul capacity $d_b$ and the fronthaul capacity $d_f$ are shared by $U$ users, and that user $u$ accesses RRH $r$ over a wireless link with bandwidth $B_{r,u}$ and signal-to-noise ratio $\beta_{r,u}$, the per-bit transmission delays over the backhaul, the fronthaul, and the wireless link are \begin{align*} {t_{b}} &= \frac {1}{d_{b}/U} \tag{1}\\ {t_{f}} &= \frac {1}{d_{f}/U} \tag{2}\\ {t_{w}} &= \frac {1}{{B_{r,u}}\log _{2}(1 + {\beta _{r,u}})} \tag{3}\end{align*}
Through multi-level caching based on interest and popularity prediction, the VR contents requested by users are assumed to be available at the C-RAN edge nodes. The content transmission delay from the C-RAN edge node to the user terminal therefore depends only on the fronthaul and wireless links, as given by $t_{de}$ in (9) below.
The computation task of mobile VR user $u$ in time slot $\tau$ is associated with the requested content $n$, whose total data volume aggregates the selected blocks over the $K$ tiles and $M$ quality versions: \begin{equation*} D_{u}^{n,\tau } = \sum \limits _{k}^{K} {\sum \limits _{m}^{M} {q_{n,\tau,m}^{k}} } \tag{4}\end{equation*}
In this paper, the following three computation modes are considered: 1) the mobile terminal completes the computation task locally; 2) the computation task is offloaded to an edge node; and 3) a hybrid mode combining local computation and edge offloading. Let $C_u^{n,\tau}$ denote the computation required for 3D scene reconstruction and rendering, $H_u$ the computing capacity of the terminal, $H_{u,k}$ the capacity allocated at edge node $k$, and $\delta$ the fraction computed locally in the hybrid mode. The computation delays of the three modes are then \begin{align*} {t_{l}} &= \frac {C_{u}^{n,\tau }}{H_{u}} \tag{5}\\ {t_{k}} &= \frac {C_{u}^{n,\tau }}{H_{u,k}} \tag{6}\\ {t_{h}} &= \frac {\delta C_{u}^{n,\tau }}{H_{u}} + \frac {(1 - \delta)C_{u}^{n,\tau }}{H_{u,k}} \tag{7}\end{align*}
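The delay model above can be evaluated directly. The following Python sketch computes the per-bit transmission delays of (1)-(3) and the computation delays of (5)-(7); the variable names mirror the symbols in the text, and the interpretation of $d_b$ and $d_f$ as capacities shared by $U$ users follows our reading of the model.

```python
import math

def per_bit_delays(d_b, d_f, U, B_ru, beta_ru):
    """Per-bit transmission delays of eqs. (1)-(3)."""
    t_b = 1.0 / (d_b / U)                          # backhaul capacity shared by U users, eq. (1)
    t_f = 1.0 / (d_f / U)                          # fronthaul, eq. (2)
    t_w = 1.0 / (B_ru * math.log2(1.0 + beta_ru))  # wireless link at the Shannon rate, eq. (3)
    return t_b, t_f, t_w

def computation_delays(C, H_u, H_uk, delta):
    """Computation delays of eqs. (5)-(7) for the three offloading modes."""
    t_l = C / H_u                                    # fully local, eq. (5)
    t_k = C / H_uk                                   # fully offloaded to edge node k, eq. (6)
    t_h = delta * C / H_u + (1 - delta) * C / H_uk   # hybrid split by delta, eq. (7)
    return t_l, t_k, t_h
```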
Deep Reinforcement Learning-based Task Offloading Scheme
A. Problem formulation
The massive amount of information in mobile VR leads to high processing complexity, which places strong demands on the intensive computing capability of the system. VR information processing must complete intensive computing tasks in real time so that users obtain a natural and smooth experience. Limited by their computing power, mobile terminals have difficulty supporting the intensive computing tasks of mobile VR; there is still an obvious gap between terminal capability and the computational needs of mobile VR. If the computation is migrated to a remote cloud server, its latency performance is difficult to guarantee. It is therefore necessary to make full use of the computing ability of 5G mobile terminals and edge nodes together to fulfill the intensive computing requirements of VR. An analysis of the intensive computing tasks of mobile VR shows that 3D scene reconstruction and rendering require the largest amount of computation in the whole process, which the current computing power of mobile terminals cannot satisfy, while tasks such as video/audio decoding are within the terminals' computational reach. This paper therefore focuses on two types of intensive computation tasks, namely 3D scene reconstruction and high-realism rendering, when studying the computation offloading strategy.
The time-varying nature of the wireless spectrum causes fluctuations in the communication links of mobile VR users, so the computational offloading must account for the time-varying wireless channel, and the offloading allocation algorithm must be designed around the channel quality of each mobile terminal. The computing capacity of edge nodes is also limited to a certain extent; in particular, when an edge node cannot absorb the intensive offloading load of all the users under it, part of the computation must be offloaded to core-network edge nodes to meet the offloading demand. Based on the above analysis, this paper models the offloading optimization of VR computing tasks as the following problem:\begin{align*} &\min \limits _{{\mathbf{A}},{\mathbf{B}},\Delta } \sum \limits _{u}^{U} {\sum \limits _{\tau }^{\Gamma }{\left({{t_{de}} + \alpha {t_{l}} + \lambda {t_{k}} + \left({1 - \alpha - \lambda }\right){t_{h}}}\right)}} \\ & s.t.~{t_{\max }}C_{u}^{n,\tau } \le {H_{u}} \\ &\hphantom { s.t.~}{t_{\max }}\left({1 - \delta }\right)\sum \limits _{U} {C_{u}^{n,\tau }} \le \sum \limits _{U} {{H_{u,k}}} \\ &\hphantom { s.t.~}\alpha \in \{ 0,1\} \\ &\hphantom { s.t.~}\lambda \in \{ 0,1\} \tag{8}\end{align*} where $\alpha$ and $\lambda$ are binary indicators selecting local computation and edge offloading, respectively, and the hybrid mode applies when both are zero. The content transmission delay is
\begin{equation*} t_{de} = D_{u}^{n,\tau }({t_{f}} + {t_{w}}) \tag{9}\end{equation*}
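Putting (8) and (9) together, the per-user, per-slot objective term can be evaluated as in the following sketch; the treatment of $\alpha$ and $\lambda$ as mutually exclusive 0/1 mode indicators is our reading of the constraints in (8).

```python
def slot_delay(D, t_f, t_w, t_l, t_k, t_h, alpha, lam):
    """One summand of the objective in eq. (8).

    alpha = 1 selects local computation, lam = 1 selects full edge offloading,
    and alpha = lam = 0 selects the hybrid mode (our reading of eq. (8)).
    """
    assert alpha in (0, 1) and lam in (0, 1) and alpha + lam <= 1
    t_de = D * (t_f + t_w)  # content transmission delay, eq. (9)
    return t_de + alpha * t_l + lam * t_k + (1 - alpha - lam) * t_h
```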
B. Proposed DQN algorithm
The Markov decision process (MDP) is the classic modeling approach in reinforcement learning. It is a mathematical model of sequential decision making used to describe the stochastic interaction between the policies of an agent and the reward values returned by the environment. When an MDP is used to handle the offloading decision problem of edge computing, the main objects can be represented by a triple of state, action, and reward: after taking action $a_i$ in state $s_i$, the agent receives the reward $R_i$ and accumulates the return $G$,
\begin{align*} {R_{i}}& = \begin{cases} \displaystyle { - {t_{i}}}\\ \displaystyle {\ln \left({1 - \frac {1}{{{e^{\sqrt {t_{i}} }}}}}\right)} \end{cases} \tag{10}\\ G &= \sum \limits _{t = 1}^{T} {R(s_{t}', a_{t}',{s_{t + 1}})} \tag{11}\end{align*}
The final objective is:\begin{equation*} \mathop {Max}\limits _{A} E\left[{\sum \limits _{t = 1}^{T} {R(s_{t}',a_{t}',{s_{t + 1}})}}\right] \tag{12}\end{equation*}
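A minimal sketch of the reward and return computation of (10)-(11) follows. The text does not state the conditions of the two branches in (10); we assume the linear penalty $-t_i$ applies when the task meets its deadline and the logarithmic penalty otherwise, which is only one plausible reading.

```python
import math

def step_reward(t_i, met_deadline):
    """Reward of eq. (10). The branch condition is an assumption: the text
    leaves the case split of eq. (10) implicit."""
    if met_deadline:
        return -t_i
    return math.log(1.0 - 1.0 / math.exp(math.sqrt(t_i)))

def episode_return(rewards):
    """Cumulative return G of eq. (11) over one episode (no discount factor appears in (11))."""
    return sum(rewards)
```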
The MDP problem can thus be modeled as a quadruple by adding the state-transition probability of the environment to the triple of state, action, and reward above.
As shown in Figure 3, the deployment of the intelligent decision algorithm comprises two components: the environment and the intelligent agent. The agent responsible for the edge computing offloading decisions is deployed on the core router, and the task information generated on the devices is handed over to it for decision making. The agent runs a DDPG-based offloading algorithm for edge computing. DDPG employs a deterministic policy $\pi_\theta$ parameterized by $\theta$, whose parameters are updated along the policy gradient \begin{equation*} {\nabla _{\theta} }J({\pi _{\theta} }) = {E_{s,{\rho ^{\pi} }}}[{\nabla _{\theta} }{\pi _{\theta} }(s){\nabla _{\pi} }{Q_{\pi} }(s,a){|_{a = {\pi _{\theta} }(s)}}] \tag{13}\end{equation*}
As depicted in Figure 3, the DRL-based algorithm for distributed dynamic resource optimization consists of four networks: the Actor and Critic networks and their target networks, where each target network shares the structure of its online counterpart. First, the agent obtains the current network state by interacting with the environment, and records the current state vector, action vector, reward, and next state vector as a transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer. The Critic network is trained on minibatches of $m$ sampled transitions by minimizing the loss \begin{equation*} loss = \frac {1}{m}\sum \limits _{j = 1}^{m} {{{({y_{j}} - {\hat y_{j}})}^{2}}} \tag{14}\end{equation*}
\begin{equation*} {\hat y_{j}} = Q(\phi ({S_{j}}),{a_{j}},w) \tag{15}\end{equation*}
Once the loss function has been obtained, the agent updates all the parameters of the Critic network through gradient backpropagation, and the target networks are then softly updated as \begin{align*} {w^{\prime} } &\leftarrow \tau w + (1 - \tau){w^{\prime} } \tag{16}\\ {\theta ^{\prime} } &\leftarrow \tau \theta + (1 - \tau){\theta ^{\prime} } \tag{17}\end{align*}
In contrast to DQN, the proposed algorithm uses a step-wise soft update, where only a fraction of the parameters is shifted toward the online networks at each step. At the same time, to achieve better exploration of the whole solution space, noise $\eta$ is added to the policy output during learning:\begin{equation*} A = {\pi _{\theta} }(S) + \eta \tag{18}\end{equation*}
\begin{equation*} J(w) = \frac {1}{m}\sum \limits _{j = 1}^{m} {{{({y_{j}} - {\hat y_{j}})}^{2}}} \tag{19}\end{equation*}
\begin{equation*} {\nabla _{\theta} }J({\pi _{\theta} }) = {E_{s,{\rho ^{\pi} }}}\left ({{\Omega _{1} \cdot {\Omega _{2}}} }\right) \tag{20}\end{equation*}
\begin{align*} {\Omega _{1}} &= {\nabla _{\pi} }{Q_{\pi} }(s,a){|_{s = {s_{i}},a = {\pi _{\theta} }(s)}} \tag{21}\\ {\Omega _{2}} &= {\nabla _{\theta} }{\pi _{\theta} }(s){|_{s = {s_{i}}}} \tag{22}\end{align*}
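The pieces above assemble into a standard DDPG update loop. The following TensorFlow sketch implements the critic loss of (14)/(19) with its bootstrapped target, the policy-gradient actor update of (20)-(22), the soft target updates of (16)-(17), and the noisy action of (18). The state/action dimensions, learning rate, $\gamma$, $\tau$, and noise scale are illustrative assumptions; only the 200-neuron hidden layer follows the setup in the simulation section.

```python
import numpy as np
import tensorflow as tf

# Illustrative hyper-parameters; the paper's actual values are in Tables 1-2.
STATE_DIM, ACTION_DIM = 8, 2
GAMMA, TAU, NOISE_STD, LR = 0.99, 0.005, 0.1, 1e-3

def build_actor():
    s = tf.keras.Input(shape=(STATE_DIM,))
    h = tf.keras.layers.Dense(200, activation="relu")(s)  # 200-neuron hidden layer (Setup)
    a = tf.keras.layers.Dense(ACTION_DIM, activation="tanh")(h)
    return tf.keras.Model(s, a)

def build_critic():
    s = tf.keras.Input(shape=(STATE_DIM,))
    a = tf.keras.Input(shape=(ACTION_DIM,))
    h = tf.keras.layers.Dense(200, activation="relu")(
        tf.keras.layers.Concatenate()([s, a]))
    q = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model([s, a], q)

actor, critic = build_actor(), build_critic()
target_actor, target_critic = build_actor(), build_critic()
target_actor.set_weights(actor.get_weights())
target_critic.set_weights(critic.get_weights())
actor_opt, critic_opt = tf.keras.optimizers.Adam(LR), tf.keras.optimizers.Adam(LR)

def act(state):
    """Exploration action of eq. (18): deterministic policy output plus noise eta."""
    a = actor(state[None, :]).numpy()[0]
    return np.clip(a + np.random.normal(0.0, NOISE_STD, ACTION_DIM), -1.0, 1.0)

def train_step(s, a, r, s2):
    """One minibatch update on transitions (s, a, r, s2); r must have shape (m, 1)."""
    # Critic: regress Q(s, a) toward the bootstrapped target y_j, minimizing eq. (14)/(19).
    y = r + GAMMA * target_critic([s2, target_actor(s2)])
    with tf.GradientTape() as tape:
        critic_loss = tf.reduce_mean(tf.square(y - critic([s, a])))
    critic_opt.apply_gradients(zip(
        tape.gradient(critic_loss, critic.trainable_variables),
        critic.trainable_variables))
    # Actor: ascend the deterministic policy gradient of eq. (20) (descend -Q).
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([s, actor(s)]))
    actor_opt.apply_gradients(zip(
        tape.gradient(actor_loss, actor.trainable_variables),
        actor.trainable_variables))
    # Soft target updates of eqs. (16)-(17).
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for w, w_t in zip(net.weights, target.weights):
            w_t.assign(TAU * w + (1.0 - TAU) * w_t)
```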
The DDPG-based computation offloading algorithm operates in two phases. At first, the decisions executed by the intelligent agent resemble those of a stochastic algorithm; as the algorithm learns from interaction, the offloading policy moves closer to the optimum. The parameters of the trained DDPG neural network are updated at the end of each learning iteration.
Simulation and Analysis
A. Setup
In this paper, the edge environment is simulated using the NetworkX library in Python, while the proposed algorithm is implemented with TensorFlow. The experiment divides the system into three layers: the terminal layer, the edge layer, and the cloud layer. The network model assumes 3-7 wireless edge servers and 2-5 wired edge servers in the edge layer, with each wireless edge server serving a certain coverage range. Initially, each AP has N = 3 edge devices within its service range, and the offloading requests of each device in each time slot follow a Poisson distribution. The computing capacity of each edge server is set between 4 GHz and 6 GHz, the computing capacity of the cloud is 10 GHz, and the computing capacity of each edge device is set to 0.5 GHz. The bandwidth between the edge server and the terminal is set to 100 MB/s, the bandwidth between the edge server and the wireless terminal is set to 50 MB/s, and the bandwidth among edge servers is set to 300 MB/s. The transmission delay among edge servers is set to 10 ms, and the transmission delay between an edge server and the cloud server is set to 30 ms. Both the Actor network and the Critic network adopt a three-layer structure, with the second fully connected layer consisting of 200 neurons. The neural networks are implemented in TensorFlow, and the hyper-parameters of the networks are given in Table 1; the specific parameters of the compared DQN are presented in Table 2. A sketch of the simulated topology is given below.
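As a sketch of how the three-layer environment can be built with NetworkX, the snippet below instantiates servers and devices with the capacities and link parameters listed above. It is a minimal reconstruction under stated assumptions: the paper does not describe the actual topology code, and the chain layout of inter-edge links, the random seed, and the arrival rate are illustrative.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

def build_topology(n_wireless=5, n_wired=3, devices_per_ap=3, arrival_rate=1.0):
    """Terminal/edge/cloud topology with the parameters of the setup above.

    Server counts are drawn from the stated ranges (3-7 wireless, 2-5 wired);
    the chain layout of inter-edge links is an illustrative assumption.
    """
    g = nx.Graph()
    g.add_node("cloud", cpu_ghz=10.0)                      # 10 GHz cloud
    servers = [f"edge{i}" for i in range(n_wireless + n_wired)]
    for name in servers:
        g.add_node(name, cpu_ghz=rng.uniform(4.0, 6.0))    # 4-6 GHz edge servers
        g.add_edge(name, "cloud", delay_ms=30.0)           # edge-cloud transmission delay
    for u, v in zip(servers, servers[1:]):
        g.add_edge(u, v, delay_ms=10.0, bw_mbs=300.0)      # inter-edge links
    for i in range(n_wireless):                            # each wireless AP serves N = 3 devices
        for j in range(devices_per_ap):
            dev = f"dev{i}_{j}"
            g.add_node(dev, cpu_ghz=0.5, rate=arrival_rate)  # 0.5 GHz terminals
            g.add_edge(dev, f"edge{i}", bw_mbs=50.0)         # wireless access link
    return g

def requests_this_slot(g):
    """Poisson-distributed offloading requests per device in one time slot."""
    return {n: rng.poisson(d["rate"]) for n, d in g.nodes(data=True) if n.startswith("dev")}
```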
B. Results
We first examine the convergence of the proposed algorithm during training, and then compare its performance with other algorithms in the same scenario. As illustrated in Figure 4, the average task latency drops rapidly in the first 30,000 training rounds; after 30,000 rounds, it stabilizes at 13.8 ms or less. As shown in Figure 5, the task completion rate increases significantly in the first 20,000 rounds of training and converges to 85%-90% afterwards. These results show that the training effect slows down after 20,000 iterations and the algorithm has converged.
In the simulations we examine the performance of the proposed method under different conditions. In Figure 6, the number of users and mobile edge servers is fixed, and the task latency decreases as the computing capacity of the mobile edge increases. The DDPG algorithm is significantly better than the local-computation and server-computation modes, and it also outperforms DQN because the action space of DQN is restricted to discrete values. As the server computing capacity improves, the average task latency of every offloading method decreases except for local computation at the terminal, which is limited by the bandwidth and the computing capacity of the terminal's processor. Figure 7 then shows the relationship between the number of servers and the expected task latency: with the capacity of the edge servers and the number of users (i.e., the computation tasks) fixed, the expected latency of an individual task gradually decreases as the number of edge servers increases, limited by the bandwidth between servers. The figure also shows that once the server count reaches 8, further increases bring no obvious improvement. The effect of the number of mobile terminals on the expected task delay is shown in Figure 8. We observe that the DDPG-based algorithm outperforms the other algorithms, and that the expected task delay rises rapidly as the number of mobile terminals grows. In mobile edge computing, a growing number of mobile terminals produces more task offloading requests in each time slot, amplifying the computational burden on the edge server.
Figure 9 shows the task success rate of each algorithm. For the same task queue, the DDPG-based offloading decision mechanism clearly outperforms the other mechanisms; it is followed by DQN and by pure edge-server offloading in the probability of completing tasks within the specified time frame, while the local computing scheme without offloading can hardly meet the applications' timing requirements. The reason is that the action space of DDPG is continuous, so the granularity of its offloading actions is finer and more precise than that of the DQN-based mechanism. Executing all tasks on the edge server wastes the computing resources of the end devices and the cloud server, so although pure edge-server offloading can largely meet the applications' requirements, its service quality is still inferior to the two DRL-based methods. End devices with weak processors can hardly handle the computationally intensive tasks in the queue without offloading services.
Conclusion
This paper utilizes reinforcement learning to address the offloading decision problem of mobile VR in edge computing. First, the latency model of mobile VR services in a 5G edge computing network is formulated to fit the reinforcement learning framework. Then, a DDPG-based offloading algorithm is designed to improve the efficiency of edge resources. Finally, extensive simulations verify that the proposed DDPG-based resource offloading algorithm improves the performance of the edge computing offloading service.