Introduction
Since 2016, the virtual reality (VR) industry has grown rapidly, but the high demand for local computing and rendering equipment has largely restricted users to a small group of enthusiasts, making it difficult for VR businesses to serve ordinary users [1]. Cloud VR is a cloud-based real-time virtual reality technology that replaces the user's local computing devices with cloud servers [2]. However, because of the intensive data volume of VR video, once computing and rendering move to the cloud, the bandwidth and latency limits of network transmission become the new bottleneck of the whole system [3]. A basic 4K-resolution Cloud VR service requires at least 40 Mbit/s of bandwidth [4], while the round-trip time (RTT) should be less than 70 ms to provide a good user experience [5]. In current VR-oriented mobile network architectures, the transmission distance between the user side and the server side usually remains at the metropolitan level; even without counting the time consumed in forwarding and processing, the RTT of fiber-optic transmission alone reaches 20 to 40 ms [6]. Such conditions make it difficult to meet the requirements of instantaneous VR communication [7], [8]. With the development and deployment of 5G networks, the bandwidth of mobile networks is greatly improved. Mobile edge computing (MEC) deploys servers at the edge of the network, near the user's base station, and greatly reduces transmission delay by sinking the user-plane gateway, application edge, and other functions, making Cloud VR possible [9]. Moreover, in the MEC scenario the source side (edge server) and the channel side (base station) are more closely coupled, and the bandwidth is sufficient to support baseband data at the server [10].
Each frame of a VR video contains the full panoramic view, but the user sees only a small portion of the image within the field of view (FOV) at any time, so every VR frame carries a large amount of redundancy [11]. Ideally, only the valid image information within the user's FOV would be pushed, based on the user's viewpoint [12]. However, because of network latency and bandwidth limits and the particular nature of VR viewing, this approach causes severe stalling and delayed view switching (the view can change only when a new frame arrives) [13]. The current mainstream approach is therefore to transmit a non-uniform-quality stream driven by the user's viewpoint information [14]: it preserves the image quality within the FOV while reducing the quality of the redundant regions as much as possible [15]. When the user's viewing direction changes slightly, the user does not need to wait for new frame data and can complete the view switch locally, eliminating stalling and switching delay [16]. The server dynamically adjusts the FOV position of the transmitted video based on the viewpoint information uploaded by the user, matching the user's FOV as closely as possible and realizing dynamic viewpoint-aware pushing [17].
To construct non-uniform-quality panoramic images, the most common solution is to split the panoramic video into tiles and transmit tiles of different quality according to the user's FOV, which saves considerable network bandwidth [18]. These solutions build on HTTP adaptive streaming protocols such as DASH and extend temporal segmentation to spatial segmentation [19]. The panoramic video is first divided into multiple blocks in time and space, and multiple quality versions are then generated for each block [20]. According to the user's viewpoint, an appropriate quality version is selected for each block, with blocks closer to the FOV receiving higher quality, realizing dynamic adaptive streaming [21]; a minimal sketch of such tile-level quality selection is given below. In essence, these methods still perform block coding on the equirectangular projection (ERP) panorama, and the quality changes rather abruptly between blocks, which degrades the viewing experience. Another class of methods is based on projection transformation, such as tetrahedral and cube projection. They use the classical map-projection idea to divide the sphere into many spherical trapezoids and project them onto a polyhedron, offering small distortion and high compression efficiency [22].
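As a concrete illustration of tile-based quality selection, the following Python sketch assigns a quality level to each tile from its angular distance to the user's viewpoint. It is a minimal sketch of the general idea rather than any specific scheme from [18]-[21]; the tile layout, the number of quality levels, and the FOV radius are illustrative assumptions.

```python
import math

def select_tile_qualities(tile_centers, viewpoint, num_levels=3, fov_radius=math.pi / 4):
    """Assign each tile a quality level (0 = best) from its angular distance to the gaze.

    tile_centers: list of (yaw, pitch) tile centers in radians.
    viewpoint:    (yaw, pitch) of the user's current gaze in radians.
    num_levels and fov_radius are illustrative parameters, not values from the paper.
    """
    qualities = []
    for yaw, pitch in tile_centers:
        # Great-circle angular distance between the tile center and the gaze direction.
        cos_d = (math.sin(pitch) * math.sin(viewpoint[1])
                 + math.cos(pitch) * math.cos(viewpoint[1]) * math.cos(yaw - viewpoint[0]))
        d = math.acos(max(-1.0, min(1.0, cos_d)))
        # Tiles inside the FOV get the best version; quality degrades with distance.
        qualities.append(min(num_levels - 1, int(d / fov_radius)))
    return qualities
```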
While common polyhedral projections give every face equal area, a pyramid projection scheme proposed by Facebook projects the sphere onto a square pyramid and exploits the difference in the projected areas of the faces to generate a non-uniform-quality image with a sharp base face and blurred sides [23]. This method integrates the generation of non-uniform-quality images into the projection transformation itself, so the image quality varies more naturally. The studies above focus mainly on the source side, reducing redundant data through coding or projection methods. A VR transmission mechanism based on joint source-channel coding is proposed by Cheng et al. [24]: after chunking the VR video according to the user's FOV, different levels of error protection are applied to maximize the viewing quality within the FOV. In [25], a quality-of-experience (QoE) metric is defined to measure the user's viewing experience, and an efficient rate-control algorithm is presented to maximize the QoE [26].
The research in [27] investigates multi-path cooperative VR video transmission over an SDN-based microcell architecture in 5G, aiming to improve the user experience by combining microcells with edge data centers (EDCs). The main idea is that, based on content pre-cached in the EDC, millimeter-wave links between microcell sites deliver the cached content to users within their coverage, thus meeting the low-latency requirement. The research in [28] explores VR video multicast in 5G networks and discusses the challenges of multicasting VR video over 5G microcell networks, proposing a single-frequency-network (SFN)-based implementation and a millimeter-wave-band scheme, respectively [29]. In addition, [30] proposes a D2D-assisted VR video multicast mechanism in 5G, which aims to use spectrum resources efficiently by transmitting VR video in a multicast manner.
The research in [31] proposes a multi-path layered transmission mechanism for 5G-based heterogeneous networks, where the base-layer and enhancement-layer VR video content is transmitted mainly over Wi-Fi, while the 5G network is used mainly for retransmission and correction. The research in [32] uses the 5G MEC mode to study adaptive VR video transmission, minimizing the required transmission rate by jointly optimizing computation offloading and caching under latency and energy constraints. The research in [33] focuses on user-viewpoint-based caching in MEC and presents a viewpoint-aware caching strategy that increases the hit rate of cached content. For multicast from multiple micro base stations to multiple users, [34] proposes a deep-learning-assisted scheduling and content-quality-adaptive transmission mechanism for micro-base-station multicast of VR video. The research in [35] discusses the advantages of millimeter wave for VR video transmission and, taking highly interactive VR games as the research object, proposes a solution combining mobile edge computing and pre-caching to provide reliable VR video transmission. In [36], the problem of fusing millimeter-wave communication, mobile edge computing, and pre-caching is further modeled, a corresponding solution algorithm is given, and simulation verifies that the solution achieves lower latency than conventional schemes. Based on the MEC architecture, this paper proposes a resource optimization model for the 5G smart edge that guarantees the service demand of mobile VR by jointly optimizing communication, computation, and storage.
Given the above analysis, we summarize the main contributions of this paper as follows:
The latency model of mobile VR services based on 5G edge computing is formulated so as to fit the reinforcement learning framework.
We adopt reinforcement learning to tackle the offloading decision problem of mobile VR in edge computing. The proposed offloading algorithm is designed to improve the efficiency of edge resources.
We conduct a considerable number of simulation experiments to verify the performance of the proposed computation offloading scheme.
System Model
Figure 1 shows the mobile VR service architecture in which the 5G smart edge converges communication, computation, and storage. The architecture mainly includes the mobile VR terminal, the 5G access network, and MEC. This paper focuses on lightweight VR devices and mobile-terminal-based VR headsets. The 5G access network provides access services for users; this paper adopts the typical 5G network structure, where users transmit traffic through a remote radio head (RRH). The edge computing server is deployed near the baseband unit (BBU) pool and provides storage and computing resources.
The latency of mobile VR services based on 5G edge computing mainly consists of two parts: content transmission latency and computation offloading latency, as shown in Figure 2, which illustrates the composition of the offloading latency when computation tasks are offloaded to edge nodes. Assuming that the backhaul capacity $d_b$ and the fronthaul capacity $d_f$ are shared by $U$ users, and that user $u$ accesses RRH $r$ over a wireless link with bandwidth $B_{r,u}$ and signal-to-noise ratio $\beta_{r,u}$, the per-bit transmission delays over the backhaul, the fronthaul, and the wireless link are \begin{align*} {t_{b}} &= \frac {1}{d_{b}/U} \tag{1}\\ {t_{f}} &= \frac {1}{d_{f}/U} \tag{2}\\ {t_{w}} &= \frac {1}{{B_{r,u}}\log _{2}(1 + {\beta _{r,u}})} \tag{3}\end{align*}
Through multi-level caching based on interest and popularity prediction, the VR contents requested by users are assumed to be available at the C-RAN edge nodes. The content transmission delay from the C-RAN edge node to the user terminal therefore depends only on the fronthaul and wireless links, as given by $t_{de}$ in (9) below.
The computation task of mobile VR user $u$ in time slot $\tau$ is associated with the requested content $n$, whose total data volume aggregates the selected blocks over the $K$ tiles and $M$ quality versions: \begin{equation*} D_{u}^{n,\tau } = \sum \limits _{k}^{K} {\sum \limits _{m}^{M} {q_{n,\tau,m}^{k}} } \tag{4}\end{equation*}
In this paper, the following three computation modes are considered: 1) the mobile terminal completes the computation task locally; 2) the computation task is offloaded to an edge node; and 3) a hybrid mode combining local computation and edge offloading. Let $C_u^{n,\tau}$ denote the computation required for 3D scene reconstruction and rendering, $H_u$ the computing capacity of the terminal, $H_{u,k}$ the capacity allocated at edge node $k$, and $\delta$ the fraction computed locally in the hybrid mode. The computation delays of the three modes are then \begin{align*} {t_{l}} &= \frac {C_{u}^{n,\tau }}{H_{u}} \tag{5}\\ {t_{k}} &= \frac {C_{u}^{n,\tau }}{H_{u,k}} \tag{6}\\ {t_{h}} &= \frac {\delta C_{u}^{n,\tau }}{H_{u}} + \frac {(1 - \delta)C_{u}^{n,\tau }}{H_{u,k}} \tag{7}\end{align*}
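The delay model above can be evaluated directly. The following Python sketch computes the per-bit transmission delays of (1)-(3) and the computation delays of (5)-(7); the variable names mirror the symbols in the text, and the interpretation of $d_b$ and $d_f$ as capacities shared by $U$ users follows our reading of the model.

```python
import math

def per_bit_delays(d_b, d_f, U, B_ru, beta_ru):
    """Per-bit transmission delays of eqs. (1)-(3)."""
    t_b = 1.0 / (d_b / U)                          # backhaul capacity shared by U users, eq. (1)
    t_f = 1.0 / (d_f / U)                          # fronthaul, eq. (2)
    t_w = 1.0 / (B_ru * math.log2(1.0 + beta_ru))  # wireless link at the Shannon rate, eq. (3)
    return t_b, t_f, t_w

def computation_delays(C, H_u, H_uk, delta):
    """Computation delays of eqs. (5)-(7) for the three offloading modes."""
    t_l = C / H_u                                    # fully local, eq. (5)
    t_k = C / H_uk                                   # fully offloaded to edge node k, eq. (6)
    t_h = delta * C / H_u + (1 - delta) * C / H_uk   # hybrid split by delta, eq. (7)
    return t_l, t_k, t_h
```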
Deep Reinforcement Learning-based Task Offloading Scheme
A. Problem formulation
The massive amount of information in mobile VR leads to high processing complexity, which places strong demands on the intensive computing capability of the system. VR information processing must complete intensive computing tasks in real time so that users obtain a natural and smooth experience. Limited by their computing power, mobile terminals have difficulty supporting the intensive computing tasks of mobile VR; there is still an obvious gap between terminal capability and the computational needs of mobile VR. If the computation is migrated to a remote cloud server, its latency performance is difficult to guarantee. It is therefore necessary to make full use of the computing ability of 5G mobile terminals and edge nodes together to fulfill the intensive computing requirements of VR. An analysis of the intensive computing tasks of mobile VR shows that 3D scene reconstruction and rendering require the largest amount of computation in the whole process, which the current computing power of mobile terminals cannot satisfy, while tasks such as video/audio decoding are within the terminals' computational reach. This paper therefore focuses on two types of intensive computation tasks, namely 3D scene reconstruction and high-realism rendering, when studying the computation offloading strategy.
The time-varying nature of the wireless spectrum causes fluctuations in the communication links of mobile VR users, so the computational offloading must account for the time-varying wireless channel, and the offloading allocation algorithm must be designed around the channel quality of each mobile terminal. The computing capacity of edge nodes is also limited to a certain extent; in particular, when an edge node cannot absorb the intensive offloading load of all the users under it, part of the computation must be offloaded to core-network edge nodes to meet the offloading demand. Based on the above analysis, this paper models the offloading optimization of VR computing tasks as the following problem:\begin{align*} &\min \limits _{{\mathbf{A}},{\mathbf{B}},\Delta } \sum \limits _{u}^{U} {\sum \limits _{\tau }^{\Gamma }{\left({{t_{de}} + \alpha {t_{l}} + \lambda {t_{k}} + \left({1 - \alpha - \lambda }\right){t_{h}}}\right)}} \\ & s.t.~{t_{\max }}C_{u}^{n,\tau } \le {H_{u}} \\ &\hphantom { s.t.~}{t_{\max }}\left({1 - \delta }\right)\sum \limits _{U} {C_{u}^{n,\tau }} \le \sum \limits _{U} {{H_{u,k}}} \\ &\hphantom { s.t.~}\alpha \in \{ 0,1\} \\ &\hphantom { s.t.~}\lambda \in \{ 0,1\} \tag{8}\end{align*} where $\alpha$ and $\lambda$ are binary indicators selecting local computation and edge offloading, respectively, and the hybrid mode applies when both are zero. The content transmission delay is
\begin{equation*} t_{de} = D_{u}^{n,\tau }({t_{f}} + {t_{w}}) \tag{9}\end{equation*}
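Putting (8) and (9) together, the per-user, per-slot objective term can be evaluated as in the following sketch; the treatment of $\alpha$ and $\lambda$ as mutually exclusive 0/1 mode indicators is our reading of the constraints in (8).

```python
def slot_delay(D, t_f, t_w, t_l, t_k, t_h, alpha, lam):
    """One summand of the objective in eq. (8).

    alpha = 1 selects local computation, lam = 1 selects full edge offloading,
    and alpha = lam = 0 selects the hybrid mode (our reading of eq. (8)).
    """
    assert alpha in (0, 1) and lam in (0, 1) and alpha + lam <= 1
    t_de = D * (t_f + t_w)  # content transmission delay, eq. (9)
    return t_de + alpha * t_l + lam * t_k + (1 - alpha - lam) * t_h
```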
B. Proposed DQN algorithm
The Markov decision process (MDP) is the classic modeling approach in reinforcement learning. It is a mathematical model of sequential decision making used to describe the stochastic interaction between the policies of an agent and the reward values returned by the environment. When an MDP is used to handle the offloading decision problem of edge computing, the main objects can be represented by a triple of state, action, and reward: after taking action $a_i$ in state $s_i$, the agent receives the reward $R_i$ and accumulates the return $G$,
\begin{align*} {R_{i}}& = \begin{cases} \displaystyle { - {t_{i}}}\\ \displaystyle {\ln \left({1 - \frac {1}{{{e^{\sqrt {t_{i}} }}}}}\right)} \end{cases} \tag{10}\\ G &= \sum \limits _{t = 1}^{T} {R(s_{t}', a_{t}',{s_{t + 1}})} \tag{11}\end{align*}
The final objective is:\begin{equation*} \mathop {Max}\limits _{A} E\left[{\sum \limits _{t = 1}^{T} {R(s_{t}',a_{t}',{s_{t + 1}})}}\right] \tag{12}\end{equation*}
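A minimal sketch of the reward and return computation of (10)-(11) follows. The text does not state the conditions of the two branches in (10); we assume the linear penalty $-t_i$ applies when the task meets its deadline and the logarithmic penalty otherwise, which is only one plausible reading.

```python
import math

def step_reward(t_i, met_deadline):
    """Reward of eq. (10). The branch condition is an assumption: the text
    leaves the case split of eq. (10) implicit."""
    if met_deadline:
        return -t_i
    return math.log(1.0 - 1.0 / math.exp(math.sqrt(t_i)))

def episode_return(rewards):
    """Cumulative return G of eq. (11) over one episode (no discount factor appears in (11))."""
    return sum(rewards)
```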
The MDP problem can thus be modeled as a quadruple by adding the state-transition probability of the environment to the triple of state, action, and reward above.
As shown in Figure 3, the deployment of the intelligent decision algorithm comprises two components: the environment and the intelligent agent. The agent responsible for the edge computing offloading decisions is deployed on the core router, and the task information generated on the devices is handed over to it for decision making. The agent runs a DDPG-based offloading algorithm for edge computing. DDPG employs a deterministic policy $\pi_\theta$ parameterized by $\theta$, whose parameters are updated along the policy gradient \begin{equation*} {\nabla _{\theta} }J({\pi _{\theta} }) = {E_{s,{\rho ^{\pi} }}}[{\nabla _{\theta} }{\pi _{\theta} }(s){\nabla _{\pi} }{Q_{\pi} }(s,a){|_{a = {\pi _{\theta} }(s)}}] \tag{13}\end{equation*}
As depicted in Figure 3, the DRL-based algorithm for distributed dynamic resource optimization consists of four networks: the Actor and Critic networks and their target networks, where each target network shares the structure of its online counterpart. First, the agent obtains the current network state by interacting with the environment, and records the current state vector, action vector, reward, and next state vector as a transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer. The Critic network is trained on minibatches of $m$ sampled transitions by minimizing the loss \begin{equation*} loss = \frac {1}{m}\sum \limits _{j = 1}^{m} {{{({y_{j}} - {\hat y_{j}})}^{2}}} \tag{14}\end{equation*}
\begin{equation*} {\hat y_{j}} = Q(\phi ({S_{j}}),{a_{j}},w) \tag{15}\end{equation*}
Once the loss function has been obtained, the agent updates all the parameters of the Critic network through gradient backpropagation, and the target networks are then softly updated as \begin{align*} {w^{\prime} } &\leftarrow \tau w + (1 - \tau){w^{\prime} } \tag{16}\\ {\theta ^{\prime} } &\leftarrow \tau \theta + (1 - \tau){\theta ^{\prime} } \tag{17}\end{align*}
In contrast to DQN, the proposed algorithm uses a step-wise soft update, where only a fraction of the parameters is shifted toward the online networks at each step. At the same time, to achieve better exploration of the whole solution space, noise $\eta$ is added to the policy output during learning:\begin{equation*} A = {\pi _{\theta} }(S) + \eta \tag{18}\end{equation*}
\begin{equation*} J(w) = \frac {1}{m}\sum \limits _{j = 1}^{m} {{{({y_{j}} - {\hat y_{j}})}^{2}}} \tag{19}\end{equation*}
\begin{equation*} {\nabla _{\theta} }J({\pi _{\theta} }) = {E_{s,{\rho ^{\pi} }}}\left ({{\Omega _{1} \cdot {\Omega _{2}}} }\right) \tag{20}\end{equation*}
\begin{align*} {\Omega _{1}} &= {\nabla _{\pi} }{Q_{\pi} }(s,a){|_{s = {s_{i}},a = {\pi _{\theta} }(s)}} \tag{21}\\ {\Omega _{2}} &= {\nabla _{\theta} }{\pi _{\theta} }(s){|_{s = {s_{i}}}} \tag{22}\end{align*}
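The pieces above assemble into a standard DDPG update loop. The following TensorFlow sketch implements the critic loss of (14)/(19) with its bootstrapped target, the policy-gradient actor update of (20)-(22), the soft target updates of (16)-(17), and the noisy action of (18). The state/action dimensions, learning rate, $\gamma$, $\tau$, and noise scale are illustrative assumptions; only the 200-neuron hidden layer follows the setup in the simulation section.

```python
import numpy as np
import tensorflow as tf

# Illustrative hyper-parameters; the paper's actual values are in Tables 1-2.
STATE_DIM, ACTION_DIM = 8, 2
GAMMA, TAU, NOISE_STD, LR = 0.99, 0.005, 0.1, 1e-3

def build_actor():
    s = tf.keras.Input(shape=(STATE_DIM,))
    h = tf.keras.layers.Dense(200, activation="relu")(s)  # 200-neuron hidden layer (Setup)
    a = tf.keras.layers.Dense(ACTION_DIM, activation="tanh")(h)
    return tf.keras.Model(s, a)

def build_critic():
    s = tf.keras.Input(shape=(STATE_DIM,))
    a = tf.keras.Input(shape=(ACTION_DIM,))
    h = tf.keras.layers.Dense(200, activation="relu")(
        tf.keras.layers.Concatenate()([s, a]))
    q = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model([s, a], q)

actor, critic = build_actor(), build_critic()
target_actor, target_critic = build_actor(), build_critic()
target_actor.set_weights(actor.get_weights())
target_critic.set_weights(critic.get_weights())
actor_opt, critic_opt = tf.keras.optimizers.Adam(LR), tf.keras.optimizers.Adam(LR)

def act(state):
    """Exploration action of eq. (18): deterministic policy output plus noise eta."""
    a = actor(state[None, :]).numpy()[0]
    return np.clip(a + np.random.normal(0.0, NOISE_STD, ACTION_DIM), -1.0, 1.0)

def train_step(s, a, r, s2):
    """One minibatch update on transitions (s, a, r, s2); r must have shape (m, 1)."""
    # Critic: regress Q(s, a) toward the bootstrapped target y_j, minimizing eq. (14)/(19).
    y = r + GAMMA * target_critic([s2, target_actor(s2)])
    with tf.GradientTape() as tape:
        critic_loss = tf.reduce_mean(tf.square(y - critic([s, a])))
    critic_opt.apply_gradients(zip(
        tape.gradient(critic_loss, critic.trainable_variables),
        critic.trainable_variables))
    # Actor: ascend the deterministic policy gradient of eq. (20) (descend -Q).
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([s, actor(s)]))
    actor_opt.apply_gradients(zip(
        tape.gradient(actor_loss, actor.trainable_variables),
        actor.trainable_variables))
    # Soft target updates of eqs. (16)-(17).
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for w, w_t in zip(net.weights, target.weights):
            w_t.assign(TAU * w + (1.0 - TAU) * w_t)
```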
The DDPG-based computation offloading algorithm operates in two phases. At first, the decisions executed by the intelligent agent resemble those of a stochastic algorithm; as the algorithm learns from interaction, the offloading policy moves closer to the optimum. The parameters of the trained DDPG neural network are updated at the end of each learning iteration.
Simulation and Analysis
A. Setup
In this paper, the edge environment is simulated using the NetworkX library in Python, while the proposed algorithm is implemented with TensorFlow. The experiment divides the system into three layers: the terminal layer, the edge layer, and the cloud layer. The network model assumes 3-7 wireless edge servers and 2-5 wired edge servers in the edge layer, with each wireless edge server serving a certain coverage range. Initially, each AP has N = 3 edge devices within its service range, and the offloading requests of each device in each time slot follow a Poisson distribution. The computing capacity of each edge server is set between 4 GHz and 6 GHz, the computing capacity of the cloud is 10 GHz, and the computing capacity of each edge device is set to 0.5 GHz. The bandwidth between the edge server and the terminal is set to 100 MB/s, the bandwidth between the edge server and the wireless terminal is set to 50 MB/s, and the bandwidth among edge servers is set to 300 MB/s. The transmission delay among edge servers is set to 10 ms, and the transmission delay between an edge server and the cloud server is set to 30 ms. Both the Actor network and the Critic network adopt a three-layer structure, with the second fully connected layer consisting of 200 neurons. The neural networks are implemented in TensorFlow, and the hyper-parameters of the networks are given in Table 1; the specific parameters of the compared DQN are presented in Table 2. A sketch of the simulated topology is given below.
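As a sketch of how the three-layer environment can be built with NetworkX, the snippet below instantiates servers and devices with the capacities and link parameters listed above. It is a minimal reconstruction under stated assumptions: the paper does not describe the actual topology code, and the chain layout of inter-edge links, the random seed, and the arrival rate are illustrative.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

def build_topology(n_wireless=5, n_wired=3, devices_per_ap=3, arrival_rate=1.0):
    """Terminal/edge/cloud topology with the parameters of the setup above.

    Server counts are drawn from the stated ranges (3-7 wireless, 2-5 wired);
    the chain layout of inter-edge links is an illustrative assumption.
    """
    g = nx.Graph()
    g.add_node("cloud", cpu_ghz=10.0)                      # 10 GHz cloud
    servers = [f"edge{i}" for i in range(n_wireless + n_wired)]
    for name in servers:
        g.add_node(name, cpu_ghz=rng.uniform(4.0, 6.0))    # 4-6 GHz edge servers
        g.add_edge(name, "cloud", delay_ms=30.0)           # edge-cloud transmission delay
    for u, v in zip(servers, servers[1:]):
        g.add_edge(u, v, delay_ms=10.0, bw_mbs=300.0)      # inter-edge links
    for i in range(n_wireless):                            # each wireless AP serves N = 3 devices
        for j in range(devices_per_ap):
            dev = f"dev{i}_{j}"
            g.add_node(dev, cpu_ghz=0.5, rate=arrival_rate)  # 0.5 GHz terminals
            g.add_edge(dev, f"edge{i}", bw_mbs=50.0)         # wireless access link
    return g

def requests_this_slot(g):
    """Poisson-distributed offloading requests per device in one time slot."""
    return {n: rng.poisson(d["rate"]) for n, d in g.nodes(data=True) if n.startswith("dev")}
```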
B. Results
We first examine the convergence of the proposed algorithm during training, and then compare its performance with other algorithms in the same scenario. As illustrated in Figure 4, the average task latency drops rapidly in the first 30,000 training rounds; after 30,000 rounds, it stabilizes at 13.8 ms or less. As shown in Figure 5, the task completion rate increases significantly in the first 20,000 rounds of training and converges to 85%-90% afterwards. These results show that the training effect slows down after 20,000 iterations and the algorithm has converged.
In the simulations we examine the performance of the proposed method under different conditions. In Figure 6, the number of users and mobile edge servers is fixed, and the task latency decreases as the computing capacity of the mobile edge increases. The DDPG algorithm is significantly better than the local-computation and server-computation modes, and it also outperforms DQN because the action space of DQN is restricted to discrete values. As the server computing capacity improves, the average task latency of every offloading method decreases except for local computation at the terminal, which is limited by the bandwidth and the computing capacity of the terminal's processor. Figure 7 then shows the relationship between the number of servers and the expected task latency: with the capacity of the edge servers and the number of users (i.e., the computation tasks) fixed, the expected latency of an individual task gradually decreases as the number of edge servers increases, limited by the bandwidth between servers. The figure also shows that once the server count reaches 8, further increases bring no obvious improvement. The effect of the number of mobile terminals on the expected task delay is shown in Figure 8. We observe that the DDPG-based algorithm outperforms the other algorithms, and that the expected task delay rises rapidly as the number of mobile terminals grows. In mobile edge computing, a growing number of mobile terminals produces more task offloading requests in each time slot, amplifying the computational burden on the edge server.
Figure 9 shows the task success rate of each algorithm. For the same task queue, the DDPG-based offloading decision mechanism clearly outperforms the other mechanisms; it is followed by DQN and by pure edge-server offloading in the probability of completing tasks within the specified time frame, while the local computing scheme without offloading can hardly meet the applications' timing requirements. The reason is that the action space of DDPG is continuous, so the granularity of its offloading actions is finer and more precise than that of the DQN-based mechanism. Executing all tasks on the edge server wastes the computing resources of the end devices and the cloud server, so although pure edge-server offloading can largely meet the applications' requirements, its service quality is still inferior to the two DRL-based methods. End devices with weak processors can hardly handle the computationally intensive tasks in the queue without offloading services.
Conclusion
This paper utilizes reinforcement learning to address the offloading decision problem of mobile VR in edge computing. First, the latency model of mobile VR services in a 5G edge computing network is formulated to fit the reinforcement learning framework. Then, a DDPG-based offloading algorithm is designed to improve the efficiency of edge resources. Finally, extensive simulations verify that the proposed DDPG-based resource offloading algorithm improves the performance of the edge computing offloading service.