Introduction
With population growth and ongoing urbanization, transportation systems are facing unprecedented pressure [1]. Many V2X (vehicle-to-everything) services have emerged to adapt to complex traffic situations and offer enjoyable driving experiences. To date, the Third Generation Partnership Project (3GPP) has defined 57 V2X use cases [2], [3], covering V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2I (vehicle-to-infrastructure), and V2N (vehicle-to-network) services. Unlike conventional services for stationary or low-speed equipment, V2X services have distinctive transmission features and key performance indicators (KPIs). To reflect how various V2X services influence the performance of the internet of vehicles (IoV), representative use cases are summarized in detail in Table I. The use cases exhibit highly diverse and even conflicting service characteristics, which places critical pressure on the networking infrastructure [4].
Network slicing has emerged as a promising paradigm to meet diverse service demands. It enables multiple independent logical networks (i.e., slices) to run on a common physical network infrastructure [5], [9]. However, as V2X applications advance, the predefined slice for ultra-reliable low-latency communications (URLLC) can hardly satisfy increasingly stringent and heterogeneous service characteristics through one-shot resource allocation [6], [7], [8]. Building on advancements in existing studies, three types of slices are proposed to accommodate existing and future V2X use cases without excessively segmenting network resources. Specifically, the slices for basic road safety services, enhanced road safety services, and non-safety related services are used to deliver basic driving information, achieve high-level automated driving, and improve driving comfort and efficiency, respectively. Representative use cases and their corresponding slices are depicted in Fig. 1.
Illustration of typical V2X applications in vehicular networks. The basic road safety services slice provides position, heading, speed, etc.; the enhanced road safety services slice provides raw sensor data, vehicle intention data, coordination, confirmation of future maneuvers, and so on; the non-safety related services slice provides traffic flow optimization and software updates. An exclusive “slice sandwich” for each V2X services slice is made up of jagged multi-dimensional resources.
Unfortunately, constrained by computing capability or transmission delay, it is difficult to process multiple tasks with a single computing paradigm [11], [12], [13]. Multi-tier computing, a new system-level computing architecture, offers a solution to this problem. It involves three tiers, with users at tier one, the edge cloud at tier two, and the remote cloud at tier three [14]. By reasonably orchestrating available resources along this continuum, the strict KPIs of each slice are expected to be met. However, exploiting this hierarchical computing architecture for service provisioning entails the joint allocation of multi-dimensional resources [15], [16]. Moreover, the high mobility of vehicles adds complexity to resource management [17], [18], [19]. How to effectively allocate multi-tier resources to multiple slices according to time-varying network conditions is a thorny problem.
To cope with this problem, existing studies usually adopt hierarchical resource allocation methods [20], [21], [22], [23], [24]. Although these studies achieved certain gains in resource utilization, they are not applicable to the IoV because they ignored the exclusive characteristics of V2X services and the importance of multi-tier dynamic resources. Thus, considering the spatiotemporal correlation between service traffic and physical resources [25], a Two-Timescale Intelligent Resource Management Scheme (2Ts-IRMS) is proposed. Specifically, the scheme is divided into two stages, namely inter-slice resource configuration and intra-slice resource scheduling. At the beginning of each large timescale (i.e., period), the infrastructure provider (InP) configures resources for service providers (SPs) according to service traffic. Owing to the long-term trend of the service traffic, the configuration policy remains unchanged within each large timescale. The SPs create customized slices with the obtained multi-dimensional resources. Because the inherent characteristics of the slices make their demands for multi-dimensional resources jagged, the shape of a “slice sandwich” naturally forms. Then, to adapt to the real-time status of the physical layer, each SP dynamically schedules its available resources at each small timescale (i.e., slot) of a large timescale to provide high-quality services for its subscribers. In this way, system revenue can be maximized while guaranteeing the delay and reliability requirements of mobile users.
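The two-stage scheme described above can be sketched as a simple control loop: one inter-slice configuration decision per period, followed by per-slot intra-slice scheduling within that period. The class interfaces below (`observe_service_traffic`, `configure`, `schedule`, `update`) are hypothetical illustrations, not the paper's implementation.

```python
# A minimal sketch of the two-timescale control loop, under assumed interfaces:
# the InP configures inter-slice resources once per period, and each SP
# schedules its intra-slice resources at every slot of that period.

SLICES = ["basic_safety", "enhanced_safety", "non_safety"]

def run_two_timescale(inp, sps, num_periods, slots_per_period):
    """inp.configure() returns a per-slice resource bundle that stays fixed
    for the whole period; sp.schedule() reacts to the physical layer each slot."""
    for k in range(num_periods):
        traffic = inp.observe_service_traffic()      # long-term traffic trend
        config = inp.configure(traffic)              # inter-slice (large timescale)
        revenue = 0.0
        for t in range(slots_per_period):
            for name in SLICES:
                state = sps[name].observe_physical_layer()
                revenue += sps[name].schedule(config[name], state)  # intra-slice
        inp.update(config, revenue)                  # feedback for the next period
```

The separation mirrors the paper's design: the slow loop learns from aggregate revenue, while the fast loop never changes the period's configuration.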
Note that the resource configuration made by the InP influences the scheduling process in the SPs; meanwhile, the performance of the SPs also affects the decision-making of the InP. This interaction between the InP and SPs makes it very challenging to solve the proposed problem with conventional mathematical methods. Deep Reinforcement Learning (DRL), as an intelligent approach, provides a promising solution to this challenge. In the inter-slice resource configuration stage, the status of service requests depends only on the users and is therefore unaffected by the selected resource configuration policy. Accordingly, a Joint Allocation algorithm of Multi-dimensional Resources (JAMR) based on the improved NeuralUCB (Neural bandits with Upper Confidence Bounds) approach is proposed. The algorithm effectively avoids the curse of dimensionality and learns the unknown system revenue. As for the intra-slice resource scheduling problem, state transitions need to be considered, because the scheduling policies of resource allocation and task offloading generate different effects on the physical layer states. To adapt to the time-varying physical layer, a Joint Offloading and Resource Allocation algorithm based on the Double Deep Q Network (JORA-DDQN) approach is proposed to obtain optimal scheduling policies. The major contributions of this paper are summarized as follows.
Three types of refined network slices for V2X services are proposed to simultaneously accommodate multiple V2X services over a common infrastructure.
In view of the differentiated KPIs among slices, a Two-Timescale Intelligent Resource Management Scheme (2Ts-IRMS) is proposed to jaggedly divide multi-tier resources among multiple slices in the time-varying IoV.
To reflect reality, real-world road conditions and traffic models are set up in Simulation of Urban Mobility (SUMO). Numerical experiments using PyTorch verify that the proposed scheme utilizes network resources more economically and efficiently.
The rest of this paper is organized as follows. Section II presents an overview of related works. Section III describes the considered system framework. Section IV formulates the two-timescale resource allocation problem. Section V proposes the solutions based on JAMR and JORA-DDQN. Section VI evaluates the network performance and compares it with several benchmarks. Finally, Section VII concludes the paper.
Related Work
A. Network Slicing for V2X Services
To date, the 3GPP has defined standardized slices to support enhanced mobile broadband (eMBB), URLLC, and massive machine-type communication (mMTC) [29], [30]. With the evolution of V2X services, increasingly rigorous and heterogeneous KPIs need to be satisfied. Mapping V2X services onto existing reference slices or a single V2X slice is no longer appropriate [32], [33]. Network slicing for concrete application scenarios is still emerging, especially for vehicular scenarios [34]. In [35], the authors customized slices for safety and non-safety V2X services, respectively. According to the sensitivity of V2X services to delay, Wu et al. proposed delay-sensitive and delay-tolerant slices [36]. As described in Table I, there are great differences between basic road safety services and enhanced road safety services. A single slice for safety or delay-sensitive V2X services is insufficient to cope with these differences simultaneously.
To deal with this problem, Campolo et al. designed four slices for autonomous driving, tele-operated driving, remote diagnostics, and vehicular infotainment [32]. Similarly, the authors of [34] proposed a general network slicing architecture for four typical use cases, namely localization and navigation, transportation safety, autonomous driving, and infotainment services. A common problem of the aforementioned studies is that the validity of the proposed schemes was not verified. The complexity of slice management increases with the number of slices. Dividing V2X services into three slices is a more reasonable solution, similar to the way traditional mobile services are sliced. In [31], Ge et al. proposed three types of service slices, used to transmit state-report, event-driven, and entertainment-application messages, respectively. In [39], Cui et al. divided the common network infrastructure into three slices to provide short message, call, and internet services for vehicles. Different from existing studies, the slices proposed in this paper fully consider the exclusive characteristics (i.e., transmission features and KPIs) of V2X services. They can cover all V2X use cases defined in [2], [3] without excessively segmenting resources.
B. Resource Allocation for Network Slicing
In addition to slicing services at an appropriate granularity, it is important to effectively allocate resources among slices. In [42], the authors developed a fuzzy-logic-based resource allocation algorithm to simultaneously satisfy the diversified requirements of V2X services. Although the scheme achieved higher resource utilization, its computational complexity is high because the InP directly allocates its resources to users. Most existing studies therefore adopt hierarchical resource allocation methods to reduce the burden on the InP. In [6], Han et al. proposed a two-timescale resource allocation scheme comprising inter-slice resource pre-allocation over large time periods and intra-slice resource scheduling over small time slots. The scheme achieves a near-optimal tradeoff among the performance of slices. In [20], Mei et al. designed a slicing strategy with two-layer control granularity, where the upper-level and lower-level controllers guarantee the quality of services and improve the spectrum efficiency of each slice, respectively. However, these efforts concentrate only on spectrum resource allocation and ignore the significance of computing resources, which are necessary to satisfy the KPIs of V2X services.
To address the multi-dimensional resource allocation issue, Mohammed et al. proposed a multi-dimensional resource slicing scheme [49], in which both the InP and SPs adopt the dominant resource fairness (DRF) approach to allocate multi-dimensional resources. In [23], the authors introduced a generalized Kelly mechanism (GKM) to address the multi-dimensional resource allocation issue between the InP and SPs, while each SP utilizes Karush–Kuhn–Tucker (KKT) conditions to derive the optimal scheduling strategy for communication resources. Although these studies make progress in improving the aggregate revenue of SPs, they cannot be directly applied to the IoV with multiple V2X services. On the one hand, when the InP treats all slices equally, it is hard to guarantee road safety in real-world situations. On the other hand, the differentiated characteristics of multi-tier computing resources have not been extensively explored, which further reduces system revenue. In our work, we adopt intelligent approaches to economically allocate multi-tier resources to multiple V2X slices while guaranteeing the delay and reliability requirements of mobile users.
C. DRL-Enabled Network Slicing
In the dynamic IoV, conventional mathematical models face high computational complexity and lack adaptability and robustness. Advanced DRL algorithms have been widely applied in network slicing [38]. From the perspective of the effect of actions on states, DRL can be divided into DRL based on multi-armed bandits (MAB) and DRL based on the Markov Decision Process (MDP) [26], [27]. Because the policies of resource allocation and task offloading generate different effects on the physical layer states, the intra-slice resource scheduling problem is usually formulated as an MDP. In [22], Chen et al. leveraged the DDQN algorithm to learn the optimal policies of packet scheduling and computation offloading. In [20], the authors further verified the effectiveness of DDQN in jointly optimizing resource allocation and computation offloading. In this paper, each SP is equipped with an exclusive agent that implements resource scheduling to guarantee isolation among slices.
As for the inter-slice resource configuration problem, it is impossible to find the optimal configuration policy before the end of a period, because the future status of the IoV is unknowable. In addition, it is impractical to traverse all configuration policies in each period. Exploiting the fact that the resource configuration policy does not change the status of service requests, many studies adopt MAB-based DRL algorithms to learn the unknown reward function. In [44], Zanzi et al. developed a radio slicing orchestration scheme based on MAB, with which SPs can make adaptive slicing decisions without prior knowledge of channel quality statistics. In [45], Zhao et al. formulated resource configuration as a contextual MAB problem and adopted the upper-confidence-bound (UCB) algorithm to solve it. However, these studies assumed a linear relationship between the expected reward and the context vector. Furthermore, the effectiveness of MAB-based DRL algorithms is greatly reduced when the number of candidate actions is large, and the curse of dimensionality is inevitable when multi-dimensional resources are considered jointly. Therefore, in this paper, we design a pre-allocation mechanism based on service priorities and adopt the NeuralUCB algorithm to obtain an optimal configuration policy for multi-dimensional resources.
System Model
This section describes the system model in detail. Specifically, we first present the network model (Section III-A) and the multi-tier resources model (Section III-B) of the IoV. Then, we elaborate on the processes of transmission (Section III-C) and offloading (Section III-D) for vehicular tasks. Finally, the key performance indicators of V2X services are presented (Section III-E). For convenience, Table II summarizes the major notations of this paper.
A. Network Model
The physical infrastructure of the IoV mainly includes a macro base station (MBS) connected to remote cloud servers, roadside units (RSUs) equipped with MEC servers, and vehicular user equipment (VUEs) with diverse numbers of vehicular computing units. Note that an RSU is essentially a static logical entity. It supports V2X applications by using the functionality provided by a 3GPP network or user equipment (UE) [46]. Thus, we assume all UEs, which consist of VUEs and RSUs, are within the coverage of the MBS, and VUEs can only access the internet via RSUs. Let
As mentioned in Table I, there are great differences among V2X services. Therefore, we propose three kinds of network slices to reflect the differences without excessively segmenting resources. Specifically, the three kinds of network slices embrace the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety related services. The specific characteristics and requirements of each slice are described as follows.
\begin{align*}
{r_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}=} \left\lbrace \begin{array}{ll}B{{\log }_{2}}\left(1+\frac{p_{{{N}_{m}}}^{{{N}_{m^{\prime }}}}\cdot h_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\sigma }^{2}}}\right), & \text{for long packet transmission}; \qquad (1\mathrm {a})\\
B{{\log }_{2}}\left(1+\frac{p_{{{N}_{m}}}^{{{N}_{m^{\prime }}}}\cdot h_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\sigma }^{2}}}\right)-\sqrt{\frac{V_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\tau }_{{{N}_{m}}}}}}\cdot \frac{{{G}^{-1}}(\varpi) }{\ln 2}, & \text{for short packet transmission}, \qquad (1\mathrm {b}) \end{array}\right.
\end{align*}
The slice for basic road safety services is mainly aimed at services that require high timeliness and reliability but low data rates, such as collision warnings and emergency stops. V2V is the prevalent radio access technology for satisfying their latency and reliability requirements. Note that the packet size of basic safety services is usually small; thus, instead of offloading tasks to MEC servers, vehicular computing resources are sufficient to process them.
The slice for enhanced road safety services aims to enable high-level autonomous driving. Compared to basic road safety services, this slice requires higher reliability, data rates, and beacon frequencies, as well as lower latency. Similarly, to effectively transmit messages among vehicles, low-latency V2V communication is the main communication mode. Due to the limited processing capability of VUEs and the long transmission latency to remote cloud servers, a proportion of the data processing should be performed on MEC servers.
The slice for non-safety related services has low sensitivity to delay and reliability but usually has high data rate requirements. As a result, it is expected to use multiple access technologies to seek higher throughput and to process tasks on MEC servers or cloud servers.
In this paper, an SP corresponds to a slice and provides a class of V2X services. Therefore, we will not distinguish the concepts of slice and SP in the following text. To facilitate analysis, let
B. Multi-Tier Resources Model
As described above, each SP needs both computing and communication resources to serve its users. The inherent attributes (i.e., KPIs and transmission features) of V2X services make their demands for multi-dimensional resources jagged. Thus, jagged resource slicing on the multi-tier computing architecture is adopted in this paper. Generally, the architecture uses three tiers, with users at tier one, the edge cloud at tier two, and the remote cloud at tier three. Before determining the most suitable communication method and computing location for any service, the hierarchical and distributed characteristics of multi-dimensional resources should be considered. At the terminal-device tier, vehicular computing resources usually have relatively small computing capabilities; the purpose of local execution is to reduce the communication delay and errors caused by transmission and protocols. Notably, a VUE can concurrently subscribe to multiple slices in our system model, which is consistent with actual cases. Therefore, let
As for the edge tier, MEC servers have powerful computing capabilities. However, the computing resources of each MEC server are limited, which means that only some VUEs can offload their computing tasks to MEC servers over V2I links. To guarantee isolation among slices, the shared edge computing resources
Illustration of the jagged allocation of virtualized multi-tier resources to refined network slices of V2X services, compared to flat slicing resources to three generic usage scenarios of 5G.
C. Signal Transmission Model
For conventional services, the data rate of large packets can be directly calculated through the Shannon formula. However, unlike conventional services, most V2X services use short packets, ranging from 32 to 200 bytes [20]. Owing to the negative effects of channel dispersion and finite coding length, the data rate of a short packet cannot be accurately obtained by the Shannon formula. In [47], a method based on finite-blocklength theory was proposed to approximately calculate the data rate of short packets. Therefore, the available data rate between VTx
\begin{equation*}
V_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}=1-{{\left(1+\frac{p_{{{N}_{m}}}^{{{N}_{m^{\prime }}}}\cdot h_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\sigma }^{2}}} \right)}^{-2}}. \tag{2}
\end{equation*}
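The short-packet rate of Eq. (1b) with the dispersion of Eq. (2) can be computed numerically. The sketch below follows the document's formula term by term (in particular, the penalty term carries no bandwidth factor, exactly as written in Eq. (1b)), interpreting G^{-1} as the inverse Gaussian Q-function and tau as the blocklength; parameter names are illustrative.

```python
import math
from statistics import NormalDist

def channel_dispersion(snr):
    """Eq. (2): V = 1 - (1 + SNR)^-2, the penalty grows with dispersion."""
    return 1.0 - (1.0 + snr) ** -2

def short_packet_rate(bandwidth_hz, snr, blocklength, error_prob):
    """Approximate achievable rate under finite-blocklength theory,
    following the form of Eq. (1b): Shannon term minus a dispersion
    penalty that vanishes as the blocklength grows."""
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)   # inverse Gaussian Q-function
    shannon = bandwidth_hz * math.log2(1.0 + snr)
    penalty = math.sqrt(channel_dispersion(snr) / blocklength) * q_inv / math.log(2)
    return shannon - penalty
```

For long packets (large blocklength), the penalty tends to zero and the rate reduces to the Shannon term of Eq. (1a).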
In addition, during the data transmission phase, each V2V link maintains an individual queue to buffer arriving packets. Packets are delivered according to a first-come-first-served policy [22]. As for link
\begin{equation*}
{{W}_{l,t+1}}=\min \lbrace {{W}_{l,t}}-\Delta t\cdot r_{l,t}^{v}/Z{}_{l}+{{A}_{l,t}},W_{l}^{\max }\rbrace, \tag{3}
\end{equation*}
\begin{equation*}
r_{l,t}^{v}=\sum \limits _{j\in \mathcal {J}}{{{\rho }_{l,j,t}}\cdot r_{{{N}_{l}},j,t}^{{{N}_{l^{\prime }}}}}. \tag{4}
\end{equation*}
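The per-link queue evolution of Eq. (3) can be sketched directly: the served packets (slot length times rate, divided by packet size) are drained, new arrivals are added, and the backlog is capped at the buffer size. The zero clamp is an added practical guard not written in Eq. (3); variable names are illustrative.

```python
def queue_update(backlog, arrivals, rate_bps, packet_bits, slot_s, max_backlog):
    """Queue evolution in packets, following Eq. (3):
    W_{t+1} = min(W_t - dt * r / Z + A_t, W_max).
    Clamping the drained backlog at zero is an added guard."""
    served = slot_s * rate_bps / packet_bits        # packets served this slot
    return min(max(backlog - served, 0.0) + arrivals, max_backlog)
```

The aggregate rate r in Eq. (4) is simply the sum of rates over the channels assigned to the link in that slot.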
D. Task Offloading Model
In this paper, we consider a hybrid computation offloading scenario [21]. The computing task of a vehicle can be executed locally, offloaded to the MEC server via V2I communication, or offloaded to remote cloud computing servers through relayed V2I and high-speed fronthaul links. As for the computing task of link
\begin{equation*}
{D_{l,b,t}^{cp}=}\left\lbrace \begin{array}{ll}\frac{{{Z}_{l}}\cdot {{\beta }_{l}}}{Y_{m,i}^{v}\cdot {{f}_{v}}}, & {{e}_{l,t}}=0; \qquad \qquad \qquad \qquad (5\mathrm {a})\\
\frac{{{Z}_{l}}\cdot {{\beta }_{l}}}{{{f}_{u}}}+\frac{{{Z}_{l}}}{r_{l,t}^{u}}, & {{e}_{l,t}}=1; \qquad \qquad \qquad \qquad (5\mathrm {b})\\
\frac{{{Z}_{l}}\cdot {{\beta }_{l}}}{{{f}_{c}}}+\frac{{{Z}_{l}}}{r_{l,t}^{u}}+{{t}_{c}}, & {{e}_{l,t}}=2,\qquad \qquad \qquad \qquad (5\mathrm {c}) \end{array}\right.
\end{equation*}
\begin{equation*}
r_{l,t}^{u}=\sum \limits _{j\in \mathcal {J}}{{{\rho }_{l,j,t}}\cdot r_{{{N}_{l}},j,t}^{{{N}_{0}}}}. \tag{6}
\end{equation*}
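The three branches of Eq. (5) can be evaluated side by side to pick an offloading target. In this sketch, `f_local` stands in for the aggregate local capability (the product of allocated units and per-unit frequency in Eq. (5a)); all names are illustrative.

```python
def processing_delay(mode, data_bits, cycles_per_bit,
                     f_local, f_edge, f_cloud, uplink_bps, cloud_rtt_s):
    """Task completion delay for the three offloading choices of Eq. (5):
    mode 0 = local execution, mode 1 = MEC server (plus upload time),
    mode 2 = remote cloud (plus upload time and fronthaul latency t_c)."""
    compute = data_bits * cycles_per_bit
    if mode == 0:
        return compute / f_local                                  # Eq. (5a)
    if mode == 1:
        return compute / f_edge + data_bits / uplink_bps          # Eq. (5b)
    return compute / f_cloud + data_bits / uplink_bps + cloud_rtt_s  # Eq. (5c)
```

With a slow local CPU, a fast edge server, and a non-trivial fronthaul latency, the edge branch typically wins, which matches the motivation for multi-tier computing.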
E. Key Performance Indicators
As defined in [3], end-to-end (E2E) communication refers to the process of transferring a given piece of information from a source to a destination at the application level. Generally, the E2E delay consists of queueing time, transmission time, network latency, and processing latency [48]. In this paper, we have assumed that all VUEs are within the coverage of the RSUs and can only acquire data from RSUs. Consequently, it is reasonable to ignore the network delay during data reception. Thus, we mainly consider the waiting, transmission, and processing delays. At slot
\begin{equation*}
D_{l,b,t}^{^{\text{E2E}}}=D_{l,b,t}^{cw}+D_{l,b,t}^{ct}+D_{l,b,t}^{cp}, \tag{7}
\end{equation*}
In addition to delay, reliability is another key performance indicator [10]. From the perspective of service provisioning, the probability of receiving or dropping data packets is usually used as a measure of reliability [42]. When the delay of a packet exceeds the maximum tolerable delay, the packet is considered dropped; otherwise, it is considered received. In this paper, we choose the packet reception ratio (PRR) as the index to evaluate reliability, which can be expressed as
\begin{equation*}
{{\varphi }_{l,t}}=\Pr \left\lbrace D_{l,b,t}^{^{\text{E2E}}}< D_{l}^{^{\max }} \right\rbrace, \tag{8}
\end{equation*}
Problem Formulation
In this paper, the resource allocation problem is decomposed into two stages. First, at the beginning of each large-time period
Schematic of the two-timescale resource allocation. First, at the beginning of each large-timescale period, the InP allocates shared physical resources to SPs (inter-slice resource configuration). Then, each SP elastically assigns its exclusive resources to its users at each small slot (intra-slice resource scheduling).
A. Large Timescale Problem Formulation
At the beginning of each period
\begin{equation*}
V({{C}_{k}})=\sum \limits _{i\in \mathcal {I}}{\left[ v({{c}_{i,k}})-{{q}^{cm}}{{J}_{i,k}}-{{q}^{cu}}Y_{i,k}^{u}-{{q}^{cc}}Y_{i,k}^{c} \right]} \tag{9}
\end{equation*}
\begin{align*}
& \text{P1 : }\underset{{{C}_{k}}}{\mathop {{\max}}}\; \left[ V({{C}_{k}}) \right] \\
& \text{subject to : } \\
& \text{C1 : }\sum \limits _{i\in \mathcal {I}}{Y_{m,i,k}^{v}\leq Y_{m}^{v}; \ \forall m\in [1,M],k\in \mathcal {K}}, \\
& \text{C2 : }\sum \limits _{i\in \mathcal {I}}{{{J}_{i,k}}}\leq J; \ \forall k\in \mathcal {K}, \\
& \text{C3 : }\sum \limits _{i\in \mathcal {I}}{Y_{i,k}^{u}}\leq {{Y}^{u}}; \ \forall k\in \mathcal {K}, \\
& \text{C4 : }Y_{m,i,k}^{v}\in [0,Y_{m}^{v}]; \ \forall m\in [1,M],i\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C5 : }{{J}_{i,k}}\in [0,J]; \ \forall i\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C6 : }Y_{i,k}^{u}\in [0,{{Y}^{u}}]; \ \forall i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C7 : }\mathcal {Y}_{_{m,i,k}}^{v}\cap \mathcal {Y}_{_{m,i^{\prime },k}}^{v}=\varnothing ; \ \forall m\in [1,M],i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C8 : }{{\mathcal {J}}_{i,k}}\cap {{\mathcal {J}}_{i^{\prime },k}}=\varnothing ; \ \forall i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C9 : }\mathcal {Y}_{i,k}^{u}\cap \mathcal {Y}_{_{i^{\prime },k}}^{u}=\varnothing ; \ \forall i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \tag{10}
\end{align*}
B. Small Timescale Problem Formulation
After the resource configuration for all slices is determined, each slice utilizes the acquired multi-dimensional resources to provide services for its subscribers so as to maximize the fee it charges. As for slice
\begin{equation*}
{{U}_{l,t}}={{\alpha }_{d}}\cdot U_{_{l}}^{(1)}(D_{l,t}^{^{\text{E2E}}})+{{\alpha }_{r}}\cdot U_{_{l}}^{(2)}({{\varphi }_{l,t}}), \tag{11}
\end{equation*}
\begin{align*}
U_{_{l}}^{(1)}(D_{l,t}^{^{\text{E2E}}})&=\left\lbrace \begin{array}{ll}\exp (-D_{l,t}^{^{\text{E2E}}}), & D_{l,t}^{^{\text{E2E}}}\leq D_{l}^{^{\max }}; \qquad (12\mathrm {a})\\
\exp (-D_{l,t}^{^{\text{E2E}}})-{{\psi }_{i}}, & D_{l,t}^{^{\text{E2E}}}>D_{l}^{^{\max }}, \qquad (12\mathrm {b}) \end{array}\right.\\
U_{_{l}}^{(2)}({{\varphi }_{l,t}})&=\left\lbrace \begin{array}{ll}\exp (-(1-{{\varphi }_{l,t}})), & {{\varphi }_{l,t}}\geq \varphi _{l}^{^{\min }}; \qquad (13\mathrm {a})\\
\exp (-(1-{{\varphi }_{l,t}}))-{{\psi }_{i}}, & {{\varphi }_{l,t}}< \varphi _{l}^{^{\min }}, \qquad (13\mathrm {b}) \end{array}\right.
\end{align*}
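The satisfaction model of Eqs. (11)-(13) combines an exponential delay utility and an exponential reliability utility, each discounted by a penalty psi when its KPI is violated. A minimal sketch, with `w_delay`/`w_rel` standing in for the weights alpha_d/alpha_r:

```python
import math

def link_satisfaction(e2e_delay, prr, max_delay, min_prr, penalty,
                      w_delay=0.5, w_rel=0.5):
    """Weighted link satisfaction of Eq. (11). Each utility follows
    Eqs. (12)-(13): exponential in the KPI, minus a penalty psi when
    the delay bound or minimum PRR is violated."""
    u_delay = math.exp(-e2e_delay) - (penalty if e2e_delay > max_delay else 0.0)
    u_rel = math.exp(-(1.0 - prr)) - (penalty if prr < min_prr else 0.0)
    return w_delay * u_delay + w_rel * u_rel
```

A link meeting both KPIs scores near 1, while a violating link can score negative, which steers the scheduler away from infeasible allocations.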
\begin{align*}
& \text{P2 : }\underset{{{e}},{{\rho }}}{\mathop {\max }}\;[v({{c}_{i,k}})]=\underset{{{e}},{{\rho }}}{\mathop {\max }}\;\left[ q_{i}^{se}\sum \limits _{t=1}^{T}{\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{U}_{l,t}}}} \right] \\
& \text{subject to : } \\
& \text{C10 : }\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\rho }_{l,j,t}}\leq 1;\forall t\in [1,T],j\in {{\mathcal {J}}_{i,k}}}, \\
& \text{C11 : }\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{\sum \limits _{j\in {{\mathcal {J}}_{i,k}}}{{{\rho }_{l,j,t}}}}\leq {{J}_{i,k}};\forall t\in [1,T], \\
& \text{C12 : }{{e}_{l,t}}\in \lbrace 0,1,2\rbrace;\forall t\in [1,T],l\in {{\mathcal {L}}_{i,k}}, \qquad \qquad \ \ \\
& \text{C13 : }\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\Lambda }_{({{e}_{l,t}}=1)}}}\leq Y_{_{i,k}}^{u};\forall t\in [1,T], \tag{14}
\end{align*}
Dual Timescale Intelligent Resource Management Scheme
The optimization problems described in Section IV are difficult to solve because they are NP-hard. In addition, the IoV requires an intelligent resource management scheme that adapts to dynamic network conditions. Hence, in this section, a novel 2Ts-IRMS is proposed to address the resource allocation problem in the IoV. Specifically, we adopt the proposed JAMR algorithm to address inter-slice resource configuration at each large-timescale period (Section V-A), while the JORA-DDQN algorithm is used to solve intra-slice resource scheduling at each small-timescale slot (Section V-B).
A. Inter-Slice Resource Configuration
At large timescales, a central question is how the InP should allocate multi-tier resources to SPs to maximize system revenue. Obviously, it is impossible to find the optimal resource configuration of P1 in (10) before the end of period
Algorithm 1: Inter-Slice Vehicular Computing Resources Configuration Based on Service Priorities.
for
Input:
for
Input:
Compute
Compute
if
Let
Let
else
Let
Let
end for
end for
Return
1) Vehicular Computing Resources
Different from other application scenarios, the IoV contains a large number of safety-related services, whose resource requirements must be guaranteed first to avoid traffic accidents. As described in Section III, when vehicular computing resources are sufficient, local execution is the first choice for processing the computing tasks of safety-related services, as it avoids unnecessary transmission delays and errors. Thus, based on service priorities, we first explore vehicular computing resource allocation among slices. Obviously, safety-related services have a higher priority than non-safety-related services in a practical IoV. Among safety-related services, we give basic road safety services a higher priority than enhanced road safety services, because basic driving functions should be guaranteed first when computing resources are insufficient. To facilitate analysis, the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety services are denoted as
It is noteworthy that the real-time policies of task offloading and resource scheduling at small timescales have little dependence on the vehicular computing resource configuration. Furthermore, the computing resources of each vehicle can only be used by the vehicle itself. Therefore, service requests can be considered the only influencing factor for vehicular computing resource configuration. As for VUE
\begin{equation*}
Y_{m,i,k}^{v,req}=\left\lceil \frac{\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{W_{l}^{\max }\cdot {{Z}_{l}}\cdot {{\beta }_{l}}\cdot {{\Lambda }_{(l^{\prime }=m)}}}}{{{f}_{v}}} \right\rceil, \tag{15}
\end{equation*}
\begin{align*}
&Y_{m,i,k}^{v,\text{rem}} \\
& =\left\lbrace \begin{array}{llll}Y_{m}^{v}, & i=i{}_{1}; \\
0, & i\ne i{}_{1};Y_{m,i^{\prime },k}^{v,\text{rem}}< Y_{m,i^{\prime },k}^{v,\text{req}}; \\
Y_{m,i^{\prime },k}^{v,\text{rem}}-Y_{m,i^{\prime },k}^{v,\text{req}}, &i\ne i{}_{1};Y_{m,i^{\prime },k}^{v,\text{rem}}\geq Y_{m,i^{\prime },k}^{v,\text{req}}, \end{array}\right.\\
\tag{16a}\\
\tag{16b}\\
\tag{16c}
\end{align*}
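The priority-ordered allocation of Algorithm 1 and Eq. (16) amounts to serving slices sequentially: the highest-priority slice draws its requirement (Eq. (15)) from the VUE's capacity, and each lower-priority slice receives whatever remains. A minimal sketch, with demands listed in priority order (basic safety, enhanced safety, non-safety); function and variable names are illustrative.

```python
def allocate_vehicular_compute(capacity, demands):
    """Sequential priority allocation in the spirit of Algorithm 1 and
    Eq. (16): each slice, in priority order, receives min(request,
    remaining capacity); later slices get only what is left."""
    remaining = capacity
    grants = []
    for req in demands:
        grant = min(req, remaining)
        grants.append(grant)
        remaining -= grant
    return grants
```

Under scarcity, this guarantees basic road safety services are satisfied before any resources flow to enhanced safety or non-safety slices.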
2) Edge Computing Resources & Radio Communication Resources
To ensure that each computing task has a processing location and to avoid wasting resources, the multi-tier computing resource configuration of slice
\begin{equation*}
{{L}_{i,k}}=Y_{i,k}^{v}+Y_{i,k}^{u}+Y_{i,k}^{c}=\sum \limits _{m=1}^{M}{E_{m,i,k}^{v}}+Y_{i,k}^{u}+Y_{i,k}^{c}, \tag{17}
\end{equation*}
As described above, we formulate problem P1 in (10) as a contextual MAB problem. Specifically, at the beginning of each period, the InP, regarded as the agent, first observes its context in the form of a feature vector. The vector indicates the characteristics of the requested services and the allocation status of the vehicular computing resources of all slices. At period
Then, the agent chooses to pull an available arm
The key idea of NeuralUCB is to use a neural network
\begin{align*}
{{P}_{k,{{C}_{k}}}}= & f({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})+{{\mu }_{k-1}}\sqrt{g{{({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})}^{\text{T}}}H_{k-1}^{-1}g({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})/\delta }, \tag{18}\\
{{\mu }_{k}}= & \sqrt{1+{{\chi }_{4}}{{\delta }^{-1/6}}\sqrt{\log \delta }\delta {{^{\prime }}^{4}}{{k}^{7/6}}{{\chi }_{1}}^{-7/6}}\cdot \left(\vartheta \sqrt{\log \frac{\det {{H}_{k}}}{\det {{\chi }_{1}}\text{I}}+{{\chi }_{5}}{{\delta }^{-1/6}}\sqrt{\log \delta }\delta {{^{\prime }}^{4}}{{k}^{5/3}}{{\chi }_{1}}^{-1/6}-2\log {{\chi }_{2}}}+\sqrt{{{\chi }_{1}}}{{\chi }_{3}} \right) \\
& +({{\chi }_{1}}+{{\chi }_{6}}k\delta ^{\prime }) \left[{{(1-\eta \delta {{\chi }_{1}})}^{\eta ^{\prime }/2}}\sqrt{k/{{\chi }_{1}}}+{{\delta }^{-1/6}}\sqrt{\log \delta }\delta {{^{\prime }}^{7}}{{k}^{5/3}}{{\chi }_{1}}^{-5/3} \left(1+\sqrt{k/{{\chi }_{1}}}\right)\right], \tag{19}
\end{align*}
\begin{equation*}
{{H}_{k}}={{H}_{k-1}}+g({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})g{{({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})}^{\text{T}}}/\delta . \tag{20}
\end{equation*}
\begin{align*}
{{L}^{\text{NU}}}(\omega) &=\sum \limits _{k=1}^{K}{{{(f({{O}_{k,{{C}_{k}}}};\omega) -{{V}_{k}}({{C}_{k}}))}^{2}}/2} \\
&+{\delta {{\chi }_{1}}||\omega -{{\omega }^{(0)}}||_{2}^{2}/2}, \tag{21}
\end{align*}
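To make the above concrete, the exploration bonus in (18) and the rank-one update in (20) can be sketched in PyTorch. `ScoreNet`, its layer sizes, and the explicit inverse $H^{-1}$ passed as an argument are illustrative choices rather than the paper's implementation; practical NeuralUCB variants often keep only the diagonal of $H$ for tractability.

```python
import torch

class ScoreNet(torch.nn.Module):
    """Small MLP f(O; ω) estimating the expected revenue of an arm
    (a candidate resource configuration). Sizes are illustrative."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

def ucb_scores(net, contexts, H_inv, mu, delta):
    """P_{k,c} = f(O; ω) + μ·sqrt(gᵀ H⁻¹ g / δ) for every arm, as in (18)."""
    scores = []
    for ctx in contexts:
        net.zero_grad()
        f = net(ctx)
        f.backward()                       # g(O; ω): gradient of f w.r.t. ω
        g = torch.cat([p.grad.flatten() for p in net.parameters()])
        bonus = mu * torch.sqrt(g @ H_inv @ g / delta)
        scores.append((f + bonus).item())
    return scores

def update_H(H, net, ctx, delta):
    """H_k = H_{k-1} + g gᵀ / δ, the rank-one update in (20)."""
    net.zero_grad()
    net(ctx).backward()
    g = torch.cat([p.grad.flatten() for p in net.parameters()])
    return H + torch.outer(g, g) / delta
```

The agent then pulls the arm with the largest score and refits the network on the observed revenues by minimizing (21).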
Edge computing resources and radio communication resources allocation based on NeuralUCB algorithm.
B. Intra-Slice Resource Scheduling
In a period, once the resource configuration is determined, the remaining problem is how to effectively allocate resources from an SP to UEs to maximize the satisfaction of all links. The optimization problem P2 in (14) is difficult to solve due to the time-varying nature of the physical layer. Besides, the decisions of task offloading and resource scheduling cause changes in link states (e.g., queue characteristics and channel quality), and service satisfaction also depends on link states. Therefore, we utilize a DRL method based on an MDP to solve the proposed intra-slice resource scheduling problem. First, we formulate our problem as an MDP to accurately describe the process of resource allocation and task offloading.
Algorithm 2: NeuralUCB for Edge Computing Resources and Radio Communication Resources Allocation.
Initialization: H_0 = χ_1·I and initial network parameters ω_0;
for each period k = 1, ..., K do
    Observe the contexts of all available arms;
    for each available arm do
        Compute its estimated revenue by (18);
    end for
    Let the chosen arm be the one with the highest estimated revenue;
    Play the chosen arm and observe the obtained revenue;
    Compute the confidence-bound scaling factor μ_k by (19);
    for each gradient descent step do
        Update the network parameters ω by minimizing the loss (21);
    end for
    Return the updated parameters ω_k;
    Compute H_k by (20);
end for
At each slot
For link
The goal of resource allocation at this stage is to maximize the satisfaction level of all links within limited resources. Therefore, we set rewards based on the constraint conditions and objective function. After taking action
\begin{align*}
{{r}_{t}}&={{\ell }_{1}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{U}_{l,t}}} \right) \\
& +{{\ell }_{2}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\rho }_{l,j,t}}-1} \right)\cdot {{\Lambda }_{\left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\rho }_{l,j,t}}\leq 1} \right)}} \\
& +{{\ell }_{3}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{\sum \limits _{j\in {{\mathcal {J}}_{i,k}}}{{{\rho }_{l,j,t}}}}-{{J}_{i,k}} \right)\cdot {{\Lambda }_{\left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{\sum \limits _{j\in {{\mathcal {J}}_{i,k}}}{{{\rho }_{l,j,t}}}}\leq {{J}_{i,k}} \right)}} \\
& +{{\ell }_{4}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}({{{e}_{l,t}}=1})-Y_{i,k}^{u}\right)\cdot {{\Lambda }_{\left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{({{e}_{l,t}}=1)}\leq Y_{i,k}^{u} \right)}} \tag{22}
\end{align*}
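Read literally, the reward (22) adds the weighted satisfaction of all links and three terms gated by the indicator Λ. A minimal transcription follows; the flattened inputs (per-RB allocation sums, total RBs used, number of edge-offloaded tasks) are hypothetical simplifications of the underlying ρ and e variables.

```python
def indicator(cond):
    """Λ_(condition): 1 if the condition holds, else 0."""
    return 1.0 if cond else 0.0

def step_reward(U, rho_sum_per_rb, total_rbs_used, num_offloaded,
                J_cap, Yu_cap, l):
    """Weighted reward of (22). U: satisfaction U_{l,t} of each link;
    rho_sum_per_rb: sum over links of ρ_{l,j,t} for each RB j;
    J_cap = J_{i,k}; Yu_cap = Y_{i,k}^u; l = (ℓ1, ℓ2, ℓ3, ℓ4)."""
    r = l[0] * sum(U)                                    # satisfaction term
    for share in rho_sum_per_rb:                         # per-RB sharing term
        r += l[1] * (share - 1) * indicator(share <= 1)
    r += l[2] * (total_rbs_used - J_cap) * indicator(total_rbs_used <= J_cap)
    r += l[3] * (num_offloaded - Yu_cap) * indicator(num_offloaded <= Yu_cap)
    return r
```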
In the IoV, each slice is regarded as an agent and owns a private neural network. Each agent aims to find the best policy
\begin{equation*}
{{R}_{t}}=\sum \limits _{i=0}^{T-1}{{{\gamma }^{i}}{{r}_{t+i}}}, \tag{23}
\end{equation*}
\begin{equation*}
{{Q}^{\pi }}({s,a})=\mathbb {E}[{{R}_{t}}|s,\pi ]. \tag{24}
\end{equation*}
\begin{equation*}
{{\pi }^{*}}(s)=\underset{a\in \mathcal {A}}{\mathop {\arg \max }}\;Q(s,a),\forall s\in \mathcal {S}. \tag{25}
\end{equation*}
\begin{align*}
{{Q}^{\pi }}({{s}_{t}},{{a}_{t}})& \leftarrow {{Q}^{\pi }}({{s}_{t}},{{a}_{t}}) \\
&+\alpha \left({{r}_{t}}+\gamma \underset{{{a}_{t+1}}\in \mathcal {A}}{\mathop {\max }}\;{{Q}^{\pi }}({{s}_{t+1}},{{a}_{t+1}}) -{{Q}^{\pi }}({{s}_{t}},{{a}_{t}})\right). \tag{26}
\end{align*}
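The tabular update (26) is a one-liner in code; the sketch below assumes a dictionary-backed Q-table that defaults to zero for unseen state-action pairs.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step, implementing the update rule (26)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # max_{a'} Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Q-table with default value 0 for state-action pairs never visited
Q = defaultdict(float)
```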
However, in high-dimensional state spaces, the classic Q-learning method cannot efficiently compute the Q-function for all states. To remedy this problem, DDQN improves Q-learning by combining it with deep neural networks [41]. Specifically, raw data is fed into the neural network as the state, and the Q-function is approximated by the network. It is worth noting that DDQN maintains two separate networks: the main network and the target network. The main network approximates the Q-function, while the target network provides the temporal difference (TD) target for updating the main network. During the training phase, the main network parameters
\begin{equation*}
{{L}^{\text{DDQN}}}(\theta) =\frac{1}{2}\cdot \mathbb {E}\left[{{\left(y_{t}^{\text{DDQN}}-{{Q}^{\pi }}\left({{s}_{t}},{{a}_{t}};{{\theta }_{t}}\right)\right)}^{2}}\right], \tag{27}
\end{equation*}
\begin{equation*}
y_{t}^{\text{DDQN}}={{r}_{t}}+\gamma {{Q}^{\pi }}\left({{s}_{t+1}},\underset{{{a}_{t+1}}\in \mathcal {A}}{\mathop {\arg \max }}\;Q({{s}_{t+1}},{{a}_{t+1}};{{\theta }_{t}});\theta _{t}^{-} \right). \tag{28}
\end{equation*}
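The decoupled selection/evaluation in (27)–(28) can be sketched in PyTorch as follows. The batch layout is an assumption, and `main_net`/`target_net` stand for any networks mapping states to per-action Q-values.

```python
import torch

def ddqn_loss(main_net, target_net, batch, gamma):
    """DDQN TD target (28) and loss (27): the main network selects the
    next action, while the target network evaluates it."""
    s, a, r, s_next = batch
    q = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s_t, a_t; θ)
    with torch.no_grad():
        a_star = main_net(s_next).argmax(dim=1, keepdim=True)     # argmax by main net
        q_next = target_net(s_next).gather(1, a_star).squeeze(1)  # evaluated by target net
        y = r + gamma * q_next                                    # y_t^DDQN as in (28)
    return 0.5 * torch.mean((y - q) ** 2)                         # loss (27)
```

Decoupling action selection from evaluation in this way mitigates the overestimation bias of vanilla DQN.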
Performance Evaluation
JORA-DDQN for Intra-Slice Resource Scheduling.
Initialization: main network weights θ, target network weights θ^-, and the experience replay buffer.
for each episode do
    Receive the initial observation s_0;
    for each slot t do
        Take action a_t;
        Get reward r_t by (22) and observe the next state s_{t+1};
        Store the experience (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer;
        Get a batch of experiences from the replay memory;
        Calculate the target Q-value y_t^DDQN of the target network by (28);
        Update the main network by minimizing the loss (27) with a gradient descent step on θ;
        Every G steps, update the target network: θ^- ← θ;
    end for
end for
A. Simulation Environment
In our simulation, we use PyTorch 1.10.0 on Ubuntu 18.04.6 LTS to implement the 2Ts-IRMS algorithm and compare it with multiple benchmark algorithms. For experimental purposes, a cellular V2X network environment is established based on the SUMO platform, consisting of a real road network, an MBS, and several VUEs and RSUs. Specifically, to match reality, we first import the road network around the Beijing University of Posts and Telecommunications from OpenStreetMap into SUMO [40]. Then, the whole road network is divided into 9 blocks, consistent with the road partitioning strategy of the Manhattan case [50]. An RSU is deployed at the center of each block and can communicate with vehicles within its coverage, as depicted in Fig. 7. To reflect traffic in urban regions as faithfully as possible, vehicles enter the generated road network with randomly chosen departure lanes, positions, and speeds, and move according to the Krauss car-following model and the LC2013 lane-changing model [37].
Real road conditions simulation of Beijing University of Posts and Telecommunications based on SUMO.
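For illustration, the 3×3 block partition and the center-of-block RSU placement described above can be sketched as follows. The rectangular bounding box and coordinate convention are simplifying assumptions; the actual road network imported from OpenStreetMap is irregular.

```python
def rsu_of(x, y, width, height, rows=3, cols=3):
    """Map a vehicle position (x, y) inside a width x height bounding box
    to one of rows*cols rectangular blocks, i.e., to the index of the RSU
    deployed in that block."""
    col = min(int(x / (width / cols)), cols - 1)
    row = min(int(y / (height / rows)), rows - 1)
    return row * cols + col

def rsu_center(block, width, height, rows=3, cols=3):
    """Center coordinates of a block, where its RSU is placed."""
    row, col = divmod(block, cols)
    return ((col + 0.5) * width / cols, (row + 0.5) * height / rows)
```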
For the communication resources in the IoV, we assume that there are 50 RBs with 180 kHz bandwidth to be allocated. For the computing resources, we let the computation capacity (i.e., CPU frequency) of a single CPU core for the VUEs be
In addition, to better evaluate how different V2X services affect network performance, various combinations of services are considered in the simulations. By selecting these combinations, we test whether the proposed 2Ts-IRMS can satisfy the requirements of multiple services, especially safety-related services. For simplicity, slices 1, 2, and 3 represent the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety-related services, respectively. The simulation parameters and neural network parameters are summarized in Tables III and IV, respectively. Afterward, we compare the 2Ts-IRMS algorithm with several benchmark algorithms, which are described as follows:
Hierarchical resource allocation schemes: The two-timescale bidding resource management scheme (2Ts-BRMS) adopts the generalized Kelly mechanism (GKM) to address the inter-slice multi-dimensional resource configuration problem and allocates resources to users according to channel quality (CQ) [23]. In the two-timescale fair resource management scheme (2Ts-FRMS), both the InP and SPs adopt the dominant resource fairness (DRF) approach to allocate multi-dimensional resources [49].
Inter-slice resource configuration schemes: The proportional allocation scheme (PA) proportionally allocates resources to slices based on the number of subscribers and average resource requirements [51]. As for the context-aware configuration scheme (CA), it adjusts inter-slice resource configuration based on the localized service requests and traditional UCB algorithm [45].
Intra-slice resource scheduling schemes: As for communication resource allocation, the queue-aware resource allocation strategy (QA) calculates the queue length of each link. The longer the queue length, the more communication resources are allocated. In the fair resource allocation strategy (FA), the communication resources are equally shared by all links. As for computing resource allocation, the local execution scheme (LE), the edge execution scheme (EE), and the cloud execution scheme (CE) make all tasks to be executed on user terminals, MEC servers, and remote cloud servers, respectively [21].
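As a concrete example of the above baselines, the weighting used by the PA scheme can be sketched as follows; the largest-remainder rounding is our assumption, added only so that the integer shares sum exactly to the total.

```python
def proportional_allocation(total, subscribers, avg_demand):
    """PA baseline: slice i receives a share of `total` proportional to
    subscribers_i * avg_demand_i, rounded with the largest-remainder
    method so the integer shares sum exactly to `total`."""
    weights = [n * d for n, d in zip(subscribers, avg_demand)]
    w_sum = sum(weights)
    raw = [total * w / w_sum for w in weights]
    alloc = [int(x) for x in raw]
    # hand out leftover units to the largest fractional remainders
    leftovers = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i],
                       reverse=True)
    for i in leftovers[: total - sum(alloc)]:
        alloc[i] += 1
    return alloc
```

For instance, 50 RBs split across slices with 8, 8, and 4 subscribers demanding 1, 2, and 1 average units yield shares of 14, 29, and 7 RBs.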
B. Simulation Results
Fig. 8 compares the achieved values of the valuation function of three hierarchical resource allocation schemes under various combinations of services. With the number of VUEs in the cellular network fixed, the proportion of users in slices 1, 2, and 3 iterates through all possible combinations. Compared with the other algorithms, 2Ts-IRMS achieves the highest valuation while exhibiting more stable performance, because it can dynamically adjust resource allocation according to the varying numbers of users and service requests. There are two main reasons for the low performance of 2Ts-BRMS. On the one hand, in the phase of inter-slice resource configuration, 2Ts-BRMS treats all slices equally, so safety-related services may fail to be satisfied, which has a greater negative impact. On the other hand, when SPs allocate resources to users, considering only communication resources degrades the quality of services, especially for enhanced road safety services. Besides, the value of the valuation function decreases as the number of users increases, because resources are limited and cannot satisfy too many users.
Comparison of the achieved valuation of hierarchical resource allocation schemes under various combinations of services.
To further analyze the impact of the JAMR approach on inter-slice resource configuration, we evaluate the performance of multiple inter-slice resource configuration schemes by adjusting the unit price of a certain service, as depicted in Fig. 9. When the unit prices of the other slices remain unchanged, it can be observed that the system revenue increases with the unit price of the current service. Notably, JAMR maintains the highest valuation at any price setting, which further validates its self-adaptive capability. Across the three proposed slices, JAMR provides average gains of 24%, 40%, and 76% over DRF, GKM, and CA, respectively. The CA scheme achieves lower revenue because its limited fitting ability is seriously hampered by the nonlinearity of the problem. Furthermore, the fluctuation of the system revenue in the slice for non-safety-related services is noticeably smaller than in the other slices, because the utility of non-safety-related services has a smaller impact on the system performance. Similarly, Fig. 10 depicts the system revenue of multiple inter-slice resource configuration schemes under different punishing values of services. Although increasing the penalties of services decreases the system revenue, JAMR still maintains the highest revenue no matter how the penalties change.
Comparison of system revenue of multiple inter-slice resource configuration schemes under different unit prices of services. (a) Revenue under different unit prices of basic road safety services; (b) Revenue under different unit prices of enhanced road safety services; (c) Revenue under different unit prices of non-safety related services.
Comparison of system revenue of multiple inter-slice resource configuration schemes under different punishing values of services. (a) Revenue under different punishing values of basic road safety services; (b) Revenue under different punishing values of enhanced road safety services; (c) Revenue under different punishing values of non-safety related services.
Fig. 11 shows the number of communication resources (i.e., RBs) and edge computing resources (i.e., CPU cores) allocated to different slices under different inter-slice resource configuration schemes. When the number of vehicles is fixed at 20 and the proportion of users in slices 1, 2, and 3 is 2:2:1, JAMR assigns 10 RBs to slice 1, 28 RBs to slice 2, and 12 RBs to slice 3. At the same time, 25% of the edge computing resources are allocated to slice 3 and all remaining edge computing resources are allocated to slice 2. Notably, although the PA scheme allocates sufficient resources to the slice for enhanced road safety services, the performance of the other slices is seriously compromised.
Resource configuration among slices under different inter-slice resource configuration schemes. (a) Allocated proportion of radio blocks; (b) Allocated proportion of edge CPU cores.
After determining the resource configuration policy for all slices, each SP allocates its obtained resources to its subscribers to maximize the long-term satisfaction of all links. To guarantee isolation among slices, each SP is equipped with an exclusive agent that implements resource scheduling among users based on the proposed JORA-DDQN scheme. To illustrate the convergence performance of the JORA-DDQN scheme in different slices, we plot the variation trend of the SLA violation probability over training episodes for each slice in Fig. 12. In this paper, the SLA mainly refers to delay and reliability. At the beginning of training, the SLA violation probability is high. As the number of training episodes increases, the SLA violation probability gradually decreases. After 1500 episodes, the SLA violation probability levels off, which means that all of the slices have converged. Moreover, the slice for enhanced road safety services has the lowest SLA violation probability, which is consistent with the KPI requirements in Table I.
Fig. 13 depicts the performance of links in the slice for basic road safety services during an episode. It consists of the cumulative distribution function (CDF) of packet delay, the CDF of packet reception ratio, and the cumulative satisfaction of links. In view of the characteristics of basic road safety services (i.e., small packet size and high timeliness and reliability requirements), allocating radio resources is more important than allocating computing resources, because most tasks can be processed at terminal devices without occupying the computing resources of the MEC server or remote cloud servers. Thus, DRF, CA, QA, and FA are selected as benchmark schemes against the JORA-DDQN approach. Meanwhile, to ensure fairness, the task offloading policy of the benchmark schemes is consistent with JORA-DDQN. Owing to its flexible resource management paradigm, the proposed JORA-DDQN scheme significantly outperforms the benchmark schemes in terms of delay, reliability, and cumulative satisfaction. Specifically, the average packet delay of links is 93.033 ms, and the algorithm keeps the packet reception ratio of each link at or above 90%. Besides, JORA-DDQN has the highest service satisfaction and provides a gain of 50% with respect to DRF.
Performance indicators of links in the slice for basic road safety services under different intra-slice resource scheduling schemes. (a) CDF of packet delay of links; (b) CDF of packet reception ratio of links; (c) Cumulative satisfaction of links.
As for the slice for enhanced road safety services, the packet size is much larger than that of basic road safety services, and more CPU cycles are needed to process the data. The computing resources of the terminal devices are insufficient to support the simultaneous processing of such data volumes, so access to the MEC server is necessary. Thus, DRF, CA, LE, and EE are selected as benchmark schemes against the JORA-DDQN approach. Similarly, to ensure fairness, the task offloading policy of CA and the RB scheduling policy of LE and EE are consistent with JORA-DDQN. Fig. 14 depicts the performance of links in the slice for enhanced road safety services during an episode. In the proposed scheme, the average packet delay of links is 9.608 ms, and the algorithm keeps the packet reception ratio of each link at or above 99%. Meanwhile, JORA-DDQN maintains the highest service satisfaction and provides a gain of 52% with respect to DRF. Notably, the LE scheme performs worst and fails to meet the KPI requirements of most links.
Performance indicators of links in the slice for enhanced road safety services under different intra-slice resource scheduling schemes. (a) CDF of packet delay of links; (b) CDF of packet reception ratio of links; (c) Cumulative satisfaction of links.
As described in Section VI-A, the slice for non-safety-related services is expected to offload computing tasks to the MEC server or remote cloud servers. Thus, DRF, CA, EE, and CE are selected as benchmark schemes against the JORA-DDQN approach. Considering that this slice has low sensitivity to reliability, we only draw the curves of delay and satisfaction in Fig. 15. It is observed that JORA-DDQN can effectively use the limited resources to reduce task execution delay as much as possible. The CE scheme performs poorly because it incurs additional communication delay.
Performance indicators of link in the slice for non-safety related services under different intra-slice resource scheduling schemes. (a) CDF of packet delay of link; (b) Cumulative satisfaction of links.
Conclusion
In this paper, we propose three types of network slices to accommodate diversified V2X services over a common physical infrastructure. Specifically, the slice for basic road safety services is used to deliver imminent warnings to nearby entities in time; the slice for enhanced road safety services aims to achieve a higher level of automatic driving; and the slice for non-safety related services focuses on improving the driving comfort and efficiency of users. Furthermore, to take full advantage of multi-tier resources and account for time-varying network conditions, a novel dual-timescale intelligent resource management scheme is proposed. First, at the beginning of each period, the InP adaptively tunes the multi-tier resource configuration among slices to improve system revenue. Then, constrained by the limited resources obtained from the InP, each SP makes real-time task offloading and resource scheduling decisions to maximize the long-term service satisfaction of all users. Finally, based on the effect of actions on states, we propose the JAMR and JORA-DDQN algorithms to learn the optimal strategies of the proposed problems. Simulation results show that, compared with the benchmark algorithms, our proposed 2Ts-IRMS can effectively guarantee the performance requirements of users and improve the system revenue.