Introduction
With population growth and ongoing urbanization, transportation systems are facing unprecedented pressure [1]. Many V2X (vehicle-to-everything) services have emerged to adapt to complex traffic situations and offer enjoyable driving experiences. To date, the Third Generation Partnership Project (3GPP) has defined 57 V2X use cases [2], [3], covering V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2I (vehicle-to-infrastructure), and V2N (vehicle-to-network) services. Unlike conventional services for stationary or low-speed equipment, V2X services have distinctive transmission features and key performance indicators (KPIs). To reflect how various V2X services influence the performance of the internet of vehicles (IoV), representative use cases are summarized in detail in Table I. The use cases exhibit highly diverse and even conflicting service characteristics, which places critical pressure on the networking infrastructure [4].
Network slicing has emerged as a promising paradigm to meet diverse service demands. It enables multiple independent logical networks (i.e., slices) to run on a common physical network infrastructure [5], [9]. However, as V2X applications advance, the predefined slice for ultra-reliable low-latency communications (URLLC) can hardly satisfy increasingly stringent and heterogeneous service characteristics through one-shot resource allocation [6], [7], [8]. Building on advancements in existing studies, three types of slices are proposed to accommodate existing and future V2X use cases without excessively segmenting network resources. Specifically, the slices for basic road safety services, enhanced road safety services, and non-safety related services are used to deliver basic driving information, achieve high-level automated driving, and improve driving comfort and efficiency, respectively. Representative use cases and their corresponding slices are depicted in Fig. 1.
Illustration of typical V2X applications in vehicular networks. The basic road safety services slice provides position, heading, speed, etc.; the enhanced road safety services slice provides raw sensor data, vehicle intention data, coordination, confirmation of future maneuvers, and so on; the non-safety related services slice provides traffic flow optimization and software updates. An exclusive “slice sandwich” for each V2X services slice is made up of jagged multi-dimensional resources.
Unfortunately, constrained by computing capability or transmission delay, it is difficult to process multiple tasks with a single computing paradigm [11], [12], [13]. Multi-tier computing, a new system-level computing architecture, offers a solution to this problem. It involves three tiers, with users at tier one, the edge cloud at tier two, and the remote cloud at tier three [14]. By reasonably orchestrating available resources along this continuum, the strict KPIs of each slice are expected to be met. However, exploiting this hierarchical computing architecture for service provisioning entails the joint allocation of multi-dimensional resources [15], [16]. Moreover, the high mobility of vehicles adds complexity to resource management [17], [18], [19]. How to effectively allocate multi-tier resources to multiple slices according to time-varying network conditions is a thorny problem.
To cope with this problem, existing studies usually adopt hierarchical resource allocation methods [20], [21], [22], [23], [24]. Although these studies achieved certain gains in resource utilization, they are not applicable to the IoV because they ignored the exclusive characteristics of V2X services and the importance of multi-tier dynamic resources. Thus, considering the spatiotemporal correlation between service traffic and physical resources [25], a Two-Timescale Intelligent Resource Management Scheme (2Ts-IRMS) is proposed. Specifically, the scheme is divided into two stages, namely inter-slice resource configuration and intra-slice resource scheduling. At the beginning of each large timescale (i.e., period), the infrastructure provider (InP) configures resources for service providers (SPs) according to service traffic. Owing to the long-term trend of the service traffic, the configuration policy remains unchanged within each large timescale. The SPs create customized slices with the obtained multi-dimensional resources. Because the inherent characteristics of the slices make their demands for multi-dimensional resources jagged, the shape of a “slice sandwich” naturally forms. Then, to adapt to the real-time status of the physical layer, each SP dynamically schedules its available resources at each small timescale (i.e., slot) of a large timescale to provide high-quality services for its subscribers. In this way, system revenue can be maximized while guaranteeing the delay and reliability requirements of mobile users.
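The two-stage scheme described above can be sketched as a simple control loop: one inter-slice configuration decision per period, followed by per-slot intra-slice scheduling within that period. The class interfaces below (`observe_service_traffic`, `configure`, `schedule`, `update`) are hypothetical illustrations, not the paper's implementation.

```python
# A minimal sketch of the two-timescale control loop, under assumed interfaces:
# the InP configures inter-slice resources once per period, and each SP
# schedules its intra-slice resources at every slot of that period.

SLICES = ["basic_safety", "enhanced_safety", "non_safety"]

def run_two_timescale(inp, sps, num_periods, slots_per_period):
    """inp.configure() returns a per-slice resource bundle that stays fixed
    for the whole period; sp.schedule() reacts to the physical layer each slot."""
    for k in range(num_periods):
        traffic = inp.observe_service_traffic()      # long-term traffic trend
        config = inp.configure(traffic)              # inter-slice (large timescale)
        revenue = 0.0
        for t in range(slots_per_period):
            for name in SLICES:
                state = sps[name].observe_physical_layer()
                revenue += sps[name].schedule(config[name], state)  # intra-slice
        inp.update(config, revenue)                  # feedback for the next period
```

The separation mirrors the paper's design: the slow loop learns from aggregate revenue, while the fast loop never changes the period's configuration.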
Note that the resource configuration made by the InP influences the scheduling process in the SPs; meanwhile, the performance of the SPs also affects the decision-making of the InP. This interaction between the InP and SPs makes it very challenging to solve the proposed problem with conventional mathematical methods. Deep Reinforcement Learning (DRL), as an intelligent approach, provides a promising solution to this challenge. In the inter-slice resource configuration stage, the status of service requests depends only on the users and is therefore unaffected by the selected resource configuration policy. Accordingly, a Joint Allocation algorithm of Multi-dimensional Resources (JAMR) based on the improved NeuralUCB (Neural bandits with Upper Confidence Bounds) approach is proposed. The algorithm effectively avoids the curse of dimensionality and learns the unknown system revenue. As for the intra-slice resource scheduling problem, state transitions need to be considered, because the scheduling policies of resource allocation and task offloading generate different effects on the physical layer states. To adapt to the time-varying physical layer, a Joint Offloading and Resource Allocation algorithm based on the Double Deep Q Network (JORA-DDQN) approach is proposed to obtain optimal scheduling policies. The major contributions of this paper are summarized as follows.
Three types of refined network slices for V2X services are proposed to simultaneously accommodate multiple V2X services over a common infrastructure.
In view of the differentiated KPIs among slices, a Two-Timescale Intelligent Resource Management Scheme (2Ts-IRMS) is proposed to jaggedly divide multi-tier resources among multiple slices in the time-varying IoV.
To reflect reality, real-world road conditions and traffic models are set up in Simulation of Urban Mobility (SUMO). Numerical experiments using PyTorch verify that the proposed scheme utilizes network resources more economically and efficiently.
The rest of this paper is organized as follows. Section II presents an overview of related works. Section III describes the considered system framework. Section IV formulates the two-timescale resource allocation problem. Section V proposes the solutions based on JAMR and JORA-DDQN. Section VI evaluates the network performance and compares it with several benchmarks. Finally, Section VII concludes the paper.
Related Work
A. Network Slicing for V2X Services
To date, the 3GPP has defined standardized slices to support enhanced mobile broadband (eMBB), URLLC, and massive machine-type communication (mMTC) [29], [30]. With the evolution of V2X services, increasingly rigorous and heterogeneous KPIs need to be satisfied. Mapping V2X services onto existing reference slices or a single V2X slice is no longer appropriate [32], [33]. Network slicing for concrete application scenarios is still emerging, especially for vehicular scenarios [34]. In [35], the authors customized slices for safety and non-safety V2X services, respectively. According to the sensitivity of V2X services to delay, Wu et al. proposed delay-sensitive and delay-tolerant slices [36]. As described in Table I, there are great differences between basic road safety services and enhanced road safety services. A single slice for safety or delay-sensitive V2X services is insufficient to cope with these differences simultaneously.
To deal with this problem, Campolo et al. designed four slices for autonomous driving, tele-operated driving, remote diagnostics, and vehicular infotainment [32]. Similarly, the authors of [34] proposed a general network slicing architecture for four typical use cases, namely localization and navigation, transportation safety, autonomous driving, and infotainment services. A common problem of the aforementioned studies is that the validity of the proposed schemes was not verified. The complexity of slice management increases with the number of slices. Dividing V2X services into three slices is a more reasonable solution, similar to the way traditional mobile services are sliced. In [31], Ge et al. proposed three types of service slices, used to transmit state-report, event-driven, and entertainment-application messages, respectively. In [39], Cui et al. divided the common network infrastructure into three slices to provide short message, call, and internet services for vehicles. Different from existing studies, the slices proposed in this paper fully consider the exclusive characteristics (i.e., transmission features and KPIs) of V2X services. They can cover all V2X use cases defined in [2], [3] without excessively segmenting resources.
B. Resource Allocation for Network Slicing
In addition to slicing services at an appropriate granularity, it is important to effectively allocate resources among slices. In [42], the authors developed a fuzzy-logic-based resource allocation algorithm to simultaneously satisfy the diversified requirements of V2X services. Although the scheme achieved higher resource utilization, its computational complexity is high because the InP directly allocates its resources to users. Most existing studies therefore adopt hierarchical resource allocation methods to reduce the burden on the InP. In [6], Han et al. proposed a two-timescale resource allocation scheme comprising inter-slice resource pre-allocation over large time periods and intra-slice resource scheduling over small time slots. The scheme achieves a near-optimal tradeoff among the performance of slices. In [20], Mei et al. designed a slicing strategy with two-layer control granularity, where the upper-level and lower-level controllers guarantee the quality of services and improve the spectrum efficiency of each slice, respectively. However, these efforts concentrate only on spectrum resource allocation and ignore the significance of computing resources, which are necessary to satisfy the KPIs of V2X services.
To address the multi-dimensional resource allocation issue, Mohammed et al. proposed a multi-dimensional resource slicing scheme [49], in which both the InP and SPs adopt the dominant resource fairness (DRF) approach to allocate multi-dimensional resources. In [23], the authors introduced a generalized Kelly mechanism (GKM) to address the multi-dimensional resource allocation issue between the InP and SPs, while each SP utilizes Karush–Kuhn–Tucker (KKT) conditions to derive the optimal scheduling strategy for communication resources. Although these studies make progress in improving the aggregate revenue of SPs, they cannot be directly applied to the IoV with multiple V2X services. On the one hand, when the InP treats all slices equally, it is hard to guarantee road safety in real-world situations. On the other hand, the differentiated characteristics of multi-tier computing resources have not been extensively explored, which further reduces system revenue. In our work, we adopt intelligent approaches to economically allocate multi-tier resources to multiple V2X slices while guaranteeing the delay and reliability requirements of mobile users.
C. DRL-Enabled Network Slicing
In the dynamic IoV, conventional mathematical models face high computational complexity and lack adaptability and robustness. Advanced DRL algorithms have been widely applied in network slicing [38]. From the perspective of the effect of actions on states, DRL can be divided into DRL based on multi-armed bandits (MAB) and DRL based on the Markov Decision Process (MDP) [26], [27]. Because the policies of resource allocation and task offloading generate different effects on the physical layer states, the intra-slice resource scheduling problem is usually formulated as an MDP. In [22], Chen et al. leveraged the DDQN algorithm to learn the optimal policies of packet scheduling and computation offloading. In [20], the authors further verified the effectiveness of DDQN in jointly optimizing resource allocation and computation offloading. In this paper, each SP is equipped with an exclusive agent that implements resource scheduling to guarantee isolation among slices.
As for the inter-slice resource configuration problem, it is impossible to find the optimal configuration policy before the end of a period, because the future status of the IoV is unknowable. In addition, it is impractical to traverse all configuration policies in each period. Exploiting the fact that the resource configuration policy does not change the status of service requests, many studies adopt MAB-based DRL algorithms to learn the unknown reward function. In [44], Zanzi et al. developed a radio slicing orchestration scheme based on MAB, with which SPs can make adaptive slicing decisions without prior knowledge of channel quality statistics. In [45], Zhao et al. formulated resource configuration as a contextual MAB problem and adopted the upper-confidence-bound (UCB) algorithm to solve it. However, these studies assumed a linear relationship between the expected reward and the context vector. Furthermore, the effectiveness of MAB-based DRL algorithms is greatly reduced when the number of candidate actions is large, and the curse of dimensionality is inevitable when multi-dimensional resources are considered jointly. Therefore, in this paper, we design a pre-allocation mechanism based on service priorities and adopt the NeuralUCB algorithm to obtain an optimal configuration policy for multi-dimensional resources.
System Model
This section describes the system model in detail. Specifically, we first present the network model (Section III-A) and the multi-tier resources model (Section III-B) of the IoV. Then, we elaborate on the processes of transmission (Section III-C) and offloading (Section III-D) for vehicular tasks. Finally, the key performance indicators of V2X services are presented (Section III-E). For convenience, Table II summarizes the major notations of this paper.
A. Network Model
The physical infrastructure of the IoV mainly includes a macro base station (MBS) connected to remote cloud servers, roadside units (RSUs) equipped with MEC servers, and vehicular user equipment (VUEs) with diverse numbers of vehicular computing units. Note that an RSU is essentially a static logical entity. It supports V2X applications by using the functionality provided by a 3GPP network or user equipment (UE) [46]. Thus, we assume all UEs, which consist of VUEs and RSUs, are within the coverage of the MBS, and VUEs can only access the internet via RSUs. Let
As mentioned in Table I, there are great differences among V2X services. Therefore, we propose three kinds of network slices to reflect the differences without excessively segmenting resources. Specifically, the three kinds of network slices embrace the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety related services. The specific characteristics and requirements of each slice are described as follows.
\begin{align*}
{r_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}=} \left\lbrace \begin{array}{ll}B{{\log }_{2}}\left(1+\frac{p_{{{N}_{m}}}^{{{N}_{m^{\prime }}}}\cdot h_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\sigma }^{2}}}\right), & \text{for long packet transmission}; \qquad (1\mathrm {a})\\
B{{\log }_{2}}\left(1+\frac{p_{{{N}_{m}}}^{{{N}_{m^{\prime }}}}\cdot h_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\sigma }^{2}}}\right)-\sqrt{\frac{V_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\tau }_{{{N}_{m}}}}}}\cdot \frac{{{G}^{-1}}(\varpi) }{\ln 2}, & \text{for short packet transmission}, \qquad (1\mathrm {b}) \end{array}\right.
\end{align*}
The slice for basic road safety services is mainly aimed at services that require high timeliness and reliability but low data rates, such as collision warnings and emergency stops. V2V is the prevalent radio access technology for satisfying their latency and reliability requirements. Note that the packet size of basic safety services is usually small; thus, instead of offloading tasks to MEC servers, vehicular computing resources are sufficient to process them.
The slice for enhanced road safety services aims to enable high-level autonomous driving. Compared to basic road safety services, this slice requires higher reliability, data rates, and beacon frequencies, as well as lower latency. Similarly, to effectively transmit messages among vehicles, low-latency V2V communication is the main communication mode. Due to the limited processing capability of VUEs and the long transmission latency to remote cloud servers, a proportion of the data processing should be performed on MEC servers.
The slice for non-safety related services has low sensitivity to delay and reliability but usually has high data rate requirements. As a result, it is expected to use multiple access technologies to seek higher throughput and to process tasks on MEC servers or cloud servers.
In this paper, an SP corresponds to a slice and provides a class of V2X services. Therefore, we will not distinguish the concepts of slice and SP in the following text. To facilitate analysis, let
B. Multi-Tier Resources Model
As described above, each SP needs both computing and communication resources to serve its users. The inherent attributes (i.e., KPIs and transmission features) of V2X services make their demands for multi-dimensional resources jagged. Thus, jagged resource slicing on the multi-tier computing architecture is adopted in this paper. Generally, the architecture uses three tiers, with users at tier one, the edge cloud at tier two, and the remote cloud at tier three. Before determining the most suitable communication method and computing location for any service, the hierarchical and distributed characteristics of multi-dimensional resources should be considered. At the terminal-device tier, vehicular computing resources usually have relatively small computing capabilities; the purpose of local execution is to reduce the communication delay and errors caused by transmission and protocols. Notably, a VUE can concurrently subscribe to multiple slices in our system model, which is consistent with actual cases. Therefore, let
As for the edge tier, MEC servers have powerful computing capabilities. However, the computing resources of each MEC server are limited, which means that only some VUEs can offload their computing tasks to MEC servers over V2I links. To guarantee isolation among slices, the shared edge computing resources
Illustration of the jagged allocation of virtualized multi-tier resources to refined network slices of V2X services, compared to flat slicing resources to three generic usage scenarios of 5G.
C. Signal Transmission Model
For conventional services, the data rate of large packets can be directly calculated through the Shannon formula. However, unlike conventional services, most V2X services use short packets, ranging from 32 to 200 bytes [20]. Owing to the negative effects of channel dispersion and finite coding length, the data rate of a short packet cannot be accurately obtained by the Shannon formula. In [47], a method based on finite-blocklength theory was proposed to approximately calculate the data rate of short packets. Therefore, the available data rate between VTx
\begin{equation*}
V_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}=1-{{\left(1+\frac{p_{{{N}_{m}}}^{{{N}_{m^{\prime }}}}\cdot h_{{{N}_{m}},j,t}^{{{N}_{m^{\prime }}}}}{{{\sigma }^{2}}} \right)}^{-2}}. \tag{2}
\end{equation*}
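The short-packet rate of Eq. (1b) with the dispersion of Eq. (2) can be computed numerically. The sketch below follows the document's formula term by term (in particular, the penalty term carries no bandwidth factor, exactly as written in Eq. (1b)), interpreting G^{-1} as the inverse Gaussian Q-function and tau as the blocklength; parameter names are illustrative.

```python
import math
from statistics import NormalDist

def channel_dispersion(snr):
    """Eq. (2): V = 1 - (1 + SNR)^-2, the penalty grows with dispersion."""
    return 1.0 - (1.0 + snr) ** -2

def short_packet_rate(bandwidth_hz, snr, blocklength, error_prob):
    """Approximate achievable rate under finite-blocklength theory,
    following the form of Eq. (1b): Shannon term minus a dispersion
    penalty that vanishes as the blocklength grows."""
    q_inv = NormalDist().inv_cdf(1.0 - error_prob)   # inverse Gaussian Q-function
    shannon = bandwidth_hz * math.log2(1.0 + snr)
    penalty = math.sqrt(channel_dispersion(snr) / blocklength) * q_inv / math.log(2)
    return shannon - penalty
```

For long packets (large blocklength), the penalty tends to zero and the rate reduces to the Shannon term of Eq. (1a).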
In addition, during the data transmission phase, each V2V link maintains an individual queue to buffer arriving packets. Packets are delivered according to a first-come-first-served policy [22]. As for link
\begin{equation*}
{{W}_{l,t+1}}=\min \lbrace {{W}_{l,t}}-\Delta t\cdot r_{l,t}^{v}/Z{}_{l}+{{A}_{l,t}},W_{l}^{\max }\rbrace, \tag{3}
\end{equation*}
\begin{equation*}
r_{l,t}^{v}=\sum \limits _{j\in \mathcal {J}}{{{\rho }_{l,j,t}}\cdot r_{{{N}_{l}},j,t}^{{{N}_{l^{\prime }}}}}. \tag{4}
\end{equation*}
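The per-link queue evolution of Eq. (3) can be sketched directly: the served packets (slot length times rate, divided by packet size) are drained, new arrivals are added, and the backlog is capped at the buffer size. The zero clamp is an added practical guard not written in Eq. (3); variable names are illustrative.

```python
def queue_update(backlog, arrivals, rate_bps, packet_bits, slot_s, max_backlog):
    """Queue evolution in packets, following Eq. (3):
    W_{t+1} = min(W_t - dt * r / Z + A_t, W_max).
    Clamping the drained backlog at zero is an added guard."""
    served = slot_s * rate_bps / packet_bits        # packets served this slot
    return min(max(backlog - served, 0.0) + arrivals, max_backlog)
```

The aggregate rate r in Eq. (4) is simply the sum of rates over the channels assigned to the link in that slot.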
D. Task Offloading Model
In this paper, we consider a hybrid computation offloading scenario [21]. The computing task of a vehicle can be executed locally, offloaded to the MEC server via V2I communication, or offloaded to remote cloud computing servers through relayed V2I and high-speed fronthaul links. As for the computing task of link
\begin{equation*}
{D_{l,b,t}^{cp}=}\left\lbrace \begin{array}{ll}\frac{{{Z}_{l}}\cdot {{\beta }_{l}}}{Y_{m,i}^{v}\cdot {{f}_{v}}}, & {{e}_{l,t}}=0; \qquad \qquad \qquad \qquad (5\mathrm {a})\\
\frac{{{Z}_{l}}\cdot {{\beta }_{l}}}{{{f}_{u}}}+\frac{{{Z}_{l}}}{r_{l,t}^{u}}, & {{e}_{l,t}}=1; \qquad \qquad \qquad \qquad (5\mathrm {b})\\
\frac{{{Z}_{l}}\cdot {{\beta }_{l}}}{{{f}_{c}}}+\frac{{{Z}_{l}}}{r_{l,t}^{u}}+{{t}_{c}}, & {{e}_{l,t}}=2,\qquad \qquad \qquad \qquad (5\mathrm {c}) \end{array}\right.
\end{equation*}
\begin{equation*}
r_{l,t}^{u}=\sum \limits _{j\in \mathcal {J}}{{{\rho }_{l,j,t}}\cdot r_{{{N}_{l}},j,t}^{{{N}_{0}}}}. \tag{6}
\end{equation*}
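The three branches of Eq. (5) can be evaluated side by side to pick an offloading target. In this sketch, `f_local` stands in for the aggregate local capability (the product of allocated units and per-unit frequency in Eq. (5a)); all names are illustrative.

```python
def processing_delay(mode, data_bits, cycles_per_bit,
                     f_local, f_edge, f_cloud, uplink_bps, cloud_rtt_s):
    """Task completion delay for the three offloading choices of Eq. (5):
    mode 0 = local execution, mode 1 = MEC server (plus upload time),
    mode 2 = remote cloud (plus upload time and fronthaul latency t_c)."""
    compute = data_bits * cycles_per_bit
    if mode == 0:
        return compute / f_local                                  # Eq. (5a)
    if mode == 1:
        return compute / f_edge + data_bits / uplink_bps          # Eq. (5b)
    return compute / f_cloud + data_bits / uplink_bps + cloud_rtt_s  # Eq. (5c)
```

With a slow local CPU, a fast edge server, and a non-trivial fronthaul latency, the edge branch typically wins, which matches the motivation for multi-tier computing.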
E. Key Performance Indicators
As defined in [3], end-to-end (E2E) communication refers to the process of transferring a given piece of information from a source to a destination at the application level. Generally, the E2E delay consists of queueing time, transmission time, network latency, and processing latency [48]. In this paper, we have assumed that all VUEs are within the coverage of the RSUs and can only acquire data from RSUs. Consequently, it is reasonable to ignore the network delay during data reception. Thus, we mainly consider the waiting, transmission, and processing delays. At slot
\begin{equation*}
D_{l,b,t}^{^{\text{E2E}}}=D_{l,b,t}^{cw}+D_{l,b,t}^{ct}+D_{l,b,t}^{cp}, \tag{7}
\end{equation*}
In addition to delay, reliability is another key performance indicator [10]. From the perspective of service provisioning, the probability of receiving or dropping data packets is usually used as a measure of reliability [42]. When the delay of a packet exceeds the maximum tolerable delay, the packet is considered dropped; otherwise, it is considered received. In this paper, we choose the packet reception ratio (PRR) as the index to evaluate reliability, which can be expressed as
\begin{equation*}
{{\varphi }_{l,t}}=\Pr \left\lbrace D_{l,b,t}^{^{\text{E2E}}}< D_{l}^{^{\max }} \right\rbrace, \tag{8}
\end{equation*}
Problem Formulation
In this paper, the resource allocation problem is decomposed into two stages. First, at the beginning of each large-time period
Schematic of the two-timescale resource allocation. First, at the beginning of each large-timescale period, the InP allocates shared physical resources to SPs (inter-slice resource configuration). Then, each SP elastically assigns its exclusive resources to its users at each small slot (intra-slice resource scheduling).
A. Large Timescale Problem Formulation
At the beginning of each period
\begin{equation*}
V({{C}_{k}})=\sum \limits _{i\in \mathcal {I}}{\left[ v({{c}_{i,k}})-{{q}^{cm}}{{J}_{i,k}}-{{q}^{cu}}Y_{i,k}^{u}-{{q}^{cc}}Y_{i,k}^{c} \right]} \tag{9}
\end{equation*}
\begin{align*}
& \text{P1 : }\underset{{{C}_{k}}}{\mathop {{\max}}}\; \left[ V({{C}_{k}}) \right] \\
& \text{subject to : } \\
& \text{C1 : }\sum \limits _{i\in \mathcal {I}}{Y_{m,i,k}^{v}\leq Y_{m}^{v}; \ \forall m\in [1,M],k\in \mathcal {K}}, \\
& \text{C2 : }\sum \limits _{i\in \mathcal {I}}{{{J}_{i,k}}}\leq J; \ \forall k\in \mathcal {K}, \\
& \text{C3 : }\sum \limits _{i\in \mathcal {I}}{Y_{i,k}^{u}}\leq {{Y}^{u}}; \ \forall k\in \mathcal {K}, \\
& \text{C4 : }Y_{m,i,k}^{v}\in [0,Y_{m}^{v}]; \ \forall m\in [1,M],i\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C5 : }{{J}_{i,k}}\in [0,J]; \ \forall i\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C6 : }Y_{i,k}^{u}\in [0,{{Y}^{u}}]; \ \forall i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C7 : }\mathcal {Y}_{_{m,i,k}}^{v}\cap \mathcal {Y}_{_{m,i^{\prime },k}}^{v}=\varnothing ; \ \forall m\in [1,M],i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C8 : }{{\mathcal {J}}_{i,k}}\cap {{\mathcal {J}}_{i^{\prime },k}}=\varnothing ; \ \forall i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \\
& \text{C9 : }\mathcal {Y}_{i,k}^{u}\cap \mathcal {Y}_{_{i^{\prime },k}}^{u}=\varnothing ; \ \forall i\ne i^{\prime }\in \mathcal {I},k\in \mathcal {K}, \tag{10}
\end{align*}
B. Small Timescale Problem Formulation
After the resource configuration for all slices is determined, each slice utilizes the acquired multi-dimensional resources to provide services for its subscribers so as to maximize the fee it charges. As for slice
\begin{equation*}
{{U}_{l,t}}={{\alpha }_{d}}\cdot U_{_{l}}^{(1)}(D_{l,t}^{^{\text{E2E}}})+{{\alpha }_{r}}\cdot U_{_{l}}^{(2)}({{\varphi }_{l,t}}), \tag{11}
\end{equation*}
\begin{align*}
U_{_{l}}^{(1)}(D_{l,t}^{^{\text{E2E}}})&=\left\lbrace \begin{array}{ll}\exp (-D_{l,t}^{^{\text{E2E}}}), & D_{l,t}^{^{\text{E2E}}}\leq D_{l}^{^{\max }}; \qquad (12\mathrm {a})\\
\exp (-D_{l,t}^{^{\text{E2E}}})-{{\psi }_{i}}, & D_{l,t}^{^{\text{E2E}}}>D_{l}^{^{\max }}, \qquad (12\mathrm {b}) \end{array}\right.\\
U_{_{l}}^{(2)}({{\varphi }_{l,t}})&=\left\lbrace \begin{array}{ll}\exp (-(1-{{\varphi }_{l,t}})), & {{\varphi }_{l,t}}\geq \varphi _{l}^{^{\min }}; \qquad (13\mathrm {a})\\
\exp (-(1-{{\varphi }_{l,t}}))-{{\psi }_{i}}, & {{\varphi }_{l,t}}< \varphi _{l}^{^{\min }}, \qquad (13\mathrm {b}) \end{array}\right.
\end{align*}
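The satisfaction model of Eqs. (11)-(13) combines an exponential delay utility and an exponential reliability utility, each discounted by a penalty psi when its KPI is violated. A minimal sketch, with `w_delay`/`w_rel` standing in for the weights alpha_d/alpha_r:

```python
import math

def link_satisfaction(e2e_delay, prr, max_delay, min_prr, penalty,
                      w_delay=0.5, w_rel=0.5):
    """Weighted link satisfaction of Eq. (11). Each utility follows
    Eqs. (12)-(13): exponential in the KPI, minus a penalty psi when
    the delay bound or minimum PRR is violated."""
    u_delay = math.exp(-e2e_delay) - (penalty if e2e_delay > max_delay else 0.0)
    u_rel = math.exp(-(1.0 - prr)) - (penalty if prr < min_prr else 0.0)
    return w_delay * u_delay + w_rel * u_rel
```

A link meeting both KPIs scores near 1, while a violating link can score negative, which steers the scheduler away from infeasible allocations.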
\begin{align*}
& \text{P2 : }\underset{{{e}},{{\rho }}}{\mathop {\max }}\;[v({{c}_{i,k}})]=\underset{{{e}},{{\rho }}}{\mathop {\max }}\;\left[ q_{i}^{se}\sum \limits _{t=1}^{T}{\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{U}_{l,t}}}} \right] \\
& \text{subject to : } \\
& \text{C10 : }\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\rho }_{l,j,t}}\leq 1;\forall t\in [1,T],j\in {{\mathcal {J}}_{i,k}}}, \\
& \text{C11 : }\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{\sum \limits _{j\in {{\mathcal {J}}_{i,k}}}{{{\rho }_{l,j,t}}}}\leq {{J}_{i,k}};\forall t\in [1,T], \\
& \text{C12 : }{{e}_{l,t}}\in \lbrace 0,1,2\rbrace;\forall t\in [1,T],l\in {{\mathcal {L}}_{i,k}}, \qquad \qquad \ \ \\
& \text{C13 : }\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\Lambda }_{({{e}_{l,t}}=1)}}}\leq Y_{_{i,k}}^{u};\forall t\in [1,T], \tag{14}
\end{align*}
Dual Timescale Intelligent Resource Management Scheme
The optimization problems described in Section IV are difficult to solve because they are NP-hard. In addition, the IoV requires an intelligent resource management scheme that adapts to dynamic network conditions. Hence, in this section, a novel 2Ts-IRMS is proposed to address the resource allocation problem in the IoV. Specifically, we adopt the proposed JAMR algorithm to address inter-slice resource configuration at each large-timescale period (Section V-A), while the JORA-DDQN algorithm is used to solve intra-slice resource scheduling at each small-timescale slot (Section V-B).
A. Inter-Slice Resource Configuration
At large timescales, a central question is how the InP should allocate multi-tier resources to SPs to maximize system revenue. Obviously, it is impossible to find the optimal resource configuration of P1 in (10) before the end of period
Algorithm 1: Inter-Slice Vehicular Computing Resources Configuration Based on Service Priorities.
for
Input:
for
Input:
Compute
Compute
if
Let
Let
else
Let
Let
end for
end for
Return
1) Vehicular Computing Resources
Different from other application scenarios, the IoV contains a large number of safety-related services, whose resource requirements must be guaranteed first to avoid traffic accidents. As described in Section III, when vehicular computing resources are sufficient, local execution is the first choice for processing the computing tasks of safety-related services, as it avoids unnecessary transmission delays and errors. Thus, based on service priorities, we first explore vehicular computing resource allocation among slices. Obviously, safety-related services have a higher priority than non-safety-related services in a practical IoV. Among safety-related services, we give basic road safety services a higher priority than enhanced road safety services, because basic driving functions should be guaranteed first when computing resources are insufficient. To facilitate analysis, the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety services are denoted as
It is noteworthy that the real-time policies of task offloading and resource scheduling at small timescales have little dependence on the vehicular computing resource configuration. Furthermore, the computing resources of each vehicle can only be used by the vehicle itself. Therefore, service requests can be considered the only influencing factor for vehicular computing resource configuration. As for VUE
\begin{equation*}
Y_{m,i,k}^{v,req}=\left\lceil \frac{\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{W_{l}^{\max }\cdot {{Z}_{l}}\cdot {{\beta }_{l}}\cdot {{\Lambda }_{(l^{\prime }=m)}}}}{{{f}_{v}}} \right\rceil, \tag{15}
\end{equation*}
\begin{align*}
&Y_{m,i,k}^{v,\text{rem}} \\
& =\left\lbrace \begin{array}{llll}Y_{m}^{v}, & i=i{}_{1}; \\
0, & i\ne i{}_{1};Y_{m,i^{\prime },k}^{v,\text{rem}}< Y_{m,i^{\prime },k}^{v,\text{req}}; \\
Y_{m,i^{\prime },k}^{v,\text{rem}}-Y_{m,i^{\prime },k}^{v,\text{req}}, &i\ne i{}_{1};Y_{m,i^{\prime },k}^{v,\text{rem}}\geq Y_{m,i^{\prime },k}^{v,\text{req}}, \end{array}\right.\\
\tag{16a}\\
\tag{16b}\\
\tag{16c}
\end{align*}
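The priority-ordered allocation of Algorithm 1 and Eq. (16) amounts to serving slices sequentially: the highest-priority slice draws its requirement (Eq. (15)) from the VUE's capacity, and each lower-priority slice receives whatever remains. A minimal sketch, with demands listed in priority order (basic safety, enhanced safety, non-safety); function and variable names are illustrative.

```python
def allocate_vehicular_compute(capacity, demands):
    """Sequential priority allocation in the spirit of Algorithm 1 and
    Eq. (16): each slice, in priority order, receives min(request,
    remaining capacity); later slices get only what is left."""
    remaining = capacity
    grants = []
    for req in demands:
        grant = min(req, remaining)
        grants.append(grant)
        remaining -= grant
    return grants
```

Under scarcity, this guarantees basic road safety services are satisfied before any resources flow to enhanced safety or non-safety slices.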
2) Edge Computing Resources & Radio Communication Resources
To ensure that each computing task has a processing location and to avoid wasting resources, the multi-tier computing resource configuration of slice
\begin{equation*}
{{L}_{i,k}}=Y_{i,k}^{v}+Y_{i,k}^{u}+Y_{i,k}^{c}=\sum \limits _{m=1}^{M}{E_{m,i,k}^{v}}+Y_{i,k}^{u}+Y_{i,k}^{c}, \tag{17}
\end{equation*}
As described above, we formulate problem P1 in (10) as a contextual MAB problem. Specifically, at the beginning of each period, the InP, regarded as the agent, first observes its context in the form of a feature vector. The vector indicates the characteristics of the requested services and the allocation status of the vehicular computing resources of all slices. At period
Then, the agent chooses to pull an available arm
The key idea of NeuralUCB is to use a neural network
\begin{align*}
{{P}_{k,{{C}_{k}}}}= & f({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})+{{\mu }_{k-1}}\sqrt{g{{({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})}^{\text{T}}}H_{k-1}^{-1}g({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})/\delta }, \tag{18}\\
{{\mu }_{k}}= & \sqrt{1+{{\chi }_{4}}{{\delta }^{-1/6}}\sqrt{\log \delta }\delta {{^{\prime }}^{4}}{{k}^{7/6}}{{\chi }_{1}}^{-7/6}}\cdot \left(\vartheta \sqrt{\log \frac{\det {{H}_{k}}}{\det {{\chi }_{1}}\text{I}}+{{\chi }_{5}}{{\delta }^{-1/6}}\sqrt{\log \delta }\delta {{^{\prime }}^{4}}{{k}^{5/3}}{{\chi }_{1}}^{-1/6}-2\log {{\chi }_{2}}}+\sqrt{{{\chi }_{1}}}{{\chi }_{3}} \right) \\
& +({{\chi }_{1}}+{{\chi }_{6}}k\delta ^{\prime }) \left[{{(1-\eta \delta {{\chi }_{1}})}^{\eta ^{\prime }/2}}\sqrt{k/{{\chi }_{1}}}+{{\delta }^{-1/6}}\sqrt{\log \delta }\delta {{^{\prime }}^{7}}{{k}^{5/3}}{{\chi }_{1}}^{-5/3} \left(1+\sqrt{k/{{\chi }_{1}}}\right)\right], \tag{19}
\end{align*}
\begin{equation*}
{{H}_{k}}={{H}_{k-1}}+g({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})g{{({{O}_{k,{{C}_{k}}}};{{\omega }_{k-1}})}^{\text{T}}}/\delta . \tag{20}
\end{equation*}
\begin{align*}
{{L}^{\text{NU}}}(\omega) &=\sum \limits _{k=1}^{K}{{{(f({{O}_{k,{{C}_{k}}}};\omega) -{{V}_{k}}({{C}_{k}}))}^{2}}/2} \\
&+{\delta {{\chi }_{1}}||\omega -{{\omega }^{(0)}}||_{2}^{2}/2}, \tag{21}
\end{align*}
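To make the above concrete, the exploration bonus in (18) and the rank-one update in (20) can be sketched in PyTorch. `ScoreNet`, its layer sizes, and the explicit inverse $H^{-1}$ passed as an argument are illustrative choices rather than the paper's implementation; practical NeuralUCB variants often keep only the diagonal of $H$ for tractability.

```python
import torch

class ScoreNet(torch.nn.Module):
    """Small MLP f(O; ω) estimating the expected revenue of an arm
    (a candidate resource configuration). Sizes are illustrative."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

def ucb_scores(net, contexts, H_inv, mu, delta):
    """P_{k,c} = f(O; ω) + μ·sqrt(gᵀ H⁻¹ g / δ) for every arm, as in (18)."""
    scores = []
    for ctx in contexts:
        net.zero_grad()
        f = net(ctx)
        f.backward()                       # g(O; ω): gradient of f w.r.t. ω
        g = torch.cat([p.grad.flatten() for p in net.parameters()])
        bonus = mu * torch.sqrt(g @ H_inv @ g / delta)
        scores.append((f + bonus).item())
    return scores

def update_H(H, net, ctx, delta):
    """H_k = H_{k-1} + g gᵀ / δ, the rank-one update in (20)."""
    net.zero_grad()
    net(ctx).backward()
    g = torch.cat([p.grad.flatten() for p in net.parameters()])
    return H + torch.outer(g, g) / delta
```

The agent then pulls the arm with the largest score and refits the network on the observed revenues by minimizing (21).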
Edge computing resources and radio communication resources allocation based on NeuralUCB algorithm.
B. Intra-Slice Resource Scheduling
In a period, once the resource configuration is determined, the remaining problem is how to effectively allocate resources from an SP to UEs to maximize the satisfaction of all links. The optimization problem P2 in (14) is difficult to solve due to the time-varying nature of the physical layer. Besides, the decisions of task offloading and resource scheduling cause changes in link states (e.g., queue characteristics and channel quality), and service satisfaction also depends on link states. Therefore, we utilize a DRL method based on an MDP to solve the proposed intra-slice resource scheduling problem. First, we formulate our problem as an MDP to accurately describe the process of resource allocation and task offloading.
Algorithm 2: NeuralUCB for Edge Computing Resources and Radio Communication Resources Allocation.
Initialization: H_0 = χ_1·I and initial network parameters ω_0;
for each period k = 1, ..., K do
    Observe the contexts of all available arms;
    for each available arm do
        Compute its estimated revenue by (18);
    end for
    Let the chosen arm be the one with the highest estimated revenue;
    Play the chosen arm and observe the obtained revenue;
    Compute the confidence-bound scaling factor μ_k by (19);
    for each gradient descent step do
        Update the network parameters ω by minimizing the loss (21);
    end for
    Return the updated parameters ω_k;
    Compute H_k by (20);
end for
At each slot
For link
The goal of resource allocation at this stage is to maximize the satisfaction level of all links within limited resources. Therefore, we set rewards based on the constraint conditions and objective function. After taking action
\begin{align*}
{{r}_{t}}&={{\ell }_{1}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{U}_{l,t}}} \right) \\
& +{{\ell }_{2}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\rho }_{l,j,t}}-1} \right)\cdot {{\Lambda }_{\left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{{{\rho }_{l,j,t}}\leq 1} \right)}} \\
& +{{\ell }_{3}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{\sum \limits _{j\in {{\mathcal {J}}_{i,k}}}{{{\rho }_{l,j,t}}}}-{{J}_{i,k}} \right)\cdot {{\Lambda }_{\left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{\sum \limits _{j\in {{\mathcal {J}}_{i,k}}}{{{\rho }_{l,j,t}}}}\leq {{J}_{i,k}} \right)}} \\
& +{{\ell }_{4}}\cdot \left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}({{{e}_{l,t}}=1})-Y_{i,k}^{u}\right)\cdot {{\Lambda }_{\left(\sum \limits _{l\in {{\mathcal {L}}_{i,k}}}{({{e}_{l,t}}=1)}\leq Y_{i,k}^{u} \right)}} \tag{22}
\end{align*}
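Read literally, the reward (22) adds the weighted satisfaction of all links and three terms gated by the indicator Λ. A minimal transcription follows; the flattened inputs (per-RB allocation sums, total RBs used, number of edge-offloaded tasks) are hypothetical simplifications of the underlying ρ and e variables.

```python
def indicator(cond):
    """Λ_(condition): 1 if the condition holds, else 0."""
    return 1.0 if cond else 0.0

def step_reward(U, rho_sum_per_rb, total_rbs_used, num_offloaded,
                J_cap, Yu_cap, l):
    """Weighted reward of (22). U: satisfaction U_{l,t} of each link;
    rho_sum_per_rb: sum over links of ρ_{l,j,t} for each RB j;
    J_cap = J_{i,k}; Yu_cap = Y_{i,k}^u; l = (ℓ1, ℓ2, ℓ3, ℓ4)."""
    r = l[0] * sum(U)                                    # satisfaction term
    for share in rho_sum_per_rb:                         # per-RB sharing term
        r += l[1] * (share - 1) * indicator(share <= 1)
    r += l[2] * (total_rbs_used - J_cap) * indicator(total_rbs_used <= J_cap)
    r += l[3] * (num_offloaded - Yu_cap) * indicator(num_offloaded <= Yu_cap)
    return r
```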
In the IoV, each slice is regarded as an agent and owns a private neural network. Each agent aims to find the best policy
\begin{equation*}
{{R}_{t}}=\sum \limits _{i=0}^{T-1}{{{\gamma }^{i}}{{r}_{t+i}}}, \tag{23}
\end{equation*}
\begin{equation*}
{{Q}^{\pi }}({s,a})=\mathbb {E}[{{R}_{t}}|s,\pi ]. \tag{24}
\end{equation*}
\begin{equation*}
{{\pi }^{*}}(s)=\underset{a\in \mathcal {A}}{\mathop {\arg \max }}\;Q(s,a),\forall s\in \mathcal {S}. \tag{25}
\end{equation*}
\begin{align*}
{{Q}^{\pi }}({{s}_{t}},{{a}_{t}})& \leftarrow {{Q}^{\pi }}({{s}_{t}},{{a}_{t}}) \\
&+\alpha \left({{r}_{t}}+\gamma \underset{{{a}_{t+1}}\in \mathcal {A}}{\mathop {\max }}\;{{Q}^{\pi }}({{s}_{t+1}},{{a}_{t+1}}) -{{Q}^{\pi }}({{s}_{t}},{{a}_{t}})\right). \tag{26}
\end{align*}
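The tabular update (26) is a one-liner in code; the sketch below assumes a dictionary-backed Q-table that defaults to zero for unseen state-action pairs.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step, implementing the update rule (26)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # max_{a'} Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Q-table with default value 0 for state-action pairs never visited
Q = defaultdict(float)
```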
However, in high-dimensional state spaces, the classic Q-learning method cannot efficiently compute the Q-function for all states. To remedy this problem, DDQN improves Q-learning by combining it with deep neural networks [41]. Specifically, raw data is fed into the neural network as the state, and the Q-function is approximated by the network. It is worth noting that DDQN maintains two separate networks: the main network and the target network. The main network approximates the Q-function, while the target network provides the temporal difference (TD) target for updating the main network. During the training phase, the main network parameters
\begin{equation*}
{{L}^{\text{DDQN}}}(\theta) =\frac{1}{2}\cdot \mathbb {E}\left[{{\left(y_{t}^{\text{DDQN}}-{{Q}^{\pi }}\left({{s}_{t}},{{a}_{t}};{{\theta }_{t}}\right)\right)}^{2}}\right], \tag{27}
\end{equation*}
\begin{equation*}
y_{t}^{\text{DDQN}}={{r}_{t}}+\gamma {{Q}^{\pi }}\left({{s}_{t+1}},\underset{{{a}_{t+1}}\in \mathcal {A}}{\mathop {\arg \max }}\;Q({{s}_{t+1}},{{a}_{t+1}};{{\theta }_{t}});\theta _{t}^{-} \right). \tag{28}
\end{equation*}
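The decoupled selection/evaluation in (27)–(28) can be sketched in PyTorch as follows. The batch layout is an assumption, and `main_net`/`target_net` stand for any networks mapping states to per-action Q-values.

```python
import torch

def ddqn_loss(main_net, target_net, batch, gamma):
    """DDQN TD target (28) and loss (27): the main network selects the
    next action, while the target network evaluates it."""
    s, a, r, s_next = batch
    q = main_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s_t, a_t; θ)
    with torch.no_grad():
        a_star = main_net(s_next).argmax(dim=1, keepdim=True)     # argmax by main net
        q_next = target_net(s_next).gather(1, a_star).squeeze(1)  # evaluated by target net
        y = r + gamma * q_next                                    # y_t^DDQN as in (28)
    return 0.5 * torch.mean((y - q) ** 2)                         # loss (27)
```

Decoupling action selection from evaluation in this way mitigates the overestimation bias of vanilla DQN.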
Performance Evaluation
JORA-DDQN for Intra-Slice Resource Scheduling.
Initialization: main network weights θ, target network weights θ^-, and the experience replay buffer.
for each episode do
    Receive the initial observation s_0;
    for each slot t do
        Take action a_t;
        Get reward r_t by (22) and observe the next state s_{t+1};
        Store the experience (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer;
        Get a batch of experiences from the replay memory;
        Calculate the target Q-value y_t^DDQN of the target network by (28);
        Update the main network by minimizing the loss (27) with a gradient descent step on θ;
        Every G steps, update the target network: θ^- ← θ;
    end for
end for
A. Simulation Environment
In our simulation, we use PyTorch 1.10.0 on Ubuntu 18.04.6 LTS to implement the 2Ts-IRMS algorithm and compare it with multiple benchmark algorithms. For experimental purposes, a cellular V2X network environment is established based on the SUMO platform, consisting of a real road network, an MBS, and several VUEs and RSUs. Specifically, to match reality, we first import the road network around the Beijing University of Posts and Telecommunications from OpenStreetMap into SUMO [40]. Then, the whole road network is divided into 9 blocks, consistent with the road partitioning strategy of the Manhattan case [50]. An RSU is deployed at the center of each block and can communicate with vehicles within its coverage, as depicted in Fig. 7. To reflect traffic in urban regions as faithfully as possible, vehicles enter the generated road network with randomly chosen departure lanes, positions, and speeds, and move according to the Krauss car-following model and the LC2013 lane-changing model [37].
Real road conditions simulation of Beijing University of Posts and Telecommunications based on SUMO.
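For illustration, the 3×3 block partition and the center-of-block RSU placement described above can be sketched as follows. The rectangular bounding box and coordinate convention are simplifying assumptions; the actual road network imported from OpenStreetMap is irregular.

```python
def rsu_of(x, y, width, height, rows=3, cols=3):
    """Map a vehicle position (x, y) inside a width x height bounding box
    to one of rows*cols rectangular blocks, i.e., to the index of the RSU
    deployed in that block."""
    col = min(int(x / (width / cols)), cols - 1)
    row = min(int(y / (height / rows)), rows - 1)
    return row * cols + col

def rsu_center(block, width, height, rows=3, cols=3):
    """Center coordinates of a block, where its RSU is placed."""
    row, col = divmod(block, cols)
    return ((col + 0.5) * width / cols, (row + 0.5) * height / rows)
```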
For the communication resources in the IoV, we assume that there are 50 RBs with 180 kHz bandwidth to be allocated. For the computing resources, we let the computation capacity (i.e., CPU frequency) of a single CPU core for the VUEs be
In addition, to better evaluate how different V2X services affect network performance, various combinations of services are considered in the simulations. By selecting these combinations, we test whether the proposed 2Ts-IRMS can satisfy the requirements of multiple services, especially safety-related services. For simplicity, slices 1, 2, and 3 represent the slice for basic road safety services, the slice for enhanced road safety services, and the slice for non-safety-related services, respectively. The simulation parameters and neural network parameters are summarized in Tables III and IV, respectively. Afterward, we compare the 2Ts-IRMS algorithm with several benchmark algorithms, which are described as follows:
Hierarchical resource allocation schemes: The two-timescale bidding resource management scheme (2Ts-BRMS) adopts the generalized Kelly mechanism (GKM) to address the inter-slice multi-dimensional resource configuration problem and allocates resources to users according to channel quality (CQ) [23]. In the two-timescale fair resource management scheme (2Ts-FRMS), both the InP and SPs adopt the dominant resource fairness (DRF) approach to allocate multi-dimensional resources [49].
Inter-slice resource configuration schemes: The proportional allocation scheme (PA) proportionally allocates resources to slices based on the number of subscribers and average resource requirements [51]. As for the context-aware configuration scheme (CA), it adjusts inter-slice resource configuration based on the localized service requests and traditional UCB algorithm [45].
Intra-slice resource scheduling schemes: As for communication resource allocation, the queue-aware resource allocation strategy (QA) calculates the queue length of each link. The longer the queue length, the more communication resources are allocated. In the fair resource allocation strategy (FA), the communication resources are equally shared by all links. As for computing resource allocation, the local execution scheme (LE), the edge execution scheme (EE), and the cloud execution scheme (CE) make all tasks to be executed on user terminals, MEC servers, and remote cloud servers, respectively [21].
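As a concrete example of the above baselines, the weighting used by the PA scheme can be sketched as follows; the largest-remainder rounding is our assumption, added only so that the integer shares sum exactly to the total.

```python
def proportional_allocation(total, subscribers, avg_demand):
    """PA baseline: slice i receives a share of `total` proportional to
    subscribers_i * avg_demand_i, rounded with the largest-remainder
    method so the integer shares sum exactly to `total`."""
    weights = [n * d for n, d in zip(subscribers, avg_demand)]
    w_sum = sum(weights)
    raw = [total * w / w_sum for w in weights]
    alloc = [int(x) for x in raw]
    # hand out leftover units to the largest fractional remainders
    leftovers = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i],
                       reverse=True)
    for i in leftovers[: total - sum(alloc)]:
        alloc[i] += 1
    return alloc
```

For instance, 50 RBs split across slices with 8, 8, and 4 subscribers demanding 1, 2, and 1 average units yield shares of 14, 29, and 7 RBs.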
B. Simulation Results
Fig. 8 compares the achieved values of the valuation function of three hierarchical resource allocation schemes under various combinations of services. With the number of VUEs in the cellular network fixed, the proportion of users in slices 1, 2, and 3 iterates through all possible combinations. Compared with the other algorithms, 2Ts-IRMS achieves the highest valuation while exhibiting more stable performance, because it can dynamically adjust resource allocation according to the varying numbers of users and service requests. There are two main reasons for the low performance of 2Ts-BRMS. On the one hand, in the phase of inter-slice resource configuration, 2Ts-BRMS treats all slices equally, so safety-related services may fail to be satisfied, which has a greater negative impact. On the other hand, when SPs allocate resources to users, considering only communication resources degrades the quality of services, especially for enhanced road safety services. Besides, the value of the valuation function decreases as the number of users increases, because resources are limited and cannot satisfy too many users.
Comparison of the achieved valuation of hierarchical resource allocation schemes under various combinations of services.
To further analyze the impact of the JAMR approach on inter-slice resource configuration, we evaluate the performance of multiple inter-slice resource configuration schemes by adjusting the unit price of a certain service, as depicted in Fig. 9. When the unit prices of the other slices remain unchanged, it can be observed that the system revenue increases with the unit price of the current service. Notably, JAMR maintains the highest valuation at any price setting, which further validates its self-adaptive capability. Across the three proposed slices, JAMR provides average gains of 24%, 40%, and 76% over DRF, GKM, and CA, respectively. The CA scheme achieves lower revenue because its limited fitting ability is seriously hampered by the nonlinearity of the problem. Furthermore, the fluctuation of the system revenue in the slice for non-safety-related services is noticeably smaller than in the other slices, because the utility of non-safety-related services has a smaller impact on the system performance. Similarly, Fig. 10 depicts the system revenue of multiple inter-slice resource configuration schemes under different punishing values of services. Although increasing the penalties of services decreases the system revenue, JAMR still maintains the highest revenue no matter how the penalties change.
Comparison of system revenue of multiple inter-slice resource configuration schemes under different unit prices of services. (a) Revenue under different unit prices of basic road safety services; (b) Revenue under different unit prices of enhanced road safety services; (c) Revenue under different unit prices of non-safety related services.
Comparison of system revenue of multiple inter-slice resource configuration schemes under different punishing values of services. (a) Revenue under different punishing values of basic road safety services; (b) Revenue under different punishing values of enhanced road safety services; (c) Revenue under different punishing values of non-safety related services.
Fig. 11 shows the number of communication resources (i.e., RBs) and edge computing resources (i.e., CPU cores) allocated to different slices under different inter-slice resource configuration schemes. When the number of vehicles is fixed at 20 and the proportion of users in slices 1, 2, and 3 is 2:2:1, JAMR assigns 10 RBs to slice 1, 28 RBs to slice 2, and 12 RBs to slice 3. At the same time, 25% of the edge computing resources are allocated to slice 3 and all remaining edge computing resources are allocated to slice 2. Notably, although the PA scheme allocates sufficient resources to the slice for enhanced road safety services, the performance of the other slices is seriously compromised.
Resource configuration among slices under different inter-slice resource configuration schemes. (a) Allocated proportion of radio blocks; (b) Allocated proportion of edge CPU cores.
After determining the resource configuration policy for all slices, each SP allocates its obtained resources to its subscribers to maximize the long-term satisfaction of all links. To guarantee isolation among slices, each SP is equipped with an exclusive agent that implements resource scheduling among users based on the proposed JORA-DDQN scheme. To illustrate the convergence performance of the JORA-DDQN scheme in different slices, we plot the variation trend of the SLA violation probability over training episodes for each slice in Fig. 12. In this paper, the SLA mainly refers to delay and reliability. At the beginning of training, the SLA violation probability is high. As the number of training episodes increases, the SLA violation probability gradually decreases. After 1500 episodes, the SLA violation probability levels off, which means that all of the slices have converged. Moreover, the slice for enhanced road safety services has the lowest SLA violation probability, which is consistent with the KPI requirements in Table I.
Fig. 13 depicts the performance of links in the slice for basic road safety services during an episode. It consists of the cumulative distribution function (CDF) of packet delay, the CDF of packet reception ratio, and the cumulative satisfaction of links. In view of the characteristics of basic road safety services (i.e., small packet size and high timeliness and reliability requirements), allocating radio resources is more important than allocating computing resources, because most tasks can be processed at terminal devices without occupying the computing resources of the MEC server or remote cloud servers. Thus, DRF, CA, QA, and FA are selected as benchmark schemes against the JORA-DDQN approach. Meanwhile, to ensure fairness, the task offloading policy of the benchmark schemes is consistent with JORA-DDQN. Owing to its flexible resource management paradigm, the proposed JORA-DDQN scheme significantly outperforms the benchmark schemes in terms of delay, reliability, and cumulative satisfaction. Specifically, the average packet delay of links is 93.033 ms, and the algorithm keeps the packet reception ratio of each link at or above 90%. Besides, JORA-DDQN has the highest service satisfaction and provides a gain of 50% with respect to DRF.
Performance indicators of links in the slice for basic road safety services under different intra-slice resource scheduling schemes. (a) CDF of packet delay of links; (b) CDF of packet reception ratio of links; (c) Cumulative satisfaction of links.
As for the slice for enhanced road safety services, the packet size is much larger than that of basic road safety services, and more CPU cycles are needed to process the data. The computing resources of the terminal devices are insufficient to support the simultaneous processing of such data volumes, so access to the MEC server is necessary. Thus, DRF, CA, LE, and EE are selected as benchmark schemes against the JORA-DDQN approach. Similarly, to ensure fairness, the task offloading policy of CA and the RB scheduling policy of LE and EE are consistent with JORA-DDQN. Fig. 14 depicts the performance of links in the slice for enhanced road safety services during an episode. In the proposed scheme, the average packet delay of links is 9.608 ms, and the algorithm keeps the packet reception ratio of each link at or above 99%. Meanwhile, JORA-DDQN maintains the highest service satisfaction and provides a gain of 52% with respect to DRF. Notably, the LE scheme performs worst and fails to meet the KPI requirements of most links.
Performance indicators of links in the slice for enhanced road safety services under different intra-slice resource scheduling schemes. (a) CDF of packet delay of links; (b) CDF of packet reception ratio of links; (c) Cumulative satisfaction of links.
As described in Section VI-A, the slice for non-safety-related services is expected to offload computing tasks to the MEC server or remote cloud servers. Thus, DRF, CA, EE, and CE are selected as benchmark schemes against the JORA-DDQN approach. Considering that this slice has low sensitivity to reliability, we only draw the curves of delay and satisfaction in Fig. 15. It is observed that JORA-DDQN can effectively use the limited resources to reduce task execution delay as much as possible. The CE scheme performs poorly because it incurs additional communication delay.
Performance indicators of link in the slice for non-safety related services under different intra-slice resource scheduling schemes. (a) CDF of packet delay of link; (b) Cumulative satisfaction of links.
Conclusion
In this paper, we propose three types of network slices to accommodate diversified V2X services over a common physical infrastructure. Specifically, the slice for basic road safety services is used to deliver imminent warnings to nearby entities in time; the slice for enhanced road safety services aims to achieve a higher level of automatic driving; and the slice for non-safety related services focuses on improving the driving comfort and efficiency of users. Furthermore, to take full advantage of multi-tier resources and account for time-varying network conditions, a novel dual-timescale intelligent resource management scheme is proposed. First, at the beginning of each period, the InP adaptively tunes the multi-tier resource configuration among slices to improve system revenue. Then, constrained by the limited resources obtained from the InP, each SP makes real-time task offloading and resource scheduling decisions to maximize the long-term service satisfaction of all users. Finally, based on the effect of actions on states, we propose the JAMR and JORA-DDQN algorithms to learn the optimal strategies of the proposed problems. Simulation results show that, compared with the benchmark algorithms, our proposed 2Ts-IRMS can effectively guarantee the performance requirements of users and improve the system revenue.