Introduction
Vehicle-to-everything (V2X) communication is a key paradigm that enables seamless communication between neighboring road users (including vehicles, pedestrians, and infrastructure) and fosters advancements in road traffic efficiency and the provision of innovative services such as autonomous driving and onboard rich media entertainment [1], [2], [3]. V2X applications exhibit unique characteristics such as varied payload sizes, are associated with different types of traffic with different priorities, and have various quality of service (QoS) requirements, including maximum tolerable latency, reliability, and data rate. The primary concern of V2X applications, particularly those related to safety, is latency: safety levels decrease as the delays in receiving safety information increase. However, certain V2X applications that rely on vehicular tasks such as computation offloading, for example sharing high-definition maps, augmented reality and virtual reality, and online gaming, may place a greater emphasis on QoS requirements other than latency [4], [5]. A possible approach to increasing QoS is to integrate more advanced computing and storage resources into vehicles [6]. However, the limited physical space and the high costs associated with providing these additional resources make it challenging to ensure efficient and stable execution of any associated onboard applications [7]. Such resources cannot be accommodated without increasing manufacturing costs, which is not desirable [8].
Traditionally, cloud computing has been utilized to handle computationally intensive tasks in mobile networks. However, cloud computing yields high response times, which are unsuitable for dynamic and latency-critical environments such as vehicular networks [9]. To address these issues, multi-access edge computing (MEC) has emerged as a promising solution; in the context of vehicular networks, it is referred to as Vehicular Edge Computing (VEC) [9], [10]. VEC deploys computing and storage resources in close proximity to vehicles, making use of base stations (BSs) and roadside units (RSUs) to deliver robust V2X services. Although VEC servers deployed on RSUs and BSs can reduce connectivity latency due to their proximity to vehicles, their limited computational and communication resources require optimized resource management strategies. Furthermore, the limited coverage range of RSUs restricts the number of vehicles that can access their V2X services without incurring additional delays, mostly due to VEC handover and migration of services [9], [11]. To address these limitations, vehicles with sufficient resources can also be employed as VEC servers to support V2X applications [4], [7], [12]. However, the mobility of vehicles increases the complexity of service delivery in such a context. Furthermore, sensitive and private vehicle information, as well as data migration between different VEC servers, create potential vectors for security breaches and data leaks [13]. As a result, V2X service delivery is a multifaceted and intricate issue that must be carefully managed.
Blockchain has been integrated with VEC to ensure the security and privacy of V2X applications [14], [15], [16]. By incorporating blockchain technology into VEC to support V2X services, it is possible to create a highly effective and secure data-sharing infrastructure among VEC servers. This system not only facilitates the provision of information about adjacent service providers to vehicles, but also improves collaboration and security within the network, enhancing the delivery of V2X services. However, the consensus mechanism employed by blockchain introduces additional energy consumption and delays in the vehicular network [17]. In a public blockchain, every distributed node is obliged to participate in the consensus procedure, which lengthens both block generation and verification and increases energy consumption. Such increases in delay and energy consumption are not suitable for energy-constrained and delay-sensitive vehicular networks. As a result, some studies [18], [19] have opted for a permissioned blockchain approach within the VEC system. Through this approach, only a distinct subset of nodes is given permission to participate in the blockchain consensus process, resulting in a notably faster overall procedure.
While researchers have started to investigate the potential of blockchain technology in enhancing data security within VEC systems, to the best of our knowledge, its potential has not yet been explored within a versatile, all-encompassing V2X service delivery platform. Furthermore, none of the existing solutions takes into account the critical aspect of traffic prioritization of V2X applications for obtaining high QoS levels in the context of low-latency and reliable service delivery.
In this context, this paper makes the following contributions:
Proposes a novel framework called BEVEC: Blockchain-Enabled Vehicular Edge Computing for secure, performance-oriented V2X service delivery. BEVEC prioritizes V2X application traffic and delivers services in a specified time, ensuring that critical applications receive the resources they need and are executed on time.
Introduces a dual-layer verification process inside BEVEC to ensure secure and reliable delivery of general-purpose V2X services. The first layer, local verification, guarantees the accuracy of exchanged data, while the permissioned blockchain of the second layer checks for data integrity. This two-tier approach provides comprehensive security for V2X services, ensuring the authenticity and reliability of information exchanged in the vehicular environment.
Proposes a novel system utility function that takes into account three key factors: consumed energy, exchanged data size, and priority of V2X applications traffic. This function serves as a measure of system performance and also as a basis for selecting block verifier nodes in the consensus mechanism. The goal is to achieve reliable and low-latency delivery of V2X services.
Describes a novel Deep Reinforcement Learning (DRL) algorithm, named 3DPER, to improve the performance of BEVEC in terms of energy consumption, latency, and service delivery success rate. Simulations show how 3DPER outperforms existing methods in terms of these metrics, achieving an average of 18% latency reduction, 38% improvement in successful service delivery, and 65% decrease in energy consumption.
The structure of this paper is as follows. In Section II, we dive into related work. Section III introduces the system model and its underlying assumptions. The proposed solution is elaborated upon in Section IV, and Section V provides an in-depth analysis of performance. Section VI wraps up the paper with our key conclusions. To ease readability, Table I compiles a list of the major notations used in this paper.
Related Work
In this section, we explore the current research landscape concerning the provision of V2X services within a VEC-based environment, categorizing the literature into several key areas. A significant portion of the available research focuses on different aspects of service delivery, including task scheduling, computation offloading, handover between VEC servers, security considerations, blockchain integration, and other unaddressed challenges.
A. Task Scheduling and Computation Offloading in VEC Environments
Researchers have extensively explored optimization algorithms, game-theoretic frameworks, and distributed decision-making approaches to efficiently manage task distribution and resource allocation in VEC networks. For example, Gao et al. [20] proposed a two-layer optimization algorithm for joint task offloading and resource allocation in VEC networks, considering QoS constraints. They combine DRL and convex optimization methods to reduce energy consumption and delay. Zhao et al. [21] proposed a game-theoretic collaborative computation offloading architecture that integrates edge and cloud computing. The architecture jointly optimizes computation offloading and resource allocation to maximize system utility while minimizing task processing delay. However, the authors restricted their VEC environment model to a single VEC scenario, which does not accurately reflect the diverse range of operational conditions and complexities present in real-world VEC deployments.
Some studies have expanded their focus to include scenarios that involve multiple VECs. For example, Luo et al. [22] developed a model for a multi-vehicle, multi-VEC computation offloading framework. They proposed a self-learning distributed computation offloading approach that formulates the computation offloading problem as a distributed decision-making game, where each vehicle is a player that seeks to minimize its overall cost, which includes latency and offloading expenses. Shang et al. [23] proposed a combined deep learning and convex optimization method to reduce energy consumption during edge offloading in a scenario with multiple vehicles and roadside edge servers. Ning et al. [24] proposed a VEC framework that optimizes partial computation offloading and uses an adaptive task scheduling algorithm to maximize system-wide profit. They consider the selfish behavior of vehicles and utilize game theory to demonstrate the existence and optimality of the Nash equilibrium. Li et al. [25] proposed a DRL-based method to optimize task completion time and energy consumption in VEC considering the priority of tasks.
Despite the introduction of multi-VEC server scenarios and consideration of vehicle mobility in these studies, the researchers overlooked the crucial aspect of handovers between VEC servers to complete in-progress applications and the possible necessity for retransmissions or addressing failures.
In light of this challenge, researchers have shown significant interest in addressing handover between VEC servers. Specifically, [26], [27], [28] have concentrated on the task offloading and migration of individual vehicles in straightforward scenarios, with their primary emphasis placed on service handover. Unfortunately, these studies did not adequately address the importance of resource competition among multiple vehicles in real-world VEC environments. The challenges of handover in multi-vehicle scenarios have only been explored in a few other research papers [29], [30], [31], [32]. Li et al. [29] proposed a DRL-based vehicular task scheduling algorithm to minimize system cost while prioritizing tasks based on deadlines and dependencies. The authors modeled the scheduling problem as a Markov Decision Process (MDP) to address the challenges associated with the dynamic environment of vehicular networks. Dai et al. [30] presented a model for uploading and migrating tasks in an edge-cloud collaborative architecture, with the aim of minimizing the latency of task execution. The authors devised a probabilistic computation offloading approach to optimize this process and achieved optimal results through iterative means. Ma et al. [31] developed a solution to address challenges associated with highly dynamic vehicular network environments. The researchers introduced an enhanced heterogeneous earliest finish time algorithm that relies on gradient routing to address the problem of joint optimization of computation offloading and routing. This algorithm aims to maintain alignment between a vehicle's movement direction and the migration direction of tasks during offloading. By efficiently offloading tasks onto edge nodes along the routing path, this approach minimizes migration expenses and optimizes the distribution of computing resources across the network.
However, existing studies have focused mainly on the direct migration of vehicular data, disregarding the issues of data security and handover. This oversight is concerning because if the security of data exchange between vehicular network entities is not effectively ensured, it can pose serious threats to user privacy and even driving safety [33]. To address this critical issue, it is imperative to integrate robust data security measures and seamless handover protocols within the proposed solutions in this space.
B. Blockchain Integration for Secure V2X Services
Blockchain technology, originally conceptualized as the backbone of cryptocurrencies, has rapidly emerged as a versatile tool for secure data sharing and decentralized consensus mechanisms. Its decentralized nature and cryptographic principles make it highly resilient to tampering and fraud, thus gaining popularity beyond its initial applications. With the potential to enhance data security, trust, and transparency, blockchain has gained attention in the field of edge computing and vehicular networks [34]. Blockchain functions as a distributed ledger that records transactions across multiple nodes in a network. Each transaction, or block, is cryptographically linked to the previous one, forming an immutable chain of data blocks. This decentralized architecture eliminates the need for a central authority, mitigating single points of failure and reducing the risk of data manipulation or unauthorized access.
In recent years, researchers have explored the integration of blockchain technology into V2X systems to address security and trust issues inherent in vehicular networks. By leveraging blockchain, V2X services can achieve secure and transparent data sharing among vehicles, infrastructure, and other network entities. In [35], a blockchain-based framework for secure V2X data processing was proposed to address challenges related to efficient energy usage and resource utilization by utilizing edge servers to reduce latency. However, the proposed framework does not consider how the performance of the system could be affected by different types of messages, especially those that have strict latency requirements. This could potentially limit the effectiveness of the framework in certain scenarios. Zhang et al. [36] proposed a blockchain-based, hierarchical VEC platform. This approach uses a trust model to secure vehicle communication links, and the blockchain system manages the entire architecture. The aim is to optimize MEC performance while ensuring blockchain consensus. The authors modeled a joint optimization problem as an MDP and proposed a deep compressed neural network scheme to solve it.
Cui et al. [37] introduced a blockchain-based VEC platform to improve the efficiency and security of computing. They proposed a centralized controller equipped with a heuristic algorithm to optimize computation delay and implemented the platform in a real-world scenario. Zheng et al. [38] proposed a secure computation offloading framework for a blockchain-based vehicular network. The framework consists of a hierarchical architecture for security, a trusted access control mechanism using smart contracts, and a dynamic offloading solution based on DRL. The framework is designed to address the security and privacy challenges of offloading computation tasks to untrusted servers, and it provides a dynamic solution for optimizing offloading decisions and resource allocation. Ren et al. [39] presented a two-layer distributed SDN architecture with an integrated VEC platform using blockchain to enhance delay-sensitive applications and reduce energy consumption, achieved through a DRL-based algorithm. To incentivize resource sharing for V2V computation offloading, Shi et al. [40] proposed a blockchain-enabled framework using dynamic pricing and DRL. A key innovation is the integration of a dynamic pricing scheme, where the task vehicle pays a service price proportional to the computation size of the selected service vehicle executing the offloaded task. Pricing, along with carefully designed utility functions for both vehicles, provides short-term incentives for resource contribution. Furthermore, the reliability of vehicles in resource allocation, evaluated from historical offloading transactions recorded on the blockchain, is used for service vehicle selection and consensus node rewards. Vehicles with higher reliability have a higher chance of being selected for offloading and obtaining rewards, thus providing long-term incentives to maintain high reliability. 
The DRL-based task allocation algorithm is used to dynamically determine the service price and service vehicle to maximize the long-term utility of the task vehicle. This framework combines pricing incentives, DRL-based adaptation, and blockchain-enabled reliability management to incentivize resource sharing for V2V offloading in a secure and reliable manner. Liu et al. [19] introduced a blockchain-secured VEC framework for V2V resource trading, aiming to maximize system utility through incentivizing selfish vehicles in a decentralized architecture. Wang et al. [18] proposed a consortium blockchain solution to improve security and incentivize resource sharing in VEC. Their approach includes multi-step smart contracts for secure resource sharing and contract-based incentives to maximize utility and social welfare for VEC participants. Lang et al. [7] introduced a blockchain-based cooperative computation offloading framework to enhance the security of V2I and V2V computation offloading. Their approach included a combined consensus mechanism for secure information sharing between resource-idle vehicles. The authors also developed a cooperative computation offloading game to validate the effectiveness of their decision-making process.
As can be seen, blockchain technology has already been employed to enhance data security within VEC systems. However, none of the proposed approaches explores blockchain's potential within a versatile, all-encompassing V2X service delivery platform. Furthermore, traffic prioritization of V2X applications is not addressed in any existing approach, although it is critical to obtaining low-latency and reliable service delivery.
This paper proposes a holistic framework for secure delivery of V2X services by integrating the blockchain with VEC. Our approach guarantees the reliability of services through optimization of a priority and energy-aware utility function.
System Model
This section outlines the proposed BEVEC framework architecture, presents an analysis of the latencies involved, and describes the proposed utility function aimed at achieving reliable and low-latency V2X service delivery.
A. BEVEC Description
The BEVEC framework consists of three layers: the vehicle layer, the edge layer, and the blockchain layer, as shown in Fig. 1.
The vehicle layer is composed of
Every V2X application is represented in terms of exchange of messages
The traffic priority and delay sensitivity of
Leveraging VEC enables efficient offloading of computation-intensive messages, facilitating prompt and reliable V2X service delivery. This not only reduces latency, but also enables real-time data processing and analysis for V2X applications, highlighting the critical role of VEC in ensuring seamless and efficient V2X communication.
1) BEVEC's Dual Layer Verification Process
is ensured by i) a local verification process, on one hand, and ii) the blockchain layer, on the other hand. When a vehicle broadcasts its message, all nearby vehicles, RSUs, and BSs within its communication range will receive it. To ensure privacy and message integrity, only RSUs or BSs are granted access to the message content.
The local verification process starts when an RSU receives a message from a nearby vehicle that requires accuracy verification. To validate the accuracy of a broadcast message, at least two other vehicles in proximity must participate by sending confirmation messages. If the RSU receives confirmations from at least two other vehicles, indicating the message's accuracy, the local verification succeeds and the message is securely recorded in the blockchain as a valid transaction. Otherwise, the RSU keeps waiting until the local verification period expires; if insufficient confirmations have been received by then, the message is disregarded. For unicast messages, the RSU's role is to transmit message accuracy acknowledgments to nearby vehicles and await their responses. Validation is achieved through the successful receipt of acknowledgments from nearby vehicles, typically requiring acknowledgment from at least two-thirds of them. If the RSU receives the requisite acknowledgments confirming the message's accuracy, it considers the message a valid transaction and adds it to the blockchain. Conversely, if there is an insufficient number of successful acknowledgments within the designated time frame, the RSU disregards the message.
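The two acceptance rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Response` record, its field names, and the way the timeout is applied are assumptions introduced here, not part of BEVEC's specification; only the thresholds (two confirmations for broadcast, two-thirds of nearby vehicles for unicast) come from the text.

```python
from dataclasses import dataclass

MIN_BROADCAST_CONFIRMATIONS = 2  # "at least two other vehicles" in the text


@dataclass
class Response:
    """Hypothetical record of one vehicle's reply to the RSU."""
    vehicle_id: str
    arrival_time: float  # seconds since local verification started (assumed unit)
    confirms: bool       # True if the vehicle confirms the message's accuracy


def _timely_confirmers(responses, timeout):
    # Count each vehicle once, and only if it confirmed before the period expired.
    return {r.vehicle_id for r in responses
            if r.confirms and r.arrival_time <= timeout}


def verify_broadcast(responses, timeout):
    """Broadcast rule: at least two distinct confirmations within the period."""
    return len(_timely_confirmers(responses, timeout)) >= MIN_BROADCAST_CONFIRMATIONS


def verify_unicast(responses, nearby_count, timeout):
    """Unicast rule: acknowledgments from at least two-thirds of nearby vehicles."""
    if nearby_count == 0:
        return False
    acks = len(_timely_confirmers(responses, timeout))
    # Integer form of acks / nearby_count >= 2/3 avoids floating-point rounding.
    return 3 * acks >= 2 * nearby_count
```

A message that passes either check would then be handed to the blockchain layer as a valid transaction; one that fails after the timeout would be dropped.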
The blockchain layer offers each network entity a unique identification consisting of a public key and a private key. This unique identification enables the entities to utilize asymmetric cryptography for secure message transmission. By incorporating these cryptographic techniques, the blockchain layer ensures the confidentiality and integrity of messages exchanged within the network.
A unicasted message originating from the network entity
\begin{equation*}
\mathcal {M}^{U}_{ij} = E_{{PK}_{j}} (\mathcal {M}_{ij} || ts || Sig_{i}), \tag{1}
\end{equation*}
However, periodic transmission of cooperation information is vital in vehicular networks, where each vehicle must regularly broadcast information such as its current location, speed, direction of movement, and available resources. To achieve this, a broadcast message using a unique key pair, known as the broadcast key pair, is employed for transmission as follows.
\begin{equation*}
\mathcal {M}^{B}_{ij} = E_{{PK}_{B}} (\mathcal {M}_{ij} || ts || Sig_{i}). \tag{2}
\end{equation*}
2) BEVEC's Consensus Mechanism
The security and immutability of data in the blockchain are typically upheld by a consensus mechanism, which is a set of rules that all nodes in the network must follow in order to agree on the state of the ledger. However, this mechanism can introduce delays to the network, as it can take time for all nodes to reach a consensus on a new block. Delegated Proof of Stake (DPoS) is a consensus mechanism that addresses this issue by relying on a voting and selection process to choose a small number of delegates to validate blocks. This reduces the time and energy required to reach consensus, while still securing the blockchain against centralization and malicious activities [10]. The BEVEC framework's blockchain layer consensus mechanism builds on the foundational principles of DPoS and comprises two fundamental elements: delegate selection and block production and verification.
A subset of blockchain nodes denoted
Delegate node selection: Nodes with larger diameters (higher collected utility) have a greater probability of being chosen.
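One simple way to realize utility-proportional delegate selection is weighted sampling without replacement. The sketch below is an assumption about how such a draw could be implemented, not the paper's exact procedure; the function name, the dictionary layout, and the node identifiers are hypothetical.

```python
import random


def select_delegates(utilities, k, rng=None):
    """Draw up to k distinct delegates with probability proportional to utility.

    utilities: dict mapping node id -> collected utility. Utilities are assumed
    non-negative, with at least one positive value among remaining candidates.
    """
    if rng is None:
        rng = random.Random()
    candidates = dict(utilities)
    chosen = []
    while candidates and len(chosen) < k:
        ids = list(candidates)
        weights = [candidates[node] for node in ids]
        # Higher collected utility means a larger share of the draw.
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # a node can be selected only once
    return chosen
```

For example, `select_delegates({"rsu1": 3.0, "rsu2": 1.0, "bs1": 2.0}, k=2)` returns two distinct node ids, with `rsu1` the most likely first pick.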
Note that the BEVEC framework's dual layer mechanism comes with a cost. This cost is due to the communication overhead imposed by the local verification process and the blockchain's consensus mechanism and it is reflected in the next section on Latency Analysis. However, the dual-layer mechanism is required to provide an additional level of security and reliability.
B. Latency Analysis
Fig. 4 illustrates the steps involved in sending a message from vehicle
The overall delay can be expressed as follows:
\begin{equation*}
T_{\mathcal {M}_{ij}} = \tau _{ij} + \tau ^{proc}_{j} + \tau ^{verf}_{j} + x_{j}\tau ^{bv}_{j} + (1-x_{j}) T_{\mathcal {M}_{jj\prime}}, \tag{3}
\end{equation*}
The transmission latency can be expressed as follows [40]:
\begin{equation*}
\tau _{ij} = \frac{\mathcal {S}(\mathcal {M}_{ij})}{R_{ij}}, \tag{4}
\end{equation*}
\begin{equation*}
R^{\infty}_{ij} = B \log _{2}(1 + \Gamma _{ij}), \tag{5}
\end{equation*}
\begin{equation*}
\Gamma _{ij} = \frac{p_{ij} |h_{ij}|^{2} d_{ij}^{-\nu}}{\sigma ^{2}}, \tag{6}
\end{equation*}
\begin{equation*}
R_{ij} = R^{\infty}_{ij} - \sqrt{\frac{U_{ij}}{\mathcal {S}(\mathcal {M}_{ij})}}Q^{-1}(\epsilon), \tag{7}
\end{equation*}
\begin{align*}
Q(x) &= \frac{1}{\sqrt{2\pi}}\int _{x}^{\infty} e^{-\frac{t^{2}}{2}} dt, \tag{8}\\
U_{ij} &= 1 - \frac{1}{(1+\Gamma _{ij})^{2}}. \tag{9}
\end{align*}
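The chain from (4) to (9) can be evaluated numerically as in the following sketch. It transcribes the formulas as written, using the standard Gaussian tail function for $Q$; the function and parameter names are illustrative assumptions, and no particular units are implied beyond those of the equations themselves.

```python
import math
from statistics import NormalDist


def q_inverse(eps):
    # Q(x) = 1 - Phi(x) for the standard normal CDF Phi, so Q^{-1}(eps) = Phi^{-1}(1 - eps).
    return NormalDist().inv_cdf(1.0 - eps)


def snr(p_tx, channel_gain_sq, distance, path_loss_exp, noise_power):
    # Eq. (6): Gamma = p |h|^2 d^{-nu} / sigma^2
    return p_tx * channel_gain_sq * distance ** (-path_loss_exp) / noise_power


def shannon_rate(bandwidth, gamma):
    # Eq. (5): R_inf = B log2(1 + Gamma)
    return bandwidth * math.log2(1.0 + gamma)


def dispersion(gamma):
    # Eq. (9): U = 1 - 1 / (1 + Gamma)^2
    return 1.0 - 1.0 / (1.0 + gamma) ** 2


def achievable_rate(bandwidth, gamma, size_bits, eps):
    # Eq. (7): finite-blocklength penalty subtracted from the Shannon rate.
    penalty = math.sqrt(dispersion(gamma) / size_bits) * q_inverse(eps)
    return shannon_rate(bandwidth, gamma) - penalty


def transmission_latency(size_bits, rate):
    # Eq. (4): tau = S(M) / R
    return size_bits / rate
```

For any target error probability below 0.5 the penalty term is positive, so the achievable rate stays below the Shannon rate and the resulting latency is slightly larger than the infinite-blocklength estimate.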
\begin{equation*}
\tau ^{proc}_{j} = \frac{\mathcal {S}(\mathcal {M}_{ij})}{P_{ij}}. \tag{10}
\end{equation*}
\begin{align*}
\tau ^{verf}_{j} = \min \Bigl (& T_{timeout_{j}}, 2\times \max \lbrace \tau _{j1}, \tau _{j2}, \ldots, \tau _{jK} \rbrace, \\
& \max \lbrace \tau ^{proc}_{1}, \tau ^{proc}_{2}, \ldots, \tau ^{proc}_{K}\rbrace \Bigr). \tag{11}
\end{align*}
\begin{equation*}
\tau ^{bv}_{j} = \tau ^{bb}_{j} + \tau ^{cv}_{j} + \tau ^{bc}_{j}. \tag{12}
\end{equation*}
\begin{equation*}
\tau ^{bb}_{j} = \max _{j\prime \in \hat{\mathcal {N}}\backslash \lbrace j\rbrace} \tau _{jj\prime}, \tag{13}
\end{equation*}
\begin{equation*}
\tau ^{cv}_{j} = \max _{j\prime,j^{\prime \prime}\in \hat{\mathcal {N}}\backslash \lbrace j\rbrace, j\prime \ne j^{\prime \prime}} \lbrace \tau _{j^{\prime \prime}j\prime} + \tau ^{proc}_{j\prime}(\mathcal {M}_{jj\prime}) + \tau ^{proc}_{j\prime}(\mathcal {M}_{j^{\prime \prime}j\prime})\rbrace . \tag{14}
\end{equation*}
\begin{equation*}
\tau ^{bc}_{j} = \max _{j\prime \in \hat{\mathcal {N}}\backslash \lbrace j\rbrace} \tau _{j\prime j}. \tag{15}
\end{equation*}
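To make the three maxima in (13), (14), and (15) concrete, the sketch below computes the block verification latency of (12) from lookup tables. The nested-dictionary layout (`tau[a][b]` for the latency from `a` to `b`, and `proc[v][u]` for the time verifier `v` needs to process a message received from `u`) is an assumption made for illustration only.

```python
from itertools import permutations


def block_verification_latency(j, verifiers, tau, proc):
    """Eq. (12): tau_bv = tau_bb + tau_cv + tau_bc for block producer j."""
    others = [v for v in verifiers if v != j]
    if not others:
        return 0.0
    # Eq. (13): slowest broadcast of the block from j to the other verifiers.
    t_bb = max(tau[j][v] for v in others)
    # Eq. (14): slowest cross-verification; verifier v processes the block from j
    # and the copy relayed by another verifier w, plus the w -> v transfer time.
    t_cv = max(
        (tau[w][v] + proc[v][j] + proc[v][w] for v, w in permutations(others, 2)),
        default=0.0,
    )
    # Eq. (15): slowest confirmation sent back to j.
    t_bc = max(tau[v][j] for v in others)
    return t_bb + t_cv + t_bc
```

With three verifiers, the cross-verification term considers both ordered pairs of the two non-producing nodes and keeps the slower one.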
\begin{equation*}
E_{ij} = p_{ij}\tau _{ij} + p_{0} \left(\tau ^{proc}_{j} + \tau ^{verf}_{j} + x_{j}\tau ^{bv}_{j} \right). \tag{16}
\end{equation*}
C. Proposed Utility Function and Problem Formulation
The core objective of the BEVEC framework is to minimize energy consumption while maintaining security and privacy during data exchange. However, achieving this objective requires the implementation of an incentive mechanism, particularly for participants within the vehicle layer. The utility function defined to meet the aforementioned objective is:
\begin{equation*}
\mathbb {I}_{ij}(E_{ij},\mathcal {S}(\mathcal {M}_{ij}),\rho _{ij}) = \Bigl (\omega _{1}|I^{E}_{ij}|^{2} + \omega _{2}|I^{S}_{ij}|^{2} + \omega _{3}|I^\rho _{ij}|^{2}\Bigr)^\frac{1}{2}, \tag{17}
\end{equation*}
The utility functions are extended in the following manner:
\begin{align*}
I^{E}_{ij} &= e^{a_{E} \left(\frac{E_{th} - E_{ij}}{E_{th}}\right)}, \tag{18}\\
I^{S}_{ij} &= e^{a_{S} \left(\frac{S_{ij} - S_{th}}{S_{th}}\right)}, \tag{19}\\
I^\rho _{ij} &= e^{a_\rho \left(\frac{\rho _{ij} - \rho _{th}}{\rho _{th}}\right)}, \tag{20}
\end{align*}
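A direct transcription of (17) through (20) into Python may help when checking how the three components interact. The default weights and shape parameters below are placeholder assumptions, not values taken from the paper.

```python
import math


def component(a, value, threshold, lower_is_better=False):
    """Exponential utility component of the form used in Eqs. (18)-(20)."""
    if lower_is_better:
        rel = (threshold - value) / threshold  # Eq. (18): less energy is better
    else:
        rel = (value - threshold) / threshold  # Eqs. (19), (20)
    return math.exp(a * rel)


def utility(energy, size, priority, *, e_th, s_th, rho_th,
            a=(1.0, 1.0, 1.0), w=(1 / 3, 1 / 3, 1 / 3)):
    """Eq. (17): weighted Euclidean norm of the three components."""
    i_e = component(a[0], energy, e_th, lower_is_better=True)
    i_s = component(a[1], size, s_th)
    i_rho = component(a[2], priority, rho_th)
    return math.sqrt(w[0] * i_e ** 2 + w[1] * i_s ** 2 + w[2] * i_rho ** 2)
```

When every quantity sits exactly at its threshold, each component equals 1 and the utility reduces to the square root of the sum of the weights. Lowering the consumed energy, or raising the data size or priority, increases the utility.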
The problem of utility maximization can be expressed as:
\begin{align*}
\mathrm{P1:} \quad \max &\sum _{i \in \mathcal {P}} \sum _{j \in \mathcal {P} \backslash \lbrace i\rbrace} \alpha _{ij}\mathbb {I}_{ij}, \tag{21a}\\
\mathrm{s.t.} \quad &\mathrm{C1:} \sum _{j \in \mathcal {P} \backslash \lbrace i\rbrace} T_{\mathcal {M}_{ij}} \leq T^{max}_{i}, \quad \forall {i} \in \mathcal {P} \tag{21b}\\
&\mathrm{C2:} \alpha _{ij} \in \lbrace 0,1\rbrace, \quad \forall {i} \in \mathcal {P}, \forall {j} \in \mathcal {P} \backslash \lbrace i\rbrace \tag{21c}
\end{align*}
Problem P1 illustrates the interaction between the utility function and the performance metrics of BEVEC. The utility decreases as energy consumption increases, so higher energy consumption leads to lower utility. Additionally, constraint C1 specifies that messages must be delivered within a specific timeframe to receive the associated utility; as a result, low latency is preferred. Meeting this deadline also increases the probability of successful message delivery. Moreover, by (16), energy consumption and latency decrease together.
Proposed DRL Approach
Problem P1 presents a challenge due to the binary variable
Given the challenges associated with decision-making in problem P1, it is essential to handle high-dimensional and time-varying features appropriately. However, conventional model-based algorithms, such as greedy and meta-heuristic algorithms, do not scale to large applications because they require near-complete information [43]. To overcome these scalability and adaptability limitations, an efficient model-free solution based on DRL is proposed.
Before exploring the proposed approach, it is essential to establish the RL-compatible version of the problem, which encompasses the state and action spaces and the reward function. In the following, the set
A. State
The state space is the reflection of the observed vehicular environment. Let
\begin{equation*}
s_{t} = \lbrace \mathbf{M}(t), \mathbf{R}(t), \mathbf{F}(t), \boldsymbol{\Lambda}(t) \rbrace, s_{t} \in \mathcal {S}, \tag{22}
\end{equation*}
where $\mathbf{M}(t)$ is the flattened vector of the $|\mathcal {P}| \times |\mathcal {P} - 1 |$ message matrix, representing the set of message pairs at time period $t$; $\mathbf{R}(t)$ is the rate vector with the same dimensions as the message vector, representing communication data rates between different entities of the vehicular network at time period $t$; $\mathbf{F}(t)$ is a vector that represents the current processing capability of each entity at time period $t$; $\boldsymbol{\Lambda}(t)$ is a $|\mathcal {P}| \times |\mathcal {P} - 1 |$ flattened matrix and contains elements $\lambda _{ij} \in \lbrace -1, 0, 1\rbrace$. These elements represent the relative movement between vehicles and other network entities at time period $t$, where a value of 1 indicates that they are approaching each other, $-1$ indicates that they are moving away from each other, and 0 indicates that their positions are relatively fixed.
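The state in (22) can be assembled as one flat vector. The following sketch is a hypothetical construction: the 2-D position and velocity inputs, the tolerance, and the rule used to derive the relative-movement indicator (the sign of the rate of change of the squared distance) are assumptions introduced here for illustration.

```python
def relative_motion(pos_i, vel_i, pos_j, vel_j, tol=1e-9):
    """Return 1 if i and j are approaching, -1 if separating, 0 if fixed."""
    dx, dy = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]
    dvx, dvy = vel_j[0] - vel_i[0], vel_j[1] - vel_i[1]
    # Half the time derivative of the squared distance between i and j.
    closing = dx * dvx + dy * dvy
    if closing < -tol:
        return 1
    if closing > tol:
        return -1
    return 0


def flatten_off_diagonal(matrix):
    """Row-major flattening that skips the i == j entries."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def build_state(messages, rates, capabilities, positions, velocities):
    """Concatenate M(t), R(t), F(t), and Lambda(t) into one state vector."""
    n = len(positions)
    lam = [[0 if i == j else
            relative_motion(positions[i], velocities[i], positions[j], velocities[j])
            for j in range(n)] for i in range(n)]
    return (flatten_off_diagonal(messages) + flatten_off_diagonal(rates)
            + list(capabilities) + flatten_off_diagonal(lam))
```

For $|\mathcal{P}|$ entities, the resulting vector holds three off-diagonal blocks of $|\mathcal{P}|(|\mathcal{P}|-1)$ entries each plus $|\mathcal{P}|$ capability entries.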
B. Action
Let
\begin{equation*}
a_{t} = \operatorname{vec}\left(\left[\begin{array}{lcc}\alpha _{12}, \alpha _{13} & \cdots & \alpha _{1P}\\
\vdots & \ddots & \vdots \\
\alpha _{P1}, \alpha _{P2} & \cdots & \alpha _{P(P-1)} \end{array}\right] ^{T} \right). \tag{23}
\end{equation*}
C. Reward Function
Once the action
\begin{equation*}
\psi _{t}(s_{t},a_{t}) = {\begin{cases}\mathbb {I}_{ij}, & \text{if } \text{C1} \cap \text{C2}\\
-\Upsilon, & \text{otherwise} \end{cases}} \tag{24}
\end{equation*}
The optimal solution of problem P1 is achieved by maximizing the expected cumulative discounted rewards also called the long-term reward and defined in (25).
\begin{equation*}
\Psi = \max \mathbb {E} \left [\sum _{t = 0}^{T} \gamma ^{t} \psi _{t}(s_{t},a_{t}) \right ], \tag{25}
\end{equation*}
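As a small worked illustration of (24) and (25), the sketch below computes a per-step reward and the discounted return of one finite episode. The boolean arguments standing in for constraints C1 and C2 and the function names are assumptions for illustration.

```python
def reward(utility_value, deadline_met, assignment_valid, penalty):
    """Eq. (24): utility if C1 and C2 both hold, otherwise the penalty -Upsilon."""
    if deadline_met and assignment_valid:
        return utility_value
    return -penalty


def discounted_return(rewards, gamma):
    """Inner sum of Eq. (25) for one episode: sum over t of gamma^t * psi_t."""
    total = 0.0
    # Accumulating backwards applies one extra factor of gamma per step.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For instance, three unit rewards with a discount factor of 0.5 give a return of 1 + 0.5 + 0.25 = 1.75.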
D. A Novel DRL-Based Approach
We begin by providing a concise overview of the foundational principles of DRL, with a particular focus on Q-learning algorithms. This overview lays the groundwork for our proposed methodology, which we introduce subsequently to address the problem P1.
1) Technical Background
DRL is a powerful approach that combines RL with deep learning to tackle problems with high-dimensional raw data inputs [44]. In the training process of DRL, a deep neural network called the Deep Q-Network (DQN) is used to approximate the action-state pair and the Q function
\begin{equation*}
L(\theta _{t}) = \frac{1}{M} \sum _{i = 1}^{M} (y_{T}^{(i)} - Q(s_{t}^{(i)}, a_{t}^{(i)}; \theta _{t}))^{2}. \tag{26}
\end{equation*}
\begin{equation*}
y_{T}^{(i)} = \psi _{t}^{(i)} + \gamma \max _{a_{t+1}} Q_{T}(s_{t+1}^{(i)}, a_{t+1}^{(i)}; \theta _{T_{t}}). \tag{27}
\end{equation*}
\begin{equation*}
\theta = \theta + lr \times \frac{1}{2} \nabla _{\theta} (L(\theta))^{2}. \tag{28}
\end{equation*}
\begin{equation*}
\theta _{T_{t}} = \theta _{t-G}. \tag{29}
\end{equation*}
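The target in (27), the loss in (26), and the delayed target parameters in (29) can be illustrated without any deep learning library. In this sketch the Q-values are plain lists of numbers, and the target network is refreshed by periodic copying every `period` steps, which is a common way to realize a lagged copy and is an assumption here rather than a statement of the paper's implementation.

```python
def td_target(reward, next_q_target, gamma):
    """Eq. (27): y = psi + gamma * max over next actions of the target network's Q."""
    return reward + gamma * max(next_q_target)


def mse_loss(targets, predictions):
    """Eq. (26): mean squared error over a mini-batch of M samples."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


def sync_target(step, period, online_params, target_params):
    """Periodic refresh in the spirit of Eq. (29): copy online weights every `period` steps."""
    if step % period == 0:
        return list(online_params)
    return target_params
```

Between refreshes the target parameters stay frozen, which keeps the regression target in (26) from shifting at every update.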
2) 3DPER - a Novel Double-Dueling DQN With Prioritized Replay Experiences
To solve problem P1 and find optimal matches between different entities, a double-dueling DQN with prioritized replay experiences (3DPER) is proposed.
To manage extensive action space of (23),
\begin{equation*}
Q_{j}(s,a_{j};\theta _{j}) = V_{j}(s) + A_{j}(s,a_{j}), \tag{30}
\end{equation*}
Define
\begin{equation*}
L(\theta _{t}) \!\!=\!\! \frac{1}{M} \sum _{i = 1}^{M} \!\!\left[ (\psi ^{(i)} \!+\! \gamma \!\left(\max _{a\prime} \sum _{j=1}^{|\mathcal {P}|} Q_{T_{j}}(s\prime,a\prime _{j})\right) \!\!-\! Q(s,a))^{2}\right], \tag{31}
\end{equation*}
\begin{align*}
&L(\theta _{t}) = \frac{1}{M} \sum _{i = 1}^{M} \\
&\times\!\left[ (\psi ^{(i)} \!+\! \gamma \!\left(\sum _{j=1}^{|\mathcal {P}|} Q_{T_{j}}(s^{\prime},\arg \max _{a^{\prime}_{j}} Q_{j}(s^{\prime},a^{\prime}_{j}))\!\right)\! \!-\! Q(s,a))^{2}\right]. \tag{32}
\end{align*}
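The following Python sketch illustrates how the decomposition in (30) and the target term inside (32) could be computed for a single experience. It is a sketch under stated assumptions: the per-component Q-values are passed in as plain lists rather than produced by networks, the function names are hypothetical, and (30) is followed literally, without the mean-advantage normalization used in some dueling implementations.

```python
def dueling_q(value, advantages):
    """Combine a state value and per-action advantages as in (30)."""
    return [value + advantage for advantage in advantages]


def double_branch_target(reward, next_q_main, next_q_target, gamma):
    """Target term inside (32) for one experience.

    next_q_main, next_q_target: one list of Q-values per action component j,
    evaluated at the next state by the main and target networks respectively.
    For each component, the main network picks the action and the target
    network scores it; the scores are summed over components.
    """
    total = 0.0
    for q_main_j, q_target_j in zip(next_q_main, next_q_target):
        best_action = max(range(len(q_main_j)), key=q_main_j.__getitem__)
        total += q_target_j[best_action]
    return reward + gamma * total
```

Selecting the action with one network and evaluating it with the other is the mechanism by which double Q-learning reduces the overestimation that the max operator in (31) can introduce.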
The complete approach of 3DPER is described in Algorithm 1.
Algorithm 1: 3DPER Algorithm.
Input: experience buffer
Initialize the experience replay buffer
Get the initial state
for each episode do
Setup vehicular environment;
for
for
Initialize the main network with random weights
Initialize the target network with weights such that
#Epsilon greedy policy:
Choose a random probability
if
else
Randomly select an action
end if
end for
Update exploration probability
\begin{equation*}
\varepsilon = \max \lbrace \varepsilon _{\text{min}}, \varepsilon (1 - \delta \varepsilon) \rbrace
\end{equation*}
Execute action
Store the experience
Sample a mini-batch of
Perform a stochastic gradient descent step on
Every
end for
end for
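The exploration steps of Algorithm 1 can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function names are hypothetical, the convention that a random draw below the exploration probability triggers a random action is an assumption, and the decay term in the update rule is treated as a single constant `decay`.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def decay_epsilon(epsilon, epsilon_min, decay):
    """Exploration update: max(epsilon_min, epsilon * (1 - decay))."""
    return max(epsilon_min, epsilon * (1.0 - decay))
```

With `epsilon = 0` the selection is purely greedy, and repeated calls to `decay_epsilon` shrink the exploration probability geometrically until it reaches `epsilon_min`.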
E. Complexity Analysis
1) Computation Complexity
In this subsection, we analyze the computational complexity of the proposed 3DPER algorithm and compare it with the conventional DQN algorithm.
The dimensionality of the action space is a critical factor in determining the computational complexity of DQN algorithms. In conventional DQN, each entity has
The state
The third term,
2) Communication Complexity
In this subsection, we examine the communication complexity of 3DPER and contrast it with traditional DQN algorithms. Because single-agent DQN algorithms have minimal communication complexity [48] and 3DPER is a DQN-based single-agent algorithm, the communication complexity of 3DPER is also minimal and is determined by the size of the state vector, which is on the order of
Performance Evaluation
A. Experimental Setups
In our experimental setup, we integrated the BEVEC framework into a simulated environment covering an area of approximately
Furthermore, we conducted an in-depth analysis of the 3DPER algorithm's performance across different parameter configurations, examining reward, latency, success rate, and energy consumption. This comprehensive evaluation highlighted the algorithm's advantages and identified optimal parameter settings. To further substantiate our approach, we compared it against other strategies, including:
Random Method: In this approach, actions are selected entirely at random, leading to a purely exploratory methodology that does not take into account past rewards or environment knowledge.
Greedy Method: Building upon our previous work [3], this method models network entities as graph nodes and assigns weights to each vertex based on rewarded utility and feasible paths. A well-known shortest path algorithm is then employed to match different network entities.
DDPG Method: Due to the large action space size of the equivalent MDP for problem P1, the Deep Deterministic Policy Gradient (DDPG) method was used [50]. DDPG uses both an actor and a critic neural network. The actor network learns the policy (the action selection), while the critic network learns the value function to evaluate the chosen actions. As DDPG is capable of handling continuous action spaces only, a rounding method was employed to convert continuous actions into discrete ones to make DDPG applicable.
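One way the rounding step for the DDPG baseline could look is sketched below in Python. The details are assumptions made for illustration: the actor output is assumed to lie in [-1, 1] (for example, after a tanh output layer), each component is mapped linearly onto the indices 0 to `num_choices - 1`, and the function name is hypothetical.

```python
def discretize_action(continuous_action, num_choices):
    """Map each actor output in [-1, 1] to an integer index in [0, num_choices - 1]."""
    indices = []
    for value in continuous_action:
        clipped = min(1.0, max(-1.0, value))                 # guard against out-of-range outputs
        scaled = (clipped + 1.0) / 2.0 * (num_choices - 1)   # rescale to [0, num_choices - 1]
        indices.append(int(round(scaled)))                   # round to the nearest index
    return indices
```

Because neighboring continuous outputs collapse onto the same index, this conversion is one source of the approximation error attributed to the DDPG baseline.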
Table II summarizes detailed information regarding the specific parameter values utilized in our experiments, motivated by [7], [8], [40]. In cases where the specific value of a parameter is not specified, the values are selected uniformly.
B. Results and Analysis
We start by evaluating the 3DPER algorithm's convergence behavior for different learning rates. Fig. 7 shows that different learning rates produce different reward trajectories. For example, with a learning rate of 0.001, the algorithm converges to the maximum reward in about 500 episodes. With a learning rate of 0.0001, the algorithm converges to the maximum reward, but it takes about 2,000 episodes to do so. Notably, with a learning rate of 0.00001, the algorithm fluctuates in the early episodes and eventually gets stuck at a local maximum, which is not the best possible reward.
Because the 0.001 learning rate allows the algorithm to achieve the maximum reward while converging more quickly, we use it for subsequent simulations.
To evaluate how the number of vehicles impacts BEVEC performance, we begin by comparing the reward curves of the different algorithms, as illustrated in Fig. 8.
The comparison shows that our proposed algorithm outperforms the other algorithms in terms of average reward, for several reasons. Unlike the greedy method, which focuses only on maximizing immediate reward, our algorithm optimizes for expected long-term rewards. Additionally, unlike the random algorithm, which explores actions without a specific policy, both our algorithm and DDPG leverage learned policies, leading to higher average rewards. Furthermore, unlike DDPG, which approximates action selection via an actor network that may introduce errors, our approach directly explores the real action space, exploiting the most effective actions. It is worth noting that the average reward increases as traffic density rises. However, as the vehicular network's resources become saturated, the average reward begins to decline.
Fig. 9 assesses the success rate (SR) of the various algorithms across different levels of traffic density. In line with our expectations based on the average reward analysis, the 3DPER algorithm consistently outperforms the other algorithms, with an average improvement of 38%. This observation highlights a crucial distinction: while in some scenarios the average rewards of different algorithms may be relatively similar, the actual number of messages exchanged and successfully written to the blockchain within an acceptable time frame varies significantly.
To illustrate, consider the performance of the random algorithm, which achieves an SR of less than 30%. This result indicates that a substantial portion of the messages are not exchanged in time. When the random method is excluded, 3DPER improves the SR by 4.5% on average relative to the average of the DDPG and greedy algorithms, highlighting the advantage of 3DPER in terms of successful message exchange.
Fig. 10 presents the average latency across various traffic densities. Our proposed algorithm achieves the lowest average latency, on average 18% lower than that of the other algorithms. Although the latencies of the greedy and DDPG algorithms are close to that of our proposed algorithm, 3DPER still achieves a 2.3% average latency reduction compared to them.
This proximity in latency values is because average latency calculations are contingent on successful message deliveries. However, the performance of the random algorithm underscores the significance of prerequisite matching among network entities, which can substantially influence latency. For instance, Fig. 11 portrays the average latency when the maximum tolerable latency for messages is fixed. The average latency remains below the maximum tolerable latency. However, this consideration alone, without factoring in the SR, is insufficient. As we are bound by the constraint of ensuring that the sum of latencies remains below a threshold, the impact of latency must be evaluated in conjunction with SR.
Fig. 12 illustrates the SR of various algorithms under varying, yet fixed, maximum tolerable delays. Predictably, the random algorithm records the lowest SR, while the 3DPER algorithm consistently outperforms the greedy and DDPG algorithms in this regard. As the maximum tolerable delay is increased, more time is available for message transmission and processing, which results in improved SR.
Fig. 13 examines the average latency under varying exchanged data sizes, where the maximum tolerable latency for each message has been set proportionally. Notably, the 3DPER algorithm exhibits the lowest latency, particularly as the data size increases. This can be attributed to the algorithm's ability to collect more rewards, which in turn allows for a better balance between message size and overall latency, as illustrated in Fig. 8.
Fig. 14 illustrates the average energy consumption trends under varying traffic densities. Notably, the random algorithm exhibits suboptimal performance and excessive energy consumption compared to the other algorithms. As traffic density increases, the energy usage in the other algorithms increases marginally, whereas the 3DPER algorithm demonstrates better performance and reduced energy consumption compared to the greedy and DDPG algorithms.
On average, 3DPER reduces energy consumption by approximately 65% compared to the other algorithms. When the random algorithm is excluded due to its poor performance, 3DPER achieves an average energy reduction of 7.5% compared to the greedy and DDPG algorithms.
Fig. 15 illustrates how the reward varies in response to changes in the message arrival rate for different schemes. Notably, as the rate increases, the rewards gradually decline and eventually reach a saturation point. This decline is primarily attributed to factors such as extended processing times and execution failures resulting from resource limitations.
Fig. 16 shows the effect of the integrated blockchain on BEVEC performance under the 3DPER algorithm, in terms of average energy consumption and latency for different numbers of delegate nodes. The results indicate that the average latency of the block verification process increases marginally as the number of delegate nodes increases. This is because the likelihood of selecting nodes that are distant from each other increases with the number of delegate nodes, leading to slightly longer average latency. However, the number of delegate nodes itself does not have a significant impact on the block verification latency; instead, the maximum delay between delegates is the main factor determining the overall block verification latency. In contrast, energy consumption is directly affected by the number of delegate nodes: each delegate node, acting as a block verifier, contributes to the total energy usage, so the consumed energy accumulates. BEVEC performance is therefore subject to a trade-off: increasing the number of delegate nodes increases the security of the blockchain, at the cost of a slight increase in average latency and a significant increase in energy consumption.
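The different scaling of latency and energy with the number of delegates can be illustrated with a deliberately simplified Python sketch. The model is an assumption made for illustration only: verification latency is taken as the largest delay among the delegates, energy as the sum of the per-delegate costs, and both function names and inputs are hypothetical.

```python
def verification_latency(delegate_delays):
    """Latency is set by the slowest delegate exchange, not by the delegate count."""
    return max(delegate_delays)


def verification_energy(per_delegate_energy):
    """Every delegate verifies the block, so energy accumulates across delegates."""
    return sum(per_delegate_energy)
```

Under this model, adding a delegate raises the latency only if its delay exceeds the current maximum, whereas it always adds its full verification cost to the total energy.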
Conclusion and Future Work
In this paper, we proposed a secure framework for in-time delivery of V2X services by combining deep reinforcement learning and permissioned blockchain in vehicular edge computing networks. Our approach features a dual-layer verification process to ensure accurate and secure delivery of vehicular messages. The first layer involves local verification to ensure the accuracy of disseminated messages, while the second layer uses a permissioned blockchain to guarantee the integrity of the messages. We introduced a new system utility that serves both as a performance metric to ensure the prompt delivery of services and as a measure for the selection of verifier nodes in the blockchain consensus mechanism. To optimize this utility function in the dynamic vehicular environment, we formulated the problem of in-time vehicular message delivery as a sequential decision problem and proposed a novel DRL-based algorithm - 3DPER - to solve it. Simulation results show that the proposed algorithm can effectively improve V2X service delivery performance in terms of energy consumption, latency, and success rate. However, further investigation into the limitations and potential enhancements of the proposed BEVEC framework is necessary. For instance, it would be worthwhile to investigate the suitability of BEVEC's architecture for services with strict delivery time requirements, since the dual-layer verification could prove to be a limitation of the current architecture. In addition, although the 3DPER algorithm is currently capable of scaling with the number of vehicles, it is still a centralized approach; hence, it is necessary to explore decentralized approaches, particularly in scenarios in which such scalability is critical. Our future efforts will focus on developing a decentralized algorithm that maintains security and trust while minimizing latency. Our objective is to extend the framework's applicability across a wide range of application scenarios.