A Virtual Infrastructure Model Based on Data Reuse to Support Intelligent Transportation System Applications

Virtual infrastructure model where autonomous vehicles exchange data through Vehicle-to-Network (V2N) communications, enabling traffic monitoring in both well-equipped ur...

Abstract:

Intelligent Transportation Systems (ITS) have significantly improved transportation quality by using applications capable of monitoring, managing, and improving the transportation system. However, the large number of devices required to provide data to ITS applications has become a challenge in recent years; in particular, high installation and maintenance costs have made broad deployment impracticable. Despite several advances in smart city research and the Internet of Things (IoT), research on ITS is still in the early stages. To improve data collection and maintenance strategies for ITS, this article proposes a virtual infrastructure model based on data reuse, mainly of autonomous vehicle (AV) data, to support ITS applications. It presents design choices and challenges for deploying a virtual infrastructure based on Beyond 5G (B5G) communication and data reuse, followed by a proof of concept of an AV data acquisition system evaluated through simulation. The results show that the extra data collection module causes a 1.1% increase in total memory usage with direct sensor collection and a 2.6% increase with application performance management (APM) data collection on the reference hardware. This data reuse setup can significantly ease ITS data challenges with minimal impact on the technology stack of the autonomous vehicles currently in circulation.

Published in: IEEE Access ( Volume: 13)

Page(s): 40607 - 40620

Date of Publication: 03 March 2025

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2025.3547160

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Emerging technologies in recent years, such as smart cities and the Internet of Things (IoT), are driving the emergence of Intelligent Transportation Systems (ITS) [8]. ITS is an advanced paradigm capable of solving various problems related to transportation quality, using applications capable of monitoring, managing, and improving the transportation system [9]. Large cities are investing in underlying data infrastructure to support this new paradigm. However, placing embedded devices in the infrastructure poses challenges in terms of deployment and maintenance costs [5], [9].

In this context, ITS applications have used various data sources, such as smart cards, Radio Frequency Identification (RFID) tags, sensors, video cameras, Global Navigation Satellite System (GNSS), social media and high-definition (HD) maps [10]. Furthermore, Artificial Intelligence (AI) algorithms assist in identifying patterns and classifying large data volumes. In this sense, Autonomous Vehicles (AVs) are seen as an attractive technology in which data generated from autonomous navigation systems’ sensors can be reused for the input of sophisticated algorithms that feed ITS applications.

Several advances have been made in smart city research, and the internet of things [5], [9], [10], [13], [14], [16]. However, research on ITS is still in the early stages. For example, existing solutions do not support data reuse in a coordinated and integrated fashion for feeding ITS applications [5], [10]. With the deployment of 5G providing ultra-reliable, high-bandwidth, low-latency connectivity, alternative data sources can help the underlying infrastructure provide data to ITS applications. In this article, the technique of reusing data to support ITS applications is called virtual infrastructure.

The virtual infrastructure is a data-driven framework that creates a digital representation of the transportation environment by aggregating and processing data from autonomous vehicles and other mobile sources, supplementing or replacing traditional fixed roadside infrastructure. This infrastructure exists as a computational model rather than as physical installations, leveraging sensor data, vehicle-to-everything (V2X) communications, and cloud computing to provide ITS applications with real-time environmental awareness and traffic conditions. Through data reuse from autonomous vehicles’ existing sensor networks, this virtual layer can deliver equivalent or enhanced monitoring capabilities compared to conventional physical infrastructure, while significantly reducing deployment and maintenance costs. We begin by presenting the main ITS approaches, followed by the design choices and challenges for a virtual infrastructure. Furthermore, a proof of concept for data acquisition from the AVs is developed, and a discussion of the experiment and its impact on the current AV structure is provided.

This proposal comprises three stages: data acquisition from the AVs, data transmission, and provision of a virtual framework for ITS applications. In the first stage, a software module is developed to attach to the AV. The module is designed to collect relevant sensor data with minimal impact on current controller systems. The second stage leverages vehicle-to-everything (V2X) communication and 5G networks, and discusses the data transmission model built on these two network approaches. In the final stage, the virtual infrastructure is constructed through a combination of AI algorithms and probabilistic models that process the aggregated vehicle data. These algorithms analyze traffic patterns, create environmental models, and generate a dynamic digital representation of the transportation network. A data service is made available so that vehicles and ITS applications can use the refined data.

This work makes the following contributions. We present a design that transforms autonomous vehicle sensor data into a virtual infrastructure layer for ITS applications, demonstrating through empirical evidence the system’s viability with minimal impact for direct sensor collection. The practical feasibility is validated through a proof-of-concept implementation using industry-standard simulation tools, while systematic evaluation with increasing sensor counts and varying traffic conditions establishes the approach’s scalability. Our solution offers a sustainable path to expanding ITS capabilities without additional monitoring infrastructure deployment. While previous research has typically focused on either cooperative perception or ITS applications in isolation, our approach demonstrates how these domains can be effectively combined to create a low-impact solution for modern transportation systems. This integration enables areas with limited physical infrastructure to benefit from advanced ITS capabilities through data reuse from autonomous vehicles already in circulation.

The remainder of this article is organized as follows: Section II discusses approaches and challenges that need to be addressed for virtual infrastructure solutions to contribute to the operation of ITS applications. In Section III, we explore different design choices for data acquisition and transmission, as well as possible operational methods for virtual ITS services in the cloud. Section IV describes and evaluates two proof-of-concept experiments focusing on data acquisition. Finally, Section V presents concluding remarks and provides insights for future work.

SECTION II.

Related Works

A. Cooperative Perception Approaches

In [27], the authors summarize multi-sensor fusion methods, communication technology, and shared perception strategies. The impact of communication cost and the robustness to vehicle positioning errors are analyzed. The authors claim that to obtain an autonomy level above 3, it is necessary to leverage advanced sensing technology, edge computing, communication, and other technologies to build a cooperative perception system. The review also reinforces the idea that the future lies in V2X applications supported by 5G communication technology.

Another recent review is presented in [30]. The work summarizes the applications of multi-sensor fusion classification strategies in cooperative perception. It proposes a multi-sensor fusion taxonomy for autonomous driving perception and classifies fusion strategies into symmetric and asymmetric fusion with seven subcategories.

The work presented in [28] explores the application of V2X communication to enhance the perception performance of autonomous vehicles through a novel vision Transformer architecture called V2X-ViT. The technique combines lidar sensor data from several vehicles and obtains state-of-the-art performance against similar intermediate fusion techniques. The article considers both an ideal communication channel and a noisy setting, where pose error and time delay are both considered. Network-specific experiments were not conducted.

In [29], authors focus on intermediate fusion to account for possible Lossy Communication (LC) channels on Vehicle-to-Vehicle (V2V) communications. The LC proposal accounts for real world scenarios where there are: Doppler shift introduced by fast-moving vehicles, interference generated by other communication networks, and dynamic topology caused by routing failures as well as various weather conditions.

Another work that tries to tackle LC scenarios is presented in [33]. Instead of doing intermediate fusion, the authors create a prediction model to extract multi-scale spatial-temporal features based on V2X communication conditions and capture the most significant information for the prediction of the missing information.

The work in [24] proposes baseline mobility-based generation rules for cooperative perception messages with mechanisms to control the redundancy of the information and reduce channel overhead. The proposed techniques improve perception, reduce channel load, and enhance scalability for all cooperative perception communications.

To address offline datasets for AI model training, [31] creates a dataset specifically for V2V cooperative perception research. Three cooperative perception tasks are introduced with benchmarks for model evaluation. According to the authors, the biggest challenges in cooperative detection include GPS error, asynchronicity, and bandwidth limitation.

In [32] a multimodal transformer model is introduced for cooperative perception. This new model uses lidar and camera fusion from different vehicles to obtain SOTA cooperative perception performance. It integrates point-based and voxel-based features into a single 3D representation.

To address some inherent challenges in cooperative perception, such as the loss of semantic information and perception errors, [34] proposes improving vehicle pose calibration. An object association approach named context-based matching (CBM) is used, which calibrates multi-agent poses to improve shared perception precision. Object association precision is achieved with decimeter-level relative pose calibration accuracy.

Considering the large volume of sensor data, [35] proposes the use of a Spatial and Temporal Clustering (STC) algorithm which not only reduces the communication payload between vehicles by clustering perceived objects across AVs but also optimizes the use of communication resources. The work presents significant enhancement in the efficiency of vehicle-to-vehicle communication networks, increasing information reception by 10% and reducing communication payload by approximately 41% compared to previous ETSI standards.

On the topic of safety, [36] proposes a method for expanding the field of view of autonomous vehicles by using edge infrastructure sensors (e.g., RSU systems). The “Infrastructure cooperative autonomous driving” system improved the safety speed by 17% and helped avoid collisions in simulated scenarios.

To further facilitate research in the field, [37] presents a new co-simulation tool that allows physical and network simulations by integrating CARLA, OpenCDA, and ns-3. The framework facilitates analysis of vehicular networks under various V2X technologies and enables evaluation of AI/ML-enabled autonomous driving applications leveraging realistic sensor data.

While the surveyed approaches demonstrate significant advances in cooperative perception, it’s worth noting alternative perspectives in the field. Some researchers argue for decentralized approaches over the predominant centralized architectures, particularly for urban environments with limited infrastructure. Additionally, there are ongoing debates about the trade-offs between data accuracy and system latency, with some studies suggesting that selective data sharing might outperform comprehensive data collection in certain scenarios [14], [45]. These different viewpoints highlight the complexity of designing cooperative perception systems that can adapt to diverse real-world conditions.

B. ITS Approaches

Within ITS applications, two major areas of research have come to prominence. First, studies on current and future travel time prediction are helpful for route planning and traffic management. Second, a large number of studies address the construction of intelligent infrastructures, including smart cities, roadside devices, and IoT, to improve communication performance and security. Recent work has also demonstrated how IoT technology can enable real-time traffic monitoring and global control strategies for managing traffic flow through V2X-supported vehicles [47]. Other notable research that is generally orthogonal to these two branches is also included to provide an overview of the field.

For travel forecasting, [9] and [10] evaluate AI algorithms and mathematical progressions to predict travel times. Reference [16] reviews extensive literature on deep learning technologies for travel time prediction. For most of these works, real-time data is crucial for an accurate answer. In line with this, [13] and [14] propose models or approaches for data collection, ranging from installing devices in vehicles to using mobile sources for data generation and training of AI algorithms.

An extensive review composed of 586 articles is conducted in [38]. The article discusses big data algorithms and their applications in ITS, where the increasing amount of data requires advanced, data-driven approaches. The application of big data algorithms in ITS includes areas such as traffic flow prediction, travel time and route planning, and improving vehicle and road safety. Authors identify gaps in the field and show that the most commonly used algorithms for ITS are based on deep learning methods.

In contrast to the model described above, the second research branch proposes architectures for developing an intelligent infrastructure. The research’s primary focus is installing embedded devices on the roads. In [17], a cheap embedded device to popularize ITS applications is proposed. Other works, such as [11], [14], [19], and [20], propose using data from existing sources, such as the use of IoT systems, images from security cameras, and others.

To further explore what can be achieved by a combination of modern technologies to improve ITS, [39] investigates the applications of AI, edge computing, and caching technologies to improve the efficiency of ITS systems. RSUs are used to offload the computing closer to the network edge. The AI decision-making process provides a more efficient resource allocation and optimizes the network routing plan. With the inclusion of edge offloading and caching, latency and communication time are drastically reduced.

In [40], the authors experiment with a Non-Orthogonal Multiple Access network to meet current industry standards and government regulations. They model the end-to-end latency under the designed architecture and analyze reliability in terms of the interruption probability of the network. Statistical analysis is used to model the different latency components, including transmission, processing, and propagation latency. Results show that it is possible, with a combination of Wi-Fi and 5G, to keep latency under 5 ms.

A survey on the latest advancements in Blockchain for IoV is done in [41]. Authors evaluate the integration of these technologies to address ITS communication challenges. The potential benefits of this combination are enhanced data security, increased reliability, and system transparency. While recognizing the promise of the convergence of blockchain and IoV in enriching the intelligence and security of transportation systems, it also highlights the existence of several practical adoption challenges that need to be overcome. These include issues related to scalability, energy consumption, privacy concerns, and lack of regulation.

In [42], the authors define the digital twin in transportation and its possible applications. Similarities and differences between traffic simulation and the digital twin are discussed. The article explores modeling vehicle driving behavior and environment simulation with the digital twin. A three-layer architecture is proposed, comprising a data access layer, a calculation and simulation layer, and a management and application layer.

Looking at the new 5G communication standard, [43] discusses the role it has in enhancing smart city functions, with a particular focus on ITS. It explains how 5G technologies are critical to realizing the concept of smart cities by providing high-speed connectivity, low latency, and the ability to handle a large number of connections. Particularly for ITS, the article discusses how 5G will enhance the interconnectivity of different transportation modes, improve traffic management, and increase overall transportation efficiency.

To further improve AV and ITS capabilities, [44] explores the use of high-definition (HD) maps. A review of the state of the art in the use of HD maps across the various functions of autonomous driving systems is presented. The authors discuss the AV components that rely on map data, such as SLAM, scene understanding, motion prediction, and planning modules, and how HD maps can greatly improve AV capabilities. Furthermore, HD maps and GNSS data can be fused, offering robust and precise localization services to AVs.

Considering the above articles, an important highlight is that at this moment there is no research on building virtual infrastructures that feed ITS and traffic management systems with AV data in real-time.

By proposing the development of virtual infrastructure, this work goes beyond and enables the integration of diverse environments, such as smart cities, smart villages, rural environments, and IoT, to provide more accurate and reliable data to ITS systems and improve road traffic. Additionally, virtual infrastructures are expected to leverage the popularization of AVs in cities where an intelligent infrastructure is not yet a reality.

SECTION III.

Application Scenarios and Problem Statement

Figure 1 illustrates the potential of the virtual infrastructure in two distinct urban environments. On the left, a well-equipped intersection near a commercial center features existing physical infrastructure enhanced by vehicles actively sharing sensor data through V2N (Vehicle-to-Network) communication, creating an enriched information network that combines traditional monitoring with vehicular data. On the right, a residential area demonstrates how even with limited traffic monitoring equipment, autonomous vehicles can maintain effective communication and environmental awareness through the same protocols. This implementation of virtual infrastructure enables consistent traffic monitoring and safety systems across diverse urban environments by supplementing existing infrastructure where present and providing coverage where it is sparse.

FIGURE 1.

Demonstration of virtual infrastructure through V2N communication in distinct urban environments. Left panel shows V2N interactions in a busy commercial area, where physical traffic monitoring solutions are already deployed, while right panel illustrates the same capability in a residential setting, highlighting how autonomous vehicles can maintain effective traffic monitoring without extensive physical infrastructure.

The virtual infrastructure must be able to reproduce the existing infrastructure based on the available information, the historical context, and predictions enabled through multiple analytical approaches. These predictions leverage advanced mathematical and statistical models for traffic flow analysis, AI-powered algorithms for pattern recognition and classification of large data volumes, deep learning techniques for travel time prediction, and robust detection algorithms to handle noisy data.

Recent comprehensive surveys have highlighted how deep learning and advanced AI techniques are revolutionizing intelligent transportation systems [46], making this an opportune time to explore virtual infrastructure approaches that can leverage these capabilities.

The reuse of data already produced by the AVs is essential for the advancement of ITS applications, particularly on roads with low traffic monitoring resources, such as cross-border roads, smart villages, and rural environments. Using AVs as data sources coupled with these predictive models can help integrate environments of varying infrastructure levels and boost ITS development by providing real-time environmental awareness, traffic conditions monitoring, and future trend forecasting. The combination of AV sensor data (including cameras, LIDAR, GNSS, and speedometers) with sophisticated processing algorithms enables the virtual infrastructure to deliver equivalent or enhanced monitoring capabilities compared to conventional physical infrastructure, while significantly reducing deployment and maintenance costs. The challenges of deploying the virtual infrastructure are presented in Subsection III-A. Subsection III-B provides a concrete use case and a broader problem statement. In Subsection IV-A, a discussion of the main approaches to data acquisition and reuse is given.

A. Challenges

Several challenges must be tackled to successfully develop a virtual infrastructure, among them handling large amounts of data, reliability, optimization, scalability, and interoperability. Concerning data volume, the large number of sensors installed on roads and on the AVs is expected to generate a large amount of data. Hence, there is a need to deal with such data volumes, including classification, processing, and analysis. There is also the challenge of fusing different data types to serve diverse applications.

There is also the need for a virtual infrastructure to be aware of the surrounding context to support adaptations to dynamic situations for the AVs. This ranges from weather changes to unexpected movements of pedestrians, cyclists, and other vehicles. Context data may be collected from various sensors, in any of the specific devices, and fed to appropriate devices like RSUs and Edge Computing devices in the infrastructure. A key challenge is reliability, primarily when critical services are handled. Data and computational redundancy is a standard solution. For instance, data replication is required to meet reliability requirements, which increases networking, computational and energy costs. Where and how data replication should happen and at what rate should be a design consideration for a virtual infrastructure.

A fundamental challenge in the applicability scenarios is optimizing the solution with respect to several optimization variables, such as energy consumption, networking costs, and traffic efficiency. Scalability and data interoperability are fundamental challenges to the success of a virtual infrastructure. For example, raw data needs to be pre-processed before reuse, but there are different types of predictive models and there are few or no standards for which specific features should be present in the input data. While increased data is necessary to train better models, this availability comes with challenges in data engineering. In addition, there are also issues related to the availability of network resources. A broadcast storm [23], for example, can overload a local network and make assets unavailable. Research in this area needs to optimize storage and transfer strategies. This will be discussed in future work.

B. Problem Statement

A common challenge in urban traffic management occurs when autonomous vehicles need to navigate through areas with varying levels of infrastructure monitoring capability. Consider an AV taxi that regularly travels between a busy commercial district (equipped with comprehensive traffic monitoring systems) and residential neighborhoods where monitoring infrastructure is minimal. In the commercial district, the AV can rely on data from multiple sources - traffic cameras, road sensors, and other AVs. However, when entering residential areas, it loses access to most of these infrastructure data, potentially compromising its ability to make optimal routing decisions or respond to dynamic traffic conditions. This disparity in infrastructure coverage creates an inefficient “two-tier” system that impacts both the performance of individual AVs and the overall traffic flow optimization.

The broader challenge that this research addresses is the significant cost and impracticality of deploying comprehensive physical infrastructure for intelligent transportation systems across all environment types. Traditional approaches require the installation and maintenance of numerous embedded devices, sensors, and cameras throughout the road network - a solution that becomes economically unfeasible when scaled to include residential areas, rural roads, and smart villages. The proposed virtual infrastructure model aims to bridge this gap by leveraging data that is already being generated by autonomous vehicles’ existing sensor networks. By reusing AV sensor data (cameras, LIDAR, GNSS), the system can create a dynamic digital representation of traffic conditions without requiring additional physical infrastructure deployment.

This optimization problem balances several critical factors: minimizing the computational impact on AVs (showing only a 1.1% increase in memory usage with direct sensor collection), ensuring reliable data transmission through Beyond 5G communication (with a latency requirement of 100 ms for V2X message transfer), and maintaining data quality sufficient for ITS applications. The solution must be scalable across different environment types while providing consistent service quality comparable to areas with full physical infrastructure deployment.
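As a rough illustration of how two of these factors could be checked against measurements, the following minimal sketch computes the relative memory overhead and compares a measured message latency with the 100 ms figure quoted above. The `Measurement` fields, the function names, and the `max_overhead_pct` threshold are assumptions introduced for this example; they are not taken from the article.

```python
from dataclasses import dataclass

V2X_LATENCY_BUDGET_MS = 100.0  # latency requirement quoted in the text


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record of one evaluation run."""
    baseline_memory_mb: float
    memory_with_module_mb: float
    message_latency_ms: float


def memory_overhead_pct(m):
    """Relative memory increase caused by the collection module, in percent."""
    return 100.0 * (m.memory_with_module_mb - m.baseline_memory_mb) / m.baseline_memory_mb


def meets_requirements(m, max_overhead_pct=5.0):
    """True if both the memory and latency constraints hold.

    max_overhead_pct is an illustrative threshold, not a value from the article.
    """
    return (memory_overhead_pct(m) <= max_overhead_pct
            and m.message_latency_ms <= V2X_LATENCY_BUDGET_MS)
```

With a hypothetical baseline of 1000 MB and 1011 MB after enabling the module, `memory_overhead_pct` yields an overhead of about 1.1%, which matches the order of magnitude reported for direct sensor collection.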

SECTION IV.

Proposed Solution

A suitable approach may reuse the data of devices and existing infrastructure to implement a virtual infrastructure. Each data source can be seen as an agent contributing to building the virtual infrastructure. The conceptual architecture may be defined based on a set of design choices. For instance, in a more straightforward approach, the data is captured, processed locally, and sent to the devices that need it. In a more general but more complex approach, data is captured and sent to the cloud, where sophisticated and robust algorithms are used to process it. The second design alternative is the more realistic one and proves to be more interesting.

In this way, the proposed solution uses a data acquisition module locally and supports V2X communication for sharing the data with the cloud, where the data is classified, processed, and analyzed. Figure 2 presents the proposed architecture for the virtual infrastructure. More generally, the proposed architecture is divided into two blocks. The first block comprises data acquisition from AVs and data transmission. The second block is the construction of the virtual infrastructure for ITS and cooperative perception systems.

FIGURE 2.

Proposed virtual infrastructure connecting autonomous vehicles with ITS systems. The diagram illustrates how vehicle sensor data flows through edge processing layers before reaching cloud-based data services that support traffic management, safety systems, and route planning applications. Two different scenarios are displayed on the left. One in the physical vehicle, at the top, and the simulated vehicle at the bottom. Both share data with the virtual infrastructure through their “data collection/management” modules.

Sensor data is fed to control modules as usual. The newly added data acquisition module can access shared data through V2X communication modules. ITS, in turn, will receive, process, and store the data, making it available to third-party services within the ITS network. A simulated environment for the experiments is shown in parallel, where yellow boxes represent the two proposed methods of data acquisition.

For better understanding, we offer a sequence diagram to show the two available data flows that make the infrastructure work. Figure 3 displays two data loops: one where the vehicle sends its sensor data to the ITS infrastructure and another where the vehicle consumes data services from it. A separate process applies post-processing and classifies the received data in timed batch windows.

FIGURE 3.

Sequence diagram illustrating the bidirectional data flow between vehicles and the ITS system through a data collection module. The upper loop shows vehicles publishing sensor data that gets preprocessed and sent to ITS services, while the lower loop demonstrates ITS publishing processed environmental data back to vehicles for local fusion and control purposes.
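The timed batch windows used on the ITS side can be sketched as follows. This is a minimal illustration under stated assumptions: the `SensorMessage` schema, the 5-second window length, and the mean-speed summary are hypothetical choices made for the example and are not described in the article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorMessage:
    """One preprocessed reading published by a vehicle (hypothetical schema)."""
    vehicle_id: str
    timestamp_s: float  # seconds since an arbitrary epoch
    payload: dict


def group_into_windows(messages, window_s=5.0):
    """Group messages into fixed-length time windows keyed by window index."""
    windows = defaultdict(list)
    for msg in messages:
        windows[int(msg.timestamp_s // window_s)].append(msg)
    return dict(windows)


def summarize_window(batch):
    """Toy post-processing step: mean reported speed within one window."""
    speeds = [m.payload["speed_mps"] for m in batch if "speed_mps" in m.payload]
    return sum(speeds) / len(speeds) if speeds else None
```

Fixed windows keep the post-processing step independent of the rate at which individual vehicles publish, at the cost of delaying results by up to one window length.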

The subsections below present an example of conceptual modeling of a virtual infrastructure. First, the data acquisition and transmission model is presented in Subsection IV-A, and in Subsection IV-B, the design option for constructing the virtual infrastructure is presented.

A. Data Acquisition and Transmission Model

In recent years, conventional vehicles have usually been designed with monitoring, sensing, and communication devices. As vehicles become increasingly autonomous, the number of these devices is expected to grow dramatically. In general, AVs use a set of data generated from sensing devices and received from the infrastructure to make decisions and move through the environment autonomously. The raw data does not carry much information about the traffic or the general environment, but once processed it provides a current view of part of it. However, the AVs discard the processed data afterwards, since decisions have already been made and new data needs to be processed.

A data acquisition module can reuse and provide this data to the virtual infrastructure to improve traffic and safety. It is assumed that the more the data received by the AV conforms to the current environment, the more accurate the decisions made by the AVs will be. Identifying the best way for AVs to acquire data without compromising their processing power and security is still an open question in the literature. As discussed in Section II, some directions are pointed out, but how they impact the AV operation still needs to be clarified, as well as how this data can be reused efficiently.

In this scenario, a set of design implementations can be considered as possible solutions. In particular, a software module deployed inside the embedded computer could capture the data and make it available to interested parties. Using a software-based solution within the embedded computer eliminates the need for dedicated hardware and avoids the limitations of the standards imposed by different manufacturers. One possible solution starts from the choice of backbone, where one can intercept all the data passing through a given channel, such as the CAN (Controller Area Network) bus, or force the publication of the selected data on different channels, which will later be used for information sharing. Notably, using the backbone to listen to one channel is the most appropriate solution since, this way, the sensing devices will share the information only once.

However, in the design presented above, the data collection module must correctly identify the desired data in the backbone and separate it from the undesired data, which generates a sizeable logistical overhead. In this case, the most appropriate approach would be to use the backbone to listen to specific channels, which would decrease the amount of data. However, those interested in the data must still know the standards of the various manufacturers, have information from the data producer, and, in many cases, know the individual characteristics of each device used for sensing. Another design option that proves to be more attractive is to use the data already processed by the AV. This last design option can eliminate most of the limitations of the previous solutions. However, it can be more computationally demanding, particularly impacting the processing and storage capacity of the AV's embedded computer.
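The option of listening only to specific channels can be illustrated with a minimal sketch. It does not use a real CAN library; the `Frame` type, the `listen_to_channels` function, and the channel identifiers are hypothetical names introduced only for illustration, and a real deployment would depend on each manufacturer's message definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Frame:
    """A simplified backbone message: a channel identifier plus a raw payload."""
    channel_id: int
    payload: bytes


def listen_to_channels(frames: Iterable[Frame], wanted: Set[int]) -> Iterator[Frame]:
    """Yield only the frames published on the channels of interest."""
    for frame in frames:
        if frame.channel_id in wanted:
            yield frame


# Hypothetical identifiers, chosen only for this example.
SPEED_CHANNEL = 0x101
GNSS_CHANNEL = 0x102
BRAKE_CHANNEL = 0x200

traffic = [
    Frame(SPEED_CHANNEL, b"\x00\x2a"),
    Frame(BRAKE_CHANNEL, b"\x01"),
    Frame(GNSS_CHANNEL, b"\x10\x20\x30\x40"),
]
selected = list(listen_to_channels(traffic, {SPEED_CHANNEL, GNSS_CHANNEL}))
print([hex(frame.channel_id) for frame in selected])
```

The filter itself is cheap; the cost discussed in the text lies in knowing which identifiers to put in the `wanted` set for each manufacturer and device.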

To address some of the gaps mentioned in this article, a proof of concept of the data acquisition method is implemented, and its impacts are analyzed and discussed in Section V. The first proof of concept implements a solution based on the second design option, where the sensors also send the signals to a secondary channel. A second proof of concept is then developed, in which the data already processed by the AV is acquired at the data fusion module.

The data is stored in a cache structure and is expected to be transmitted to the interested party quickly, to avoid overloading the AV's local storage system. As a significant amount of data is expected, the proposed design considers V2X communication technology. In this proposed solution, Beyond 5G (B5G) is considered, which provides a communication rate that can reach 20 Gbps with a range of 1.6 to 5 kilometers. Thus, the impact on the local storage system and the AV's embedded computer is expected to be minimal.
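One possible shape for such a cache is a bounded first-in, first-out buffer that is emptied on every transmission. The sketch below is an assumption about how this could be done, not the implementation used in the proof of concept; the class name `TransmitCache` and the byte cap are hypothetical.

```python
from collections import deque
from typing import Deque, List


class TransmitCache:
    """Bounded FIFO cache that holds packets until they are flushed to the network."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._packets: Deque[bytes] = deque()

    def put(self, packet: bytes) -> None:
        """Store a packet, evicting the oldest packets if the cap is exceeded."""
        self._packets.append(packet)
        self.used_bytes += len(packet)
        while self.used_bytes > self.max_bytes and self._packets:
            dropped = self._packets.popleft()
            self.used_bytes -= len(dropped)

    def flush(self) -> List[bytes]:
        """Return all cached packets in arrival order and empty the cache."""
        packets = list(self._packets)
        self._packets.clear()
        self.used_bytes = 0
        return packets


cache = TransmitCache(max_bytes=10)
for packet in (b"aaaa", b"bbbb", b"cccc"):
    cache.put(packet)
sent = cache.flush()  # the oldest packet was evicted to respect the cap
```

Dropping the oldest data first is only one choice; a design that values historical data over fresh data would evict differently.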

B. Cooperative Perception

The virtual ITS infrastructure system operates through a data processing framework that considers multiple factors to ensure reliability and accuracy. The system employs a three-tier architecture: edge processing, cloud computing, and application services.

Before reaching the cloud, data undergoes edge preprocessing to optimize network resource utilization and reduce latency. This edge layer implements two key components: an edge cache for rapid data access and retrieval, and an edge preprocessor for initial data refinement and filtering. Upon arrival at the cloud service, the data follows a multi-stage processing pipeline. Initially, raw sensor data and historical information are stored in a distributed database system.

At the cloud level, the data processing pipeline implements robust detection algorithms to handle various data quality challenges, including sensor noise, communication interference, and potential security threats. As illustrated in Figure 2, this processing chain incorporates multiple complementary approaches: Data validation to ensure data integrity and consistency; Pattern recognition to identify and classify traffic behaviors and environmental conditions; Prediction models to forecast traffic patterns and system states.

The processed data feeds into a virtual infrastructure data service that supports various ITS applications, including traffic management, safety systems, and route planning. This service layer acts as an abstraction between the raw data processing and the application-specific requirements, enabling flexible and scalable implementation of new ITS services while maintaining consistent data quality and accessibility.

The system leverages V2X communication to maintain bidirectional data flow between autonomous vehicles and the infrastructure, ensuring that vehicles both contribute to and benefit from the collective perception of the environment. This cooperative approach enables a more comprehensive and accurate representation of the transportation network than would be possible with traditional fixed infrastructure alone.

Additionally, the virtual infrastructure can use its data set to perform a simulation process that, together with context data, regression, and AI algorithms, can predict future traffic trends on the roads. Administrators and public officials can then use this data to improve the environment and perform maintenance operations in a way that does not impact the traffic, among other applications.

C. Data Sharing Optimization Framework

The data sharing between AVs and the cloud can be optimized through a multi-level strategy, building upon the edge computing approach discussed in [39]. The proposed framework consists of three main optimization layers: local data prioritization, edge resource management, and network optimization.

At the vehicle level, the local data prioritization layer operates with various data management strategies to efficiently utilize network resources. Time-sensitive data, such as safety-critical information, can be assigned immediate transmission priority through the V2X communication module. Meanwhile, non-critical data can be batched and compressed before transmission to reduce network load. To assist in downstream processing decisions, data packets can be tagged with quality metrics and confidence levels. Additionally, local preprocessing can filter redundant or low-quality sensor readings before transmission.
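A minimal sketch of the prioritization idea is given below, assuming a two-level priority scheme and a heap-based queue. The class `TransmissionQueue`, the priority constants, and the packet fields are hypothetical; the framework described here does not prescribe a specific data structure.

```python
import heapq
import itertools
from typing import Any, Dict, List, Tuple

SAFETY_CRITICAL = 0  # transmitted first
NON_CRITICAL = 1     # may wait to be batched and compressed


class TransmissionQueue:
    """Priority queue that releases safety-critical packets before others."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Dict[str, Any]]] = []
        # The counter keeps arrival order among packets of equal priority.
        self._counter = itertools.count()

    def push(self, priority: int, packet: Dict[str, Any]) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), packet))

    def pop(self) -> Dict[str, Any]:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = TransmissionQueue()
# Each packet is tagged with a confidence level for downstream processing.
queue.push(NON_CRITICAL, {"kind": "camera", "confidence": 0.90})
queue.push(SAFETY_CRITICAL, {"kind": "hard_brake", "confidence": 0.99})
queue.push(NON_CRITICAL, {"kind": "gnss", "confidence": 0.95})
order = [queue.pop()["kind"] for _ in range(len(queue))]
print(order)
```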

The edge resource management layer implements several optimization strategies to enhance data processing efficiency. Edge nodes can implement dynamic load balancing to distribute processing tasks across available resources. Through a locality-aware cache replacement policy, relevant data can be maintained closer to where it is most frequently accessed. Processing tasks can be distributed based on both computational capacity and current network conditions. Furthermore, edge servers can perform initial data fusion to reduce the volume of data transmitted to the cloud infrastructure.
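The text does not specify the locality-aware cache replacement policy. One simple reading, shown here purely as an assumption, is to evict the entry whose origin is farthest from the edge node when the cache is full. The class `LocalityAwareCache` and its coordinates in metres are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]  # (x, y) in metres, in map coordinates


class LocalityAwareCache:
    """Keeps the entries produced closest to the edge node."""

    def __init__(self, node_position: Position, capacity: int) -> None:
        self.node_position = node_position
        self.capacity = capacity
        self._entries: Dict[str, Tuple[Position, bytes]] = {}

    def _distance(self, position: Position) -> float:
        return math.dist(self.node_position, position)

    def put(self, key: str, position: Position, value: bytes) -> None:
        """Insert an entry; if over capacity, evict the farthest one."""
        self._entries[key] = (position, value)
        if len(self._entries) > self.capacity:
            farthest = max(
                self._entries,
                key=lambda k: self._distance(self._entries[k][0]),
            )
            del self._entries[farthest]

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        return None if entry is None else entry[1]


edge_cache = LocalityAwareCache(node_position=(0.0, 0.0), capacity=2)
edge_cache.put("a", (10.0, 0.0), b"A")
edge_cache.put("b", (500.0, 0.0), b"B")
edge_cache.put("c", (20.0, 0.0), b"C")  # "b" is farthest and is evicted
```

A production policy would likely combine distance with recency and access frequency; this sketch isolates the locality criterion only.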

At the network layer, various strategies can be employed to ensure efficient data transmission. B5G network slicing capabilities can be utilized to guarantee Quality of Service (QoS) for different data types and priorities. Transmission rates can be dynamically adjusted based on network congestion and application requirements. Redundancy in the data stream can be eliminated through local data fusion at edge nodes, while network paths can be optimized based on latency requirements and current network conditions.

This multi-layered optimization approach aims to balance the trade-offs between data freshness, processing efficiency, and network resource utilization. The framework is designed to be adaptable to varying traffic conditions and computational resources, allowing for efficient scaling of the virtual infrastructure as the number of connected vehicles increases.

We propose investigating several key performance indicators that could validate the effectiveness of the approach. These include analyzing system latency characteristics, particularly for safety-critical data transmission, examining resource utilization patterns across edge nodes, and measuring network efficiency through bandwidth consumption metrics. Additionally, future work should explore how the system maintains performance as the number of connected vehicles scales up.

These investigations will be crucial for understanding the practical limitations and optimization opportunities of the proposed virtual infrastructure framework. The development of a comprehensive evaluation methodology, including specific benchmarks and testing scenarios, remains an open challenge that we intend to address in subsequent research.

SECTION V.

Experiment and Discussions

For the following experiments, a proof of concept was implemented for the local data prioritization layer, mainly containing the batching feature, where independent sensor data are batched in a time window before compression and publication.
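The batching feature can be sketched as follows. This is an illustrative reconstruction rather than the published source code: the class `WindowBatcher`, the JSON encoding, and the use of `zlib` are assumptions, and readings are assumed to be JSON-serializable.

```python
import json
import zlib
from typing import Any, Dict, List, Optional


class WindowBatcher:
    """Collects independent sensor readings and emits one compressed batch per window."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._window_start: Optional[float] = None
        self._readings: List[Dict[str, Any]] = []

    def add(self, timestamp: float, sensor_id: str, value: Any) -> Optional[bytes]:
        """Add a reading; return a compressed batch once the window has elapsed."""
        if self._window_start is None:
            self._window_start = timestamp
        batch = None
        if timestamp - self._window_start >= self.window_s:
            batch = self._publish()
            self._window_start = timestamp
        self._readings.append({"t": timestamp, "sensor": sensor_id, "value": value})
        return batch

    def _publish(self) -> bytes:
        payload = json.dumps(self._readings).encode("utf-8")
        self._readings = []
        return zlib.compress(payload)


batcher = WindowBatcher(window_s=3.0)
first = batcher.add(0.0, "gnss", [1.0, 2.0])
second = batcher.add(1.5, "speed", 8.5)
batch = batcher.add(3.0, "gnss", [1.5, 2.5])  # window elapsed: batch is emitted
decoded = json.loads(zlib.decompress(batch))
```

The reading that closes a window opens the next one, so no data is lost between consecutive batches.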

The main goal of this first exploratory work is to evaluate the impacts of a data acquisition model on AV concerning local processing and storage across different urban environments and weather conditions. Additionally, data collection is evaluated in terms of the extra processing cycles and energy expenditure.

The experiment was built on top of the CARLA simulator. CARLA uses a client-server architecture in which the server runs the simulation computation and rendering. The client is responsible for information collection and interaction in the simulation world through “Actors”. In this setup, two computers were used, one for the simulation server and one for the client.

To establish mechanisms for persistent and pervasive data collection for the presented environment, two scenarios were considered: (i) collect data from sensor actors, which are managed by the simulation server, and offer callback utility functions directly to the CARLA client, and (ii) collect data directly from CARLA’s underlying data layer, using application performance management (APM) tools.

Both computers shared the same local network. Communication between client and server was established through CARLA's client library, with network simulation handled by OMNeT++ and Simu5G to model 5G communication characteristics. The setup uses TCP socket connections on the default ports 2000, 2001, and 2002, with messages encoded asynchronously using Protocol Buffers (Protobuf) to make data transmission and networking operations more efficient.

The source code is available on GitHub (footnote 7). The CARLA client runs on a personal laptop with the following specifications: Ubuntu 22.04 operating system, 6.5.0-26-generic kernel, Intel i7-13650HX with 14 cores and 2.6 GHz base frequency, 32 GB of DDR4 RAM, and 1 TB of M.2 SSD storage. The vehicle agents used in the simulation are available from the CARLA challenge (footnote 8) and also have GitHub repositories.

In [25], 3GPP published a standard for a Cooperative Perception Service (CPS) and a Cooperative Perception Message (CPM). The CPM describes all kinds of sensor data and confidence levels to be shared between vehicles. Message size scales with the amount of sensor data attached and with the fusion or processing level of that data.

To simplify the collection process and measurements in the present experiments, the saved data only includes sensor output. The CPM standard data format is not used, as the added metadata would not significantly affect data volume. CPM will be considered in future work exploring communication experiments.

The document also cites a Frequency and Content Management entity, which determines the optimal CPM generation period and the number of perceived objects and regions to be included in the CPM. The generation period, which ranges from 100 to 1000 ms, has an important impact on channel usage.

Because of GPU compute limitations on the server machine, and because only raw sensor data is used at this stage of experimentation, we set a period of 3 seconds for sensor data collection, as well as for the time window used for sending messages to the cloud data service.

To clarify the scope of the present experiment, consider the simulation sequence diagram in Figure 4. The data exchanged follows the behavior detailed in Figure 3. In place of the hypothesized ITS service sits a network simulator that simulates the physics of the communication and routes the messages to a data service. This allows the vehicles to exchange messages with a central service as well as with each other.

FIGURE 4.

Sequence diagram showing the initialization and data flow between three components: CARLA simulator, Data Collection Module, and OMNeT++. After the initial setup and simulation readiness checks, the system enters a continuous loop of data exchange between all three components.

The full simulation uses the OMNeT++ simulator with Simu5G for networking experiments, but for the present measurements only the client machine, where the data collection module agent runs, was monitored.

To precisely monitor memory and CPU consumption, the Python client process was monitored by its PID. In the tests done with the APM server software, its process was also monitored, and the memory and CPU usage of both processes were summed. The Ubuntu System Monitor and a combination of the following shell commands were used: top -p PID, cat /proc/PID/statm, ps faux | grep PID, pmap -x PID and docker stats. CPU usage is expressed as a percentage relative to the system CPU.
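As an illustration of how the /proc/PID/statm output can be turned into memory figures, the sketch below parses one statm line, whose fields are reported in pages on Linux. The function names and the sample values are made up for the example and are not measurements from this experiment.

```python
import os
from typing import Dict

# Field order of /proc/PID/statm on Linux; all values are counted in pages.
STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(line: str, page_size: int) -> Dict[str, float]:
    """Convert one /proc/PID/statm line from pages to mebibytes."""
    pages = [int(value) for value in line.split()]
    return {
        name: count * page_size / 2**20
        for name, count in zip(STATM_FIELDS, pages)
    }


def resident_mib(pid: int) -> float:
    """Resident set size of a running process in MiB (Linux only)."""
    with open(f"/proc/{pid}/statm") as handle:
        line = handle.read()
    return parse_statm(line, os.sysconf("SC_PAGE_SIZE"))["resident"]


sample = "25600 12800 1024 512 0 8192 0"  # made-up values for illustration
usage = parse_statm(sample, page_size=4096)
print(usage["resident"], usage["size"])
```

Summing `resident_mib` over the client PID and the APM server PID would reproduce the kind of combined figure described above, under the assumption that shared pages between the two processes are negligible.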

Table 1 presents memory and CPU consumption data for a single vehicle with 3 sensors that was created and integrated into the existing traffic manager of the simulation environment. In this simulated environment, the Elastic APM Python agent creates several objects for monitoring and introspection, so the memory spent is around double that of collecting directly from the sensors.

TABLE 1 Total Memory and CPU Comparison Between Collection Methods. The Simulation Server is Not Accounted for

The increase in CPU usage was smaller than anticipated. Our main hypothesis is that only a small number of objects (3 sensors) is being monitored and that the load is more memory and I/O bound (because of logging). Total storage used after 1 hour of navigation was 441 MB, accounting for the images of the 2 cameras and the data of 1 lidar sensor.

New electric vehicles come equipped with an increasing number of cameras: the BYD Atto 3 has 5 cameras and the Tesla Model 3 currently has 9. To further evaluate the system load with respect to this kind of sensor only, we evaluated CPU and RAM usage against an increasing number of cameras. The cameras used a default setup of $960\times 480$ resolution and a 120-degree field of view. Results are presented in Table 2.

TABLE 2 Total Memory and CPU Usage for an Increasing Number of Cameras, Both Increase Linearly With the Number of Cameras

To experiment with state-of-the-art self-driving models, Table 3 presents the impact on memory and CPU of collecting data for a Transfuser [26] agent, the winner of one of the CARLA Challenge events. The sensor setup was reproduced from the authors' GitHub repository (footnote 9) and contains 3 cameras (with higher resolution than in the previous baseline experiment), one lidar sensor, an IMU, a GNSS and a speedometer, for a total of 7 sensors. For reference, in Transfuser model training each camera provides depth maps and semantic segmentation data in addition to RGB.

TABLE 3 Total Memory and CPU Comparison Between Data Collection Methods for the Transfuser Agent. Total Sensor Count in the CARLA Simulator: 7. Simulation Server is Not Accounted for in Used Memory

The same baseline model was used for navigation (not the Transfuser model), so the usage without data collection is in practice the same. We can observe that the memory spent in the sensor data collection method increased by around 20%, which can be attributed to the larger number of sensors on the vehicle. On the other hand, the memory used by the APM data collection tools remained much more significant, even though it did not scale with the number of sensors as it did in the first experiment. For this experiment, the total storage used after 1 hour of data collection was 3.1 GB. This large increase in storage is attributed to the larger number of sensors, the number of cameras, and the camera resolution. The lidar setup did not change between experiments.

To better understand the impact this system would have on a vehicle in the real world, we considered the Delphi Audi zFAS, a central driver assistance controller released in 2018 for Audi's A8 model. The controller has 4 GB of DDR3L-1866 SDRAM, so the Transfuser setup (7 sensors) translates to 1.1% of total memory usage with direct sensor collection and 2.6% with APM data collection, a negligible impact on a 6-year-old controller. Even the camera test still shows a feasible expenditure with 9 cameras.
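For orientation, the reported percentages can be converted back into absolute footprints. This is a back-of-the-envelope calculation, not a reported measurement: it assumes the 4 GB of controller memory means 4 GiB, and the helper `share_to_mib` is a hypothetical name.

```python
def share_to_mib(share_percent: float, total_gib: float) -> float:
    """Memory footprint in MiB implied by a percentage of total memory."""
    return total_gib * 1024 * share_percent / 100


CONTROLLER_GIB = 4  # zFAS memory, assumed here to be 4 GiB

direct_mib = share_to_mib(1.1, CONTROLLER_GIB)  # direct sensor collection
apm_mib = share_to_mib(2.6, CONTROLLER_GIB)     # APM data collection
print(round(direct_mib, 1), round(apm_mib, 1))
```

Under these assumptions, the two collection methods correspond to roughly 45 MiB and 106 MiB of the controller's memory, respectively.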

Autonomous vehicles have security and decision modules that need to access data with minimum latency. For this reason, CPU and memory resources must be used as sparingly as possible, and the presented sensor data collection mechanism is able to meet that need.

In a nutshell, the higher cost of the APM solution was expected due to the software evaluation infrastructure built around it. In physical vehicles, on the other hand, the extra load is expected to be lower than that presented in this work, since a profiler at the vehicle operating system level would be used instead of a Python profiler.

The increase in the amount of storage needed for the extra module is relevant: for the Transfuser model, it would be around 3 GB for 1 hour. Camera and lidar data are raw and represent the largest part of this volume, with 2.8 GB dedicated to the $960\times 480$ resolution images from the 3 cameras.

According to 3GPP Release 17 (LTE; Service requirements for V2X services, 3GPP TS 22.185 version 17.0.0 Release 17), the maximum latency for V2X message transfer on 5G is 100 ms, with a maximum allowed latency of 20 ms in some specific use cases.

To evaluate a simple traffic monitoring application based on the existing code, three “monitor vehicles” were placed in the simulation. For each vehicle, camera, GNSS and speedometer sensor data was collected in the same way as in the previous experiment. Data was then sent to antenna-connected edge servers situated at the border of the maps, around 300 m from the roads, simulated by Simu5G.

In the CARLA simulation, two urban scenarios were considered: a dense urban environment (Town10HD_Opt1) and a suburban area (Town012). The road network of Town 10 consists of a grid layout with numerous different junctions and traffic lights, while the road network of Town 01 has several bridges crossing water and numerous simple T junctions.

Each environment was tested under two weather conditions, sunny weather and heavy rainfall, to evaluate the system's performance under different environmental challenges. The weather conditions produced no difference in the results, since neither Simu5G nor the CARLA physics simulator offers rain physics features such as skidding, aquaplaning or rain attenuation.

A light and a heavy traffic scenario were considered: one with only the three monitor vehicles in the simulation, allowing for top speed in the selected map, and one with the monitor vehicles and another 100 cars.

Each vehicle built a log of the collected data. The log was monitored in real time, and data was collected for 20 minutes. At the end of the simulation, map data was fused with the GNSS and speed data from the vehicles to generate the proposed traffic monitor.
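The fusion step can be approximated by averaging speed samples over a spatial grid. The sketch below is one possible way to do this, not the code used for the experiment: it assumes GNSS positions have already been projected to map coordinates in metres, and the function `mean_speed_per_cell` and the cell size are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[float, float, float]  # (x in m, y in m, speed in m/s)
Cell = Tuple[int, int]


def mean_speed_per_cell(samples: Iterable[Sample], cell_size_m: float) -> Dict[Cell, float]:
    """Average the speed of all samples that fall into each grid cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [speed sum, sample count]
    for x, y, speed in samples:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        totals[cell][0] += speed
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}


log = [
    (3.0, 4.0, 2.0),    # two slow samples near a junction
    (7.0, 1.0, 4.0),
    (25.0, 4.0, 10.0),  # one faster sample on an open segment
]
speed_map = mean_speed_per_cell(log, cell_size_m=10.0)
print(speed_map)
```

Cells with a low mean speed would then be drawn in the color corresponding to slow traffic when overlaid on the map.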

Figure 5 depicts the end result of the experiment on Town 10, the denser of the two. It is possible to observe that, even in the light traffic scenario, some points of the road have a lower mean speed. These points are located at junctions or traffic lights.

FIGURE 5.

Traffic comparison, with the light traffic simulation on the left and the heavy one on the right. Points represent the vehicle location and speed during the collection step. Point color represents vehicle speed in meters per second. This simulation use case uses the right-hand traffic system.

In Town 10, the average speed was low even in the light traffic scenario, at 8.5 km/h. This was expected, as Town 10 emulates a “downtown” location with several junctions and traffic lights that naturally lower the speed of passing vehicles. Town 01, on the other hand, had an average speed of 6.8 km/h with heavy traffic and 9.9 km/h with light traffic, due to longer road segments and fewer intersections, showing a clear difference between the average speeds in the two urban scenarios.

As expected, in the heavy traffic scenario, traffic lights and junctions are the places with the biggest impact, and a large traffic jam can be observed in the lower left corner of the map. Given the simplicity of the generated code, it can be reused to evaluate light and heavy traffic in all other CARLA maps.

Average latency for communications with the antennas was around 50 milliseconds, peaking at 170 milliseconds when cars were more than 500 meters away from the antennas. There was no difference in signal loss in the downtown region, as the Simu5G simulator does not take into account attenuation caused by buildings.

Data throughput was analyzed for each vehicle throughout the traffic monitoring experiments. Figure 6 shows the throughput pattern for a single vehicle over time, measured in KB/s. We observe a periodic oscillation between approximately 100 KB/s and 600 KB/s. The oscillation pattern remains consistent throughout the experiment, reflecting the regular sensor data collection cycles. The data package is approximately 1.5 MB, contains camera, GNSS and speedometer sensor data, and is sent every 3 seconds.
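The package size and sending period imply a mean throughput that can be checked against the observed oscillation. The short calculation below is a sketch that assumes decimal units (1 MB = 1000 KB) and the 20-minute collection time of the traffic monitoring experiment; the derived totals are estimates, not measured values.

```python
PACKAGE_MB = 1.5      # approximate size of one data package
PERIOD_S = 3.0        # one package is sent every 3 seconds
EXPERIMENT_MIN = 20   # duration of the traffic monitoring run

mean_kb_per_s = PACKAGE_MB * 1000 / PERIOD_S    # mean throughput per vehicle
packages_sent = EXPERIMENT_MIN * 60 / PERIOD_S  # packages per vehicle per run
total_mb = packages_sent * PACKAGE_MB           # data volume per vehicle per run

print(mean_kb_per_s, packages_sent, total_mb)
```

The implied mean of about 500 KB/s falls inside the observed 100 to 600 KB/s range, which is consistent with a bursty transfer repeated every collection cycle.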

FIGURE 6.

Data throughput pattern for a single autonomous vehicle over time during the experiment. X-axis represents timesteps in seconds and the y-axis shows throughput in KB/s.

With a minimal number of vehicles sharing this kind of data with a single, centralized server, it is possible to obtain real-time speed data with a reasonable error. The error increases as the number of vehicles decreases, since data becomes outdated until another vehicle passes by the location and measures the speed again.

SECTION VI.

Conclusion and Future Works

This work has presented a virtual infrastructure model that enables data reuse from autonomous vehicles to support ITS applications. Through proof-of-concept experiments, we demonstrated that the proposed architecture is viable for real-world autonomous vehicles with minimal computational overhead, showing only a 1.1% memory usage increase for direct sensor collection and 2.6% for APM data collection on reference hardware.

The key contributions of this work include: a minimalistic design for collecting and reusing AV sensor data through a data service; an initial data acquisition module with minimal impact on vehicle resources; and a demonstration of practical traffic monitoring capabilities through simulation, indicating the system's feasibility for the real world.

Moreover, the experience with the proof of concept allowed the identification of other promising research directions for future work. One is local pre-processing of the data in the vehicles, together with an analysis of its computational impact. Another is improving the reliability of B5G in the presence of intermittent connectivity, which can degrade data accuracy. One possible solution may involve developing a MIMO array of antennas and using storage at the edge. By interacting with the community, the intention is to discuss directions that provide insights into a virtual infrastructure model.

Our findings suggest that this approach could substantially reduce the cost and complexity of implementing intelligent transportation systems, particularly in areas with limited physical infrastructure. We also believe that the data presented by the virtual infrastructure can significantly improve the accuracy of ITS applications.
