Freeway Traffic Modeling by Physics-Regularized Gaussian Processes


Abstract:

Effective traffic management and control are essential for mitigating congestion and minimizing environmental impacts on road transportation systems. In this paper, we propose a novel approach for traffic modeling that integrates physics-based dynamics with machine learning techniques. Our method leverages Gaussian Processes (GPs) and a multi-class second-order discrete traffic model known as METANET to develop a Physics-Regularized Machine Learning framework. Furthermore, the proposed approach includes for the first time multi-class on/off ramps within the modeling framework, enhancing the realism of the predictive model. We systematically evaluate the performance of the hybrid model across varying dataset sizes to determine optimal data requirements for accurate traffic predictions. Experimental results indicate the improved predictive performance of the proposed approach compared to traditional machine learning and physics-based models. Our findings underscore the potential of Physics-Regularized Machine Learning for enhancing traffic management and control strategies in real-world scenarios.
Page(s): 116 - 130
Date of Publication: 31 January 2025
Electronic ISSN: 2687-7813

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Efficient traffic management and control, both on interurban roads and in urban networks, remain crucial issues for our society. The massive use of road transportation induces recurrent and non-recurrent congestion phenomena, whose annual costs were estimated, in Europe alone, at around EUR 100 billion in 2020 (approximately 1% of the gross domestic product of the EU) [1]. Traffic congestion not only affects citizens' mobility needs but also produces environmental externalities. The ongoing climate change has intensified markedly in recent years, as has the awareness that road transport represents a major source of environmental pollution [2]. To address these issues effectively, monitoring the traffic system to evaluate traffic conditions constitutes the basis for developing suitable and adaptive regulation actions.

Accurate traffic state estimation, which involves monitoring and predicting the state of traffic flows, is crucial for making informed decisions and implementing effective control strategies. The objective of this paper is to design a modelling framework for freeway traffic, mainly oriented to estimating the main quantities that characterize traffic behaviour. Such a framework is primarily aimed at monitoring the state of the freeway network, including road portions not equipped with sensors. Indeed, full and reliable knowledge of the traffic state in the freeway network under consideration constitutes the basis for implementing feedback control schemes.

As described in [3], most of the approaches for estimating the traffic state on freeways can be classified into two main categories: model-driven approaches, and data-driven methods (a third class of the so-called streaming-data-driven techniques may be also considered, but it is less widespread).

Model-driven approaches are based on traditional models that often rely on complex mathematical relations and may nonetheless struggle to capture the intricate dynamic behavior of real-world vehicular traffic systems. Furthermore, they may be limited by data scarcity and by the need for extensive modeling and calibration.

Data-driven estimation methods retrieve information from suitable datasets using statistical techniques and Machine Learning (ML) approaches. ML has emerged as a powerful tool for traffic state estimation and prediction due to its ability to capture complex relationships and patterns from data [4], [5]. ML algorithms can adapt to changing traffic conditions, make real-time predictions, and provide insights into traffic behavior. In particular, in the ML context, the problems of nowcasting and forecasting unknown quantities related to a specific process are mapped into the so-called now-classical ML problems [6]. The term nowcasting identifies estimates made in the same time interval as the available input data, or short-term predictions, whereas forecasting typically refers to longer-term predictions. The approach proposed in this work solves a nowcasting problem.

However, conventional ML approaches can be limited when applied stand-alone, as they may lack physical interpretability, leading to challenges in understanding the underlying relations of traffic dynamics.

To address these challenges and harness the advantages of both physics-based modeling and machine learning in the framework of traffic state nowcasting problems, we propose a Physics-Regularized Machine Learning approach. Specifically, in the proposed approach, we combine the flexibility and predictive power of ML, represented by Gaussian processes [7], with the physical realism offered by a multi-class second-order discrete traffic model known as METANET [8]. The multi-class METANET model, adopted here, captures the interactions of the different vehicle classes and the impact of on/off ramps, providing a more comprehensive representation of real-world traffic behavior (for further details please refer to [9] and [10]).

The idea of regularizing the estimation produced by a Gaussian process with a physics-informed component is not new and has been addressed in numerous fields of application (see, e.g., the review on physics-informed machine learning presented in [11]). In the field of traffic state estimation, approaches based on physics-informed Gaussian processes have been proposed, for example, in [12], [13]. The approach presented in this paper differs from existing ones in that the Gaussian process used for estimation is regularized by a second-order discrete-time multi-class traffic model.

The significance of this paper lies in the following contributions:

  • First, the proposed approach explicitly considers the presence of different vehicular classes and the inclusion of on-ramp traffic (which constitutes the basis for the design of ramp-metering controllers) in a hybrid modelling framework integrating ML and traditional models.

  • Second, particular attention is paid to the correct use of the macroscopic model that regularizes the learning approach: the model parameters are calibrated on real data before the two modelling levels are integrated.

  • Third, we assessed the performance of the multi-class hybrid traffic flow model with a range of dataset sizes, spanning from one day to 14 days. This evaluation seeks to gauge the accuracy and effectiveness of the model, particularly when operating with limited datasets, as our intention is to apply this model also in scenarios possibly characterized by data scarcity.

The paper is structured as follows. Section II introduces the state-of-the-art literature on ML methods integrated with traditional models devoted to the representation of freeway traffic dynamics. Section III illustrates the methodology adopted to develop the multi-class hybrid traffic flow model. The case study application, including tests with datasets of varying sizes and comparisons of the proposed model’s performance with pure machine learning (ML) and multi-class METANET, is presented in Section IV. Finally, Section V provides some concluding remarks.

SECTION II.

Related Work

Over the past few years, several studies have explored the adoption of Physics-Guided Machine Learning models for traffic state estimation and prediction. Such models primarily fall into two categories: Physics-Informed Machine Learning, which directly incorporates governing physical principles into the learning process, and Physics-Regularized Machine Learning, which leverages physics-based constraints as regularization terms to guide the learning process.

In [14] a preliminary version of the approach proposed here is reported. A first extension included in this paper, with respect to that previous work, refers to a fundamental improvement of the use of the macroscopic traffic model as regularizing component. Specifically, a suitable calibration procedure of the parameters of the METANET model on the basis of real data of the case study under concern is introduced before adopting METANET in the hybrid estimation framework here proposed. This significantly affects the correctness and effectiveness of the method with respect to the case in which the same parameters are determined as a result of the inference algorithm reported in Section III-C.

Moreover, a more extensive experimental analysis than the simple one reported in [14] has been carried out and is reported in this paper. This analysis also considers several datasets of different sizes, on which the training and validation algorithms have been run in order to identify a suitable dataset size for training.

A. Physics-Guided Machine Learning Approaches

A series of papers have studied Physics-Informed Machine Learning approaches. Huang and Agarwal [16] introduced a Physics-Informed Machine Learning model for traffic density estimation, in which the loss function of the physics given by the LWR model is combined with the loss function of a deep learning neural network with 11 layers. The authors extended their work in [17], by enriching data considering both infrastructure and intelligent vehicles, empowering the model applicability even in the case of sparse data. In [18], the first-order macroscopic model CTM (Cell Transmission Model) was merged with a neural architecture as in [17]. They also explored data collection scenarios with fog computing infrastructure in [19]. Additionally, their work [20] examined the limitations of Physics-Informed Deep Learning in reconstructing hyperbolic Partial Differential Equations (PDEs) and found that learning the equation of the Lighthill-Whitham-Richards (LWR) model is more challenging than parabolic PDEs.

Shi et al. [21] proposed a Physics-Informed Machine Learning model based on a three-parameter LWR and a neural network, to estimate traffic density. They extended their work in [22] to include a second-order traffic flow model. In [23], a new version has been introduced, where ML terms are integrated into the LWR model to learn the fundamental diagram.

A three-parameter LWR model was also considered by Di et al. [24], but in this work and also in [25] the LWR model is integrated into a hybrid computational graph (HCG) incorporating a physics-uninformed neural network (PUNN) and a physics-informed computational graph (PICG).

Rempe et al. in [26] designed separate neural networks for estimating motorway traffic states based on Greenshields and trapezoidal fundamental diagrams.

Liu et al. [27] developed a model-based neural network for density reconstruction, substituting the LWR model with a coupled micro-macro model considering a parabolic PDE. Their extended work in [28] focused on identifying the velocity function, with a 5-layer, 10-neuron neural network.

A further step is taken in [29], where traffic domain knowledge is incorporated into a Deep Convolutional Neural Network (Deep CNN) for speed field estimation by means of an anisotropic kernel design. The proposed kernel considers the direction and speed of traffic flow, resulting in more accurate speed field estimations while reducing model complexity and improving computational efficiency. The model was tested under different simulated situations and with data from probe vehicles.

Mo et al. [30] explored a range of Physics-Informed car-following models. They considered artificial neural networks (ANNs), the long short-term memory (LSTM) model, and many physics-based models such as the Intelligent Driving Model (IDM), the Optimal Velocity Model (OVM), the Gazis-Herman-Rothery model (GHR), and the Full Velocity Difference Model (FVD).

In more recent works, neural networks designed to operate directly on graphs are considered in the Physics-Informed Machine Learning models. Shi et al. [31] introduced a Physics-Informed Spatiotemporal Learning Framework, designed to estimate traffic density and travel time. The proposed framework is based on the LWR model and a spatiotemporal graph convolution neural network that considers temporal and spatial dependence.

Lu et al. [32] proposed a Joint Traffic State and Queue Profile Estimation (JSQE) model formulated as a non linear programming problem represented by a computational graph (fully connected neural networks (FCNNs)), which is then solved using a forward-backward method. The model estimates traffic speed and travel time.

Zhang et al. [33] proposed a Traffic State Estimation (TSE) model combining computational graphs with Physics-Informed Machine Learning methods. First, the parameters of the fundamental diagram are determined through the computational graph; then, an 8-layer deep neural network in combination with the LWR model is used to estimate the traffic state.

In addition to traffic state prediction, recent research has also investigated model-based reinforcement learning for collision avoidance and traffic signal management, as well as the detection of real-time patterns in passenger vehicle occupancy in public transportation systems. A model-based reinforcement learning (MBRL) approach called MuJAM (MuZero Joint Action Modeling) is used in [34] to simulate the adaptive traffic signal control (ATSC) environment. Building a comprehensive traffic network model that captures the dynamics of vehicle flow and signal interactions is part of this process. MuJAM facilitates better real-time decision-making and coordination across intersections by forecasting traffic conditions based on the activities of several controllers.

Meanwhile, in [35] a method for autonomous car navigation, which blends model-based control techniques with deep reinforcement learning (DRL) is proposed. While the model-based control strategy determines desired course instructions based on the expected trajectory, the DRL component creates a global path-planning algorithm that maps environmental states to waypoints.

Furthermore in [36], a two-stage model is used in a novel method for forecasting passenger counts in public transit. Prior to capturing differences between the observed values and the baseline predictions, a real-time model is built on past automatic passenger counting data to train a baseline predictor. This method produces accurate forecasts by identifying new patterns in real-time and modifying the predictions accordingly.

On the other hand only a few works such as [12], [13], have introduced Physics-Regularized Machine Learning approaches for estimating traffic states, based on Gaussian Processes. In [12], the physics of the process was represented by continuous macroscopic traffic flow models such as the Lighthill-Whitham-Richards (LWR), Payne-Whitham (PW), and Aw-Rascle-Zhang (ARZ) models. To encode physical knowledge a shadow GP and an enhanced latent force model are used, while the evidence lower-bound of the model likelihood is maximized based on the posterior regularization inference framework. In the subsequent paper [13], the authors improved the methodology using gradual learning, where, at the beginning, the model involves lower-order models, and then the learned parameters can be fine-tuned with higher-order models. This improved the computational efficiency and increased the ability of the model to encode complex traffic flow models.

B. Considered Traffic Models and Datasets

Most of the related studies have developed Physics-Informed Deep Learning models that are built on the LWR model, which is a first-order continuous macroscopic traffic model and a deep neural network [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26].

Regarding the LWR model, different fundamental diagrams are considered: the Greenshields fundamental diagram [16], [17], [18], [19], [33]; the Greenshields and a trapezoidal fundamental diagram [26]. Moreover, some of the studies consider a three-parameter LWR model [21], [24].

Regarding the neural networks, besides deep architectures [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [28], [37], some different architectures are proposed: the LSTM model [30]; neural networks with a shared feature-extracting layer for joint estimation [32]; a spatiotemporal graph convolution neural network with spatial, adjustment, temporal, and feature layers [31]; a hybrid computational graph (HCG) comprising two computational graphs [24]; and an anisotropic-kernel Deep CNN [29].

Only a few of the studies consider other traffic models as, for instance, the first-order Cell Transmission Model [18]; the second-order continuous model Aw-Rascle-Zhang; the Intelligent Driver Model, the Optimal Velocity Model, the Gazis-Herman-Rothery model, and the Full Velocity Difference Model [30]; the coupled micro-macro model [27], [28].

Referring to the adopted data for learning and testing the proposed Physics-Guided Machine Learning approaches, most of the studies use pre-built datasets such as NGSIM (see for instance the works [18], [21], [24], [29], [30], [33]) or PEMS [32]. Other studies, such as [16], [18], [24], [25], [29], use synthetic data, while the works [16], [17], [18], [19], [20], [21], [28], [29] have included data gathered from intelligent vehicles. Note that, only in [18] data gathered by Connected and Automated Vehicles (CAVs) have been considered. Some studies use a combination of synthetic data and data gathered from probe vehicles also comparing model performance depending on the type of data used.

SECTION III.

The Proposed Hybrid Multi-Class Traffic Model

This study proposes a novel hybrid model representing freeway traffic that integrates equations describing the dynamics of the system process into a machine-learning framework.

The proposed model is developed using a Gaussian Process (GP) and this choice is motivated by our intention to deploy the model in real-world scenarios where data may be scarce and measurements noisy. Indeed, GPs exhibit robust performance with limited datasets compared to deep neural networks and offer a valuable attribute of quantifying prediction uncertainties, thereby enhancing decision-making reliability. Furthermore, GPs tend to be more interpretable in contrast to deep neural networks, as they rely on kernel functions, facilitating a deeper understanding of input relationships and their influence on predictions, and providing valuable insights into traffic behavior.

Additionally, our model extends the framework proposed in [15] to encompass the multi-class scenario and the presence of on/off ramps. Specifically, our approach amalgamates Machine Learning techniques with the discrete time-space multi-class METANET model, previously proposed in the works [9], [10].

The resulting hybrid model is developed by embedding the multi-class traffic flow model METANET within the posterior regularization process [38] of the Gaussian Process as a penalty term. This term provides the equations that impose the conservation of vehicles and the speed dynamics of each vehicle class on the basis of the multi-class METANET model.

It is worth noting that for the development of the hybrid multi-class traffic model several parameters have to be defined. In particular, the multi-class traffic flow METANET model used for the process of regularization is characterized by the presence of several parameters that need to be appropriately defined using the available traffic measurements. In this work, the calibration of the parameters of the multi-class traffic model has been performed by means of a specific version of the Simulated Annealing algorithm, see Section III-B for further details.

The Machine Learning parameters are, then, estimated using maximum a posteriori estimation [39] and are further fine-tuned via the mini-batch Gradient Descent algorithm (m-BGD), as described in Section III-C.

This section is structured as follows. First, we introduce Gaussian processes and their regularization procedure; next, we present the multi-class traffic model employed in the regularization; finally, we describe the proposed hybrid traffic model in detail.

A. Gaussian Processes

A Gaussian process is a framework for non-parametric regression and classification that models the data points as jointly Gaussian [40].

Consider a training set \mathcal{S}=(X,Y) with N samples, where X=[x_{1}, \ldots, x_{N}]^{T} collects the inputs, each of dimension s, and Y=[y_{1},\ldots, y_{N}]^{T} collects the outputs, each of dimension s^{\prime}. The predictive distribution at an unseen input x^{\ast} is then obtained from the joint Gaussian probability density function with mean \mu(x^{\ast}) and covariance \sigma(x^{\ast}) as:\begin{equation*} p\left(f(x^{\ast}) \mid x^{\ast}, X, Y\right) = \mathcal{N}\left(\mu(x^{\ast}); \sigma(x^{\ast})\right) \tag{1}\end{equation*}

with\begin{align*} \mu\left(x^{\ast}\right) =& \Xi_{\ast}^{T}\left(\Xi+\sigma_{\epsilon}^{2}I\right)^{-1}Y \tag{2}\\ \sigma\left(x^{\ast}\right) =& \Xi\left(x^{\ast},x^{\ast}\right)-\Xi_{\ast}^{T}\left(\Xi+\sigma_{\epsilon}^{2}I\right)^{-1}\Xi_{\ast} \tag{3}\end{align*}
where \sigma_{\epsilon}^{2} is the variance of the isotropic Gaussian noise in Y, I is the identity matrix, and \Xi and \Xi_{\ast} are the covariance matrices, i.e., the kernel matrices, defined as:\begin{align*} \left[\Xi\right]_{i,j} =& \Xi\left(x_{i},x_{j}\right) \quad \forall i,j=1,\ldots,N \tag{4}\\ \left[\Xi_{\ast}\right]_{i} =& \Xi\left(x^{\ast},x_{i}\right) \quad \forall i=1,\ldots,N \tag{5}\end{align*}

The Squared Exponential Automatic Relevance Determination (SE-ARD) kernel and the Radial Basis Function (RBF) kernel are chosen as kernels for the Gaussian Process. Equations (6) and (7) give the SE-ARD and RBF kernels, respectively.\begin{equation*} \Xi_{\text{SE-ARD}}\left(x, x^{\ast}\right) = \sigma_{1}^{2} \exp\left(-\frac{1}{2}\sum_{i=1}^{s}\frac{\left(x_{i}-x_{i}^{\ast}\right)^{2}}{\eta_{i}^{2}}\right) \tag{6}\end{equation*}
where \sigma_{1} is the signal standard deviation and \eta_{i} is the length scale for dimension i, and\begin{equation*} \Xi_{\text{RBF}}\left(x, x^{\ast}\right) = \sigma_{2}^{2} \exp\left(-\frac{1}{2}\frac{\|x-x^{\ast}\|^{2}}{\eta^{2}}\right) \tag{7}\end{equation*}
where \sigma_{2} is the signal standard deviation and \eta is the length scale.
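
As an illustration of Equations (2)-(7), the following minimal sketch evaluates the two kernels and the GP predictive mean and variance for a scalar output. It is not the authors' implementation: the function names, the small linear solver, and the toy data are assumptions introduced here, and only the Python standard library is used (a numerical library would normally replace the hand-written solver).

```python
import math

def se_ard(x, z, sigma1, eta):
    """SE-ARD kernel, Eq. (6): one length scale eta[i] per input dimension."""
    s = sum(((xi - zi) / ei) ** 2 for xi, zi, ei in zip(x, z, eta))
    return sigma1 ** 2 * math.exp(-0.5 * s)

def rbf(x, z, sigma2, eta):
    """RBF kernel, Eq. (7): a single length scale eta shared by all dimensions."""
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return sigma2 ** 2 * math.exp(-0.5 * sq / eta ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(X, y, x_star, kernel, noise_var):
    """Predictive mean and variance at x_star, Eqs. (2)-(3), for scalar outputs."""
    n = len(X)
    # Kernel matrix of Eq. (4) plus the noise term sigma_eps^2 * I.
    K = [[kernel(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [kernel(x_star, xi) for xi in X]   # Eq. (5)
    alpha = solve(K, y)                          # (Xi + sigma_eps^2 I)^{-1} Y
    beta = solve(K, k_star)                      # (Xi + sigma_eps^2 I)^{-1} Xi_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_star, x_star) - sum(k * b for k, b in zip(k_star, beta))
    return mean, var

# Toy usage with made-up data (not from the paper).
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 0.5]
kernel = lambda a, b: rbf(a, b, sigma2=1.0, eta=1.0)
mean, var = gp_predict(X_train, y_train, [1.5], kernel, noise_var=0.01)
```

When all entries of eta are equal, se_ard reduces to rbf with the same length scale, which is a quick sanity check on both functions.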

The parameters of the kernel are collected in the set \Theta = \{\sigma_{1}^{2}, \sigma_{2}^{2}, \underline{\eta}, \sigma_{\epsilon}^{2}\}, where \underline{\eta} is the vector of length scales. The values of the parameters in \Theta are obtained via the parameter inference procedure described in [7]. At the beginning of this procedure, the beliefs about the parameters' values are encoded in a multivariate normal distribution with zero mean and a covariance matrix that depends on the input variables X and the parameters \Theta, known as the prior distribution. The posterior distribution of the parameters then follows from Bayes' rule:\begin{equation*} P\left(\Theta \mid X, Y\right) = \frac{P\left(Y \mid X,\Theta\right)P\left(\Theta\right)}{P(Y \mid X)} \tag{8}\end{equation*}
where P(\Theta) represents the prior distribution over the parameters, the likelihood function P(Y|X,\Theta) describes the probability of observing the data given the parameters \Theta, and the marginal likelihood P(Y|X) is the probability of observing the data averaged over all possible functions.

The above equation can also be written as:\begin{equation*} P\left(\Theta \mid X, Y\right) \propto P\left(Y \mid X,\Theta\right)P\left(\Theta\right) \tag{9}\end{equation*}
since the marginal likelihood does not depend on the parameters in \Theta.

It may be noticed that maximizing the likelihood function results in the best feasible fit between the model and the data since it measures how accurately the GP model accounts for the measured data. Since it transforms the product of probabilities into a sum of logarithms, log-likelihood maximization is frequently more practical than the direct maximization of the likelihood. It makes the calculations simpler and more numerically stable.

So, in a Gaussian process, in order to find the parameters or hyper-parameters that offer the best fit to the observed data, the objective function to be maximized is typically expressed by the log-likelihood function:\begin{equation*} \log P\left(\Theta \mid X, Y\right) \propto \log P\left(Y \mid X,\Theta\right)+\log P\left(\Theta\right) \tag{10}\end{equation*}

In order to enhance the robustness of the estimates of unknown parameters, a machine learning model commonly goes through a regularization procedure. More specifically, the technique of posterior regularization [38] estimates the parameters by adding a penalty term to the log-posterior distribution:\begin{equation*} \log P\left(\Theta \mid X, Y\right) \propto \log P\left(Y \mid X,\Theta\right)+\log P\left(\Theta\right)+\lambda R\left(\Theta\right) \tag{11}\end{equation*}
where R(\Theta) is the regularization term and \lambda is the regularization hyperparameter.

The objective of posterior regularization is obtaining the values of \Theta maximizing this regularized log-posterior. This balances the likelihood of the data with the prior knowledge (regularization) and helps to prevent overfitting by discouraging extreme parameter values.
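
To make the role of \lambda in Equation (11) concrete, the toy sketch below maximizes a regularized log-posterior for a single scalar parameter. Everything in it is an assumption made for illustration: the linear model y = theta * x, the Gaussian prior, the quadratic penalty R(theta) = -(theta - theta_phys)^2 (chosen non-positive because the objective is maximized), the grid search, and the data. It is not the GP objective or the inference algorithm used in the paper.

```python
import math

def log_likelihood(theta, xs, ys, noise_var):
    """Gaussian log-likelihood of the toy model y = theta * x + noise."""
    n = len(xs)
    sse = sum((y - theta * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * sse / noise_var - 0.5 * n * math.log(2 * math.pi * noise_var)

def log_prior(theta, prior_var):
    """Zero-mean Gaussian log-prior on theta."""
    return -0.5 * theta ** 2 / prior_var - 0.5 * math.log(2 * math.pi * prior_var)

def physics_penalty(theta, theta_phys):
    """Hypothetical R(theta): zero when theta agrees with the physics, negative otherwise."""
    return -(theta - theta_phys) ** 2

def regularized_log_posterior(theta, xs, ys, lam, theta_phys,
                              noise_var=0.1, prior_var=10.0):
    """Right-hand side of Eq. (11) for the toy model."""
    return (log_likelihood(theta, xs, ys, noise_var)
            + log_prior(theta, prior_var)
            + lam * physics_penalty(theta, theta_phys))

def argmax_on_grid(f, lo, hi, steps=2001):
    """Crude maximizer: evaluate f on a uniform grid and return the best point."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=f)

# Made-up data; the "physics" suggests theta = 1.5, the data suggest about 2.
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
theta_data = argmax_on_grid(
    lambda t: regularized_log_posterior(t, xs, ys, 0.0, 1.5), 0.0, 4.0)
theta_reg = argmax_on_grid(
    lambda t: regularized_log_posterior(t, xs, ys, 1000.0, 1.5), 0.0, 4.0)
```

With \lambda = 0 the estimate is driven by the data and the prior alone; a large \lambda pulls it toward the value favoured by the penalty, which is the trade-off that posterior regularization controls.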

B. Multi-Class METANET Model

This section presents the multi-class version of the METANET model and the parameter calibration procedure required to use the traffic model in the regularization of the machine learning component. In more detail, the model adopted here is an extension of the METANET model, initially presented in [8], in which the dynamics of multiple classes of vehicles, i.e., passenger vehicles and heavy-duty vehicles, are explicitly represented through appropriate dynamic equations.

Like the original METANET model, the multi-class version is characterized by dynamic equations representing the evolution of macroscopic traffic quantities that are discrete in both time and space. Hereinafter, the index k denotes a generic time step of the time horizon, which consists of K time steps of duration T. Analogously, the index m denotes a generic freeway section, while M denotes the number of sections into which the entire freeway stretch is divided. Furthermore, L_{m} and \lambda_{m} denote the length and number of lanes of section m, respectively.

Finally, user classes are distinguished by the index c, where c=1 denotes the class of passenger vehicles and c=2 the class of heavy-duty vehicles.

The macroscopic variables characterizing the traffic behavior of each class of users are:

  • the traffic density \rho _{m,c}(k) related to section m at time step k (in vehicles of class c per mile);

  • the mean traffic speed v_{m,c}(k) related to section m at time step k (in miles per hour);

  • the traffic volume q_{m,c}(k) leaving section m during time interval [kT, (k+1)T) (in vehicles of class c per hour);

  • the on-ramp traffic volume r_{m,c}(k) which enters section m during time interval [kT, (k+1)T) (in vehicles of class c per hour);

  • the off-ramp traffic volume s_{m,c}(k) which leaves section m during time interval [kT, (k+1)T) (in vehicles of class c per hour).

The multi-class METANET model includes a number of parameters: the conversion factor \varsigma between passenger vehicles and heavy-duty vehicles, which allows the total traffic volumes to be quantified in terms of Passenger Car Equivalents (PCEs); the critical density \rho^{\textrm{cr}} (in PCEs per mile); and the free-flow speed v^{\textrm{free}}_{c}, defined for each vehicle class c (in miles per hour).

The dynamic equations of the multi-class METANET model are given below. Let us first introduce the state equation for traffic density, which for each user class c=1,2, each section m=1,\ldots,M, and each time step k=0,\ldots,K-1 is given by\begin{align*}& \rho_{m,c}(k+1)=\rho_{m,c}(k) \\& \;+\frac{T}{L_{m} \lambda_{m}}\Big[q_{m-1,c}(k)-q_{m,c}(k) + r_{m,c}(k)-s_{m,c}(k)\Big] \tag{12}\end{align*}

Regarding the mean speed state equation, for each class of vehicles c=1,2, each section m=1,\ldots,M, and each time step k=0,\ldots,K-1, it is formulated as follows\begin{align*}& v_{m,c}(k+1) = v_{m,c}(k) \\& \;+ \frac{T}{\tau_{c}} \Big[V_{m,c}(k) - v_{m,c}(k)\Big] \\& \;+\frac{T}{L_{m}} v_{m,c}(k) \big(v_{m-1,c}(k) - v_{m,c}(k)\big) \\& \;- \frac{\nu_{c} T \big(\rho_{m+1}(k) - \rho_{m}(k)\big)}{\tau_{c} L_{m} \big(\rho_{m}(k) + \chi_{c}\big)} -\delta_{on,c} T\frac{v_{m,c}(k)\, r_{m}(k)}{L_{m}\left[\rho_{m,c}(k)+\chi_{c}\right]} \tag{13}\end{align*}
where \tau_{c}, \nu_{c}, \chi_{c}, \delta_{on,c} are parameters that need to be properly defined through a calibration procedure based on the available traffic measurements. Furthermore, it is worth noting that in (13) the steady-state speed-density relation V_{m,c}(k) and the total density \rho_{m}(k) (in PCEs per mile) are introduced to correctly describe the speed dynamics of each class of vehicles. In particular, the steady-state speed-density relation used in this work is\begin{equation*} V_{m,c}(k) = v^{\textrm{free}}_{c} \cdot \exp\left[-\frac{1}{\alpha_{c}} \left(\frac{\rho_{m}(k)}{\rho^{\textrm{cr}}}\right)^{\alpha_{c}}\right] \tag{14}\end{equation*}
where \alpha_{c} is another parameter that has to be appropriately defined. As already mentioned, the total density accounts for the simultaneous presence of passenger and heavy-duty vehicles, considering them as PCEs, and can be calculated using the following formula\begin{equation*} \rho_{m}(k)= \rho_{m,1}(k)+\varsigma \rho_{m,2}(k) \tag{15}\end{equation*}

Finally, based on the state equations (12), (13) the traffic flow for each class of vehicles c, with c=1,2 , exiting from each section m, with m=1,\ldots,M , and for each time step k, with k=0,\ldots,K-1 , is derived from\begin{equation*} q_{m,c}(k) = \rho _{m,c}(k) \cdot \lambda _{m}\cdot v_{m,c}(k) \tag {16}\end{equation*}


As noted previously, the multi-class METANET model includes a set of parameters which must be calibrated against the real traffic data. Specifically, we denote with \mathbb {P}_{m} the set of model parameters that must be defined for each section m=1,\ldots,M \begin{equation*} \mathbb {P}_{m} =\{ \tau _{c}, \nu _{c}, \chi _{c}, \delta _{on,c}, \varsigma, \alpha _{c}, \rho ^{\textrm {cr}}, v^{\textrm {free}}_{c} \} \tag {17}\end{equation*}

Thus, the values of the model parameters gathered in \mathbb {P}_{m} must be found by solving an optimization problem in which they are the decision variables and the objective function is the error between the measured traffic speeds v_{m,c}^{\textrm {real}}(k) and flows q_{m,c}^{\textrm {real}}(k) and those computed with the traffic model. The constraints of the problem are given by equations (12)–(16). In addition, we denote with \mathbb {I} the set that collects the initial conditions of the state variables, \mathbb {I}=\{\rho _{m,c}(0), v_{m,c}(0), \forall c, \forall m\} , while the set of external inputs, denoted with \mathbb {D} , includes the boundary conditions of the traffic states and the flows entering and exiting through the ramps, \mathbb {D}=\{q_{0,c}(k), v_{0,c}(k), \rho _{M+1,c}(k), \forall c, k=1,\ldots,K; r_{m,c}(k), s_{m,c}(k), \forall c, \forall m, k=1,\ldots,K\} .

Therefore, the formulation of the optimization problem designed to find the optimal values of the model parameters in \mathbb {P}_{m} , is as follows.

Problem 1:

Given the initial conditions \mathbb {I} , and the external inputs \mathbb {D} , find \mathbb {P}_{m} that minimizes\begin{align*} {\mathcal J}=& \omega _{v} \sqrt {\frac {1}{K\cdot M} \sum _{k=0}^{K-1}\sum _{m=1}^{M} \bigg [v_{m,c}^{\textrm {real}}(k) - v_{m,c}(k)\bigg ]^{2}} \\& {} + \omega _{q} \sqrt {\frac {1}{K\cdot M} \sum _{k=0}^{K-1}\sum _{m=1}^{M} \bigg [q_{m,c}^{\textrm {real}}(k) - q_{m,c}(k)\bigg ]^{2}} \\& {}\text {subject to}~ (12)- (16)\tag {18}\end{align*}

where \omega _{v} and \omega _{q} are the error weights used in the objective function.
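To make the structure of the objective in (18) concrete, the following sketch computes the weighted sum of two root-mean-square errors for one vehicle class. It assumes the measured and simulated series have already been flattened into lists of equal length K·M; the function names and the sample numbers are hypothetical.

```python
import math


def rmse(real, model):
    """Root-mean-square error over all (k, m) pairs, given as flat lists."""
    if len(real) != len(model) or not real:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(real, model)) / len(real))


def calibration_objective(v_real, v_model, q_real, q_model, w_v=1.0, w_q=1.0):
    """Weighted speed and flow RMSE, following the form of eq. (18)."""
    return w_v * rmse(v_real, v_model) + w_q * rmse(q_real, q_model)


# Illustrative numbers only.
J = calibration_objective(
    v_real=[60.0, 62.0], v_model=[57.0, 65.0],
    q_real=[1000.0, 1100.0], q_model=[1004.0, 1096.0],
    w_v=1.0, w_q=0.5,
)
```

Because speeds and flows have very different magnitudes, the weights \omega _{v} and \omega _{q} effectively set how much each error contributes to the calibration.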

Note that, since the dynamics of the freeway system included in the constraints are strongly nonlinear, Problem 1 is itself nonlinear and must be solved with suitable algorithms. In particular, to solve this calibration problem we used the Simulated Annealing algorithm, a gradient-free algorithm for nonlinear global optimization that has already been effectively adopted to solve nonlinear optimal control problems for regulating freeway traffic (see [41], [42]). In more detail, this algorithm was first introduced in [43] and later extended in [44] and [45]; here the version proposed in [46] is used, which introduces an alternative cooling scheme. Below we provide a brief description of the Simulated Annealing algorithm adopted in this paper; for further details, the interested reader may refer to [46].

Simulated Annealing is an iterative stochastic algorithm that performs a random search in the solution space starting from an initial feasible solution denoted s_{0} . In this calibration problem, the initial solution s_{0} is a set of initial values of the model parameters collected in \mathbb {P}_{m} , which at the first iteration of the algorithm is assigned to the current solution, denoted s.

Then, at each subsequent iteration of the algorithm, a candidate solution s_{\imath } is generated, which contains tentative values of the model parameters. This solution is randomly drawn from a neighbourhood N(s) of radius r of the current solution s. The candidate solution replaces the current solution, i.e., s=s_{\imath } , if it improves the objective function value, i.e., if {\mathcal {J}}(s_{\imath })\lt {\mathcal {J}}(s) , where {\mathcal {J}}(s_{\imath }) and {\mathcal {J}}(s) are the objective function values associated with the candidate solution and the current solution, respectively.

To prevent the random search from getting stuck in a local minimum, the Simulated Annealing algorithm also allows a candidate solution that worsens the current one to be accepted. In particular, a candidate solution with a higher objective value replaces the current solution with probability \exp \left ({\frac {{\mathcal {J}}(s)-{\mathcal {J}}(s_{\imath })}{\Gamma _{\imath }}}\right) , where the parameter \Gamma _{\imath } , called “temperature”, is iteratively adjusted according to a cooling scheme so that the probability of accepting a deteriorating solution decreases as the iterations proceed. In accordance with the version proposed in [46], the temperature at iteration \imath is given by \Gamma _{\imath }= \left |{\frac { {\mathcal {J}}(s)}{\log (P_{\imath })}}\right |\delta , where \delta is the percentage of worsening at each iteration and P_{\imath } represents the worsening reference acceptance probability, with P_{\imath } = \alpha \cdot P_{\imath - 1} (the factor \alpha used here is distinct from the exponent \alpha _{c} in (14)).

The algorithm terminates when the maximum number of iterations is reached, i.e., \imath =I^{\max } , or when the number of consecutive non-improving iterations reaches its maximum, I^{\textrm {ni,max}} .
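The search loop described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of [46]: candidates are drawn uniformly from a box of half-width `radius`, the best solution seen so far is tracked separately, "non-improving" is measured against that best value, and all default values are hypothetical.

```python
import math
import random


def simulated_annealing(objective, s0, radius, p0=0.5, alpha=0.95, delta=0.05,
                        max_iter=2000, max_no_improve=300, seed=0):
    """Minimise `objective` by random search with probabilistic acceptance.

    p0 must lie strictly between 0 and 1 so that log(p0) is finite and non-zero.
    """
    rng = random.Random(seed)
    s, j_s = list(s0), objective(s0)
    best, j_best = list(s), j_s       # bookkeeping of the best solution (assumption)
    p = p0                            # reference acceptance probability P_i
    no_improve = 0
    for _ in range(max_iter):
        # Candidate in a neighbourhood of the current solution.
        cand = [x + rng.uniform(-radius, radius) for x in s]
        j_c = objective(cand)
        # Temperature: Gamma_i = |J(s) / log(P_i)| * delta.
        temp = abs(j_s / math.log(p)) * delta
        if j_c < j_s:
            accept = True
        elif temp > 0.0:
            # Worsening move accepted with probability exp((J(s) - J(s_i)) / Gamma_i).
            accept = rng.random() < math.exp((j_s - j_c) / temp)
        else:
            accept = False
        if accept:
            s, j_s = cand, j_c
        if j_c < j_best:
            best, j_best = list(cand), j_c
            no_improve = 0
        else:
            no_improve += 1
            if no_improve >= max_no_improve:
                break
        # Cooling: P_i = alpha * P_{i-1}; the floor avoids log(0) after underflow.
        p = max(alpha * p, 1e-300)
    return best, j_best


def toy_objective(x):
    """Stand-in for the calibration objective (hypothetical)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, j_best = simulated_annealing(toy_objective, [5.0, 5.0], radius=0.5)
```

With this temperature, a candidate that worsens the current objective by the fraction \delta is accepted with probability P_{\imath } , so shrinking P_{\imath } at each iteration makes the search progressively greedier.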

C. Physics-Regularized Machine Learning Model

As mentioned before, regularization is commonly applied to a machine learning model to improve the robustness of the estimates of its unknown parameters; here, the technique of posterior regularization [38] is used for this purpose. Posterior regularization allows the model parameters to be estimated by adding a penalty term to the log-prior function and maximizing the resulting objective function.

The component which is added to the loss function (i.e., the likelihood function), in order to regularize the process, is generated from the dynamic equations of the multi-class METANET model, for each class, c=1,2 , as below:\begin{align*}& g_{1,c}=\hat {\rho }_{m,c}(k+1)-\hat {\rho }_{m,c}(k) \\& \;{} - \frac {T}{L_{m} \lambda _{m}}\bigg [\hat {q}_{m-1,c}(k)-\hat {q}_{m,c}(k)+ r_{m,c}(k)- s_{m,c}(k)\bigg ] \tag {19}\\& g_{2,c} =\hat {v}_{m,c}(k+1)- \hat {v}_{m,c}(k) \\& \;{}- \frac {T}{\tau _{c}} \bigg [ v^{\textrm {free}}_{c} \cdot \exp {\left [{{ -\frac {1}{\alpha _{c}} \left ({{\frac {\hat {\rho }_{m,1}(k)+\varsigma \hat {\rho }_{m,2}(k)}{\rho ^{\textrm {cr}}}}}\right)^{\alpha _{c}}}}\right ]} \\& \;\quad - \hat {v}_{m,c}(k)\bigg ] \\& \;{}-\frac {T}{L_{m}} \hat {v}_{m,c}(k) \big (\hat {v}_{m-1,c}(k) - \hat {v}_{m,c}(k) \big) \\& \;{}+ \frac {\nu _{c} T \big (\hat {\rho }_{m+1,1}(k)+\varsigma \hat {\rho }_{m+1,2}(k) - \hat {\rho }_{m,1}(k)-\varsigma \hat {\rho }_{m,2}(k) \big)} {\tau _{c} L_{m} \big (\hat {\rho }_{m,1}(k)+\varsigma \hat {\rho }_{m,2}(k) + \chi _{c} \big)} \\& \;{}+\delta _{on,c} T\frac {\hat {v}_{m,c}(k) r_{m}(k)}{L_{m}\left [{{\hat {\rho }_{m,c}(k)+\chi _{c}}}\right ]} \tag {20}\\& g_{3,c}=\hat {q}_{m,c}(k) - \hat {\rho }_{m,c}(k) \cdot \lambda _{m}\cdot \hat {v}_{m,c}(k) \tag {21}\end{align*}

where \hat {\rho } _{m,c} , \hat {v}_{m,c} , \hat {q}_{m,c} are the estimated values for density, speed, and flow for each class of vehicles c.

Evaluating these equations requires estimates of density, speed, and flow for each class of vehicles. To obtain them, m pseudo-observations (Z,\omega) are introduced (here m denotes the number of pseudo-observations, not the section index). The pseudo-observations (Z,~\omega) have the same structure as the input data (X, Y) . Z denotes the spatiotemporal inputs, representing position and time. Since \omega does not have a direct physical meaning, it is represented as a vector of zeros.

In several approaches, the pseudo-observations are randomly generated. Here, to remain coherent with the existing data, the pseudo-observations (i.e., the spatiotemporal inputs) have been obtained by a clustering approach. Specifically, the k-Means algorithm has been adopted to divide the data into clusters [47]. Then, one representative value has been chosen in each cluster. This selects representative samples from different regions of the data, ensures that the pseudo-observations reflect the diversity present in the real data, and reduces the risk of bias in selecting pseudo-observations.
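The clustering-based selection can be illustrated with a small, dependency-free sketch. Several details are assumptions made for the example and are not stated in the text: the inputs are (position, time) pairs that are already on comparable scales, the centroids are initialised with a deterministic farthest-first rule, and the representative of each cluster is taken to be the observed point closest to its centroid.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialisation: start at points[0], then add the point
    farthest from the centroids chosen so far (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = farthest_first(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def pseudo_inputs(points, k):
    """One observed (position, time) pair per cluster, closest to its centroid."""
    return [min(points, key=lambda p: sq_dist(p, c)) for c in kmeans(points, k)]


# Illustrative (position, time) pairs forming two groups.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 5.3)]
Z = pseudo_inputs(X, k=2)
```

Choosing an observed point rather than the centroid itself keeps every pseudo-input at a location and time that actually occurs in the data.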

Once the pseudo-observations are available, we can estimate the values of \hat {\rho } _{m,c} , \hat {v}_{m,c} , \hat {q}_{m,c} through Gaussian processes. We can then obtain the g values from equations (19), (20) and (21) and include these terms in the regularization of the Gaussian Processes.
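As a hedged illustration of what evaluating the g terms involves, the sketch below computes the conservation residual (19) and the flow residual (21) for one class at one pseudo-input; the speed residual (20) follows the same pattern using the speed equation. The function names and numbers are hypothetical, and the estimates would come from the Gaussian Process rather than being typed in by hand.

```python
def density_residual(rho_next, rho_now, q_in, q_out, r_on, s_off, T, L, lanes):
    """g_{1,c} in eq. (19): mismatch of the vehicle-conservation equation."""
    return rho_next - rho_now - (T / (L * lanes)) * (q_in - q_out + r_on - s_off)


def flow_residual(q, rho, v, lanes):
    """g_{3,c} in eq. (21): mismatch between flow and density * lanes * speed."""
    return q - rho * lanes * v


# Illustrative estimates (not data from the paper); T in hours, L in miles.
T, L, lanes = 10.0 / 3600.0, 0.5, 3
rho_now = 20.0
rho_consistent = rho_now + (T / (L * lanes)) * (3600.0 - 3000.0 + 300.0 - 100.0)

g1 = density_residual(rho_consistent, rho_now, q_in=3600.0, q_out=3000.0,
                      r_on=300.0, s_off=100.0, T=T, L=L, lanes=lanes)
g3 = flow_residual(q=3600.0, rho=20.0, v=60.0, lanes=lanes)
```

Estimates that satisfy the METANET equations give residuals of zero, which is consistent with representing the pseudo-observation targets \omega as a vector of zeros.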

The log marginal likelihood can be derived by marginalizing all the latent variables from the joint posterior probability:\begin{equation*} p\left ({{Y,\omega,g,\hat {f},Z| X }}\right) = p(Y|X) p\left ({{\omega,g, \hat {f}, Z| X,Y}}\right) \tag {22}\end{equation*}


The log marginal likelihood contains an expectation term which makes its value difficult to find. A solution is to evaluate its evidence lower bound L, which is a lower bound on the log marginal likelihood.

The derived objective function contains two components:

  1. the log marginal likelihood of the Gaussian Process where the input data are the observed ones;

  2. the extra term generated from the multi-class METANET equations where the input data are the pseudo inputs.

The inference algorithm, shown in Algorithm 1, defines the values of parameters maximizing the value of the objective function.

Algorithm 1: - Parameter Inference Algorithm

The resulting optimization problem is solved using the mini-batch Gradient Descent algorithm (m-BGD), with updates computed by the Adam optimizer. The algorithm is implemented in the TensorFlow framework.
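For readers unfamiliar with the update rule, the following sketch spells out mini-batch gradient descent with Adam updates in plain Python. It is a stand-in only: the toy least-squares objective, the function names, and the batch size are assumptions, whereas the paper optimizes its own objective in TensorFlow. The learning rate of 0.05 matches the value reported for training in Section IV-A.

```python
import math
import random


def adam_minibatch(grad_fn, theta0, data, batch_size=16, lr=0.05,
                   beta1=0.9, beta2=0.999, eps=1e-8, epochs=200, seed=0):
    """Minimise a loss whose mini-batch gradient is grad_fn(theta, batch)."""
    rng = random.Random(seed)
    theta = list(theta0)
    m = [0.0] * len(theta)   # first-moment (mean) estimate
    v = [0.0] * len(theta)   # second-moment (uncentred variance) estimate
    data = list(data)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            grad = grad_fn(theta, data[start:start + batch_size])
            t += 1
            for i, g in enumerate(grad):
                m[i] = beta1 * m[i] + (1.0 - beta1) * g
                v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
                m_hat = m[i] / (1.0 - beta1 ** t)   # bias correction
                v_hat = v[i] / (1.0 - beta2 ** t)
                theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def mse_grad(theta, batch):
    """Gradient of the mean squared error of y = a*x + b (toy objective)."""
    a, b = theta
    n = len(batch)
    grad_a = sum(2.0 * (a * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (a * x + b - y) for x, y in batch) / n
    return [grad_a, grad_b]


# Noise-free toy data generated from a = 2, b = -1.
xs = [-1.0 + 2.0 * i / 63.0 for i in range(64)]
toy_data = [(x, 2.0 * x - 1.0) for x in xs]
theta = adam_minibatch(mse_grad, [0.0, 0.0], toy_data)
```

To maximize an objective such as an evidence lower bound with the same routine, the gradient of its negative would be supplied as `grad_fn`.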

SECTION IV.

Case Study Application

This section analyzes the results produced by applying the hybrid multi-class traffic model to estimate the traffic state in a real case study. In more detail, the proposed model has been tested on a portion of the I-8 Kumeyaay Highway in California, USA, using as input the related traffic data set accessible within the Caltrans Performance Measurement System (PeMS). The considered freeway stretch, shown in Fig. 1, has a length of 12.89 km. In addition, the considered direction has 4–6 lanes and includes several sensor stations from which data are gathered every 5 minutes. The positions of the sensors are indicated in Fig. 1, where different colors refer to the way in which each sensor has been used in this work: red sensors provide the adopted measurements, yellow sensors mark the positions at which estimations are performed (sensor S4 for the major part of the analysis reported here), and green/purple sensors are used to compute on-ramp/off-ramp volumes.

FIGURE 1. - The considered freeway stretch.

The application to this real case study consists of two parts. In the first part, some tests have been conducted on the hybrid model considering different sizes of the datasets in order to evaluate the minimum size required to achieve satisfactory accuracy in estimating traffic state. The second part has been devoted to comparing the proposed hybrid model with a fully data-driven model and the multi-class METANET model.

A. Dataset Size Analysis

In order to perform this analysis, datasets referring to 1, 3, 5, 7, 10, and 14 days have been used as input data. These data refer to the periods 30/01/2023, 30/01/2023-01/02/2023, 30/01/2023-03/02/2023, 30/01/2023-05/02/2023, 30/01/2023-08/02/2023 and 30/01/2023-12/02/2023, respectively.

For all datasets, traffic data measured by sensors at locations S1, S2, S3 have been used for training the hybrid model whereas the traffic state estimation has been referred to location S4 (please refer to Fig. 1 for the location of sensors). The time interval from 7:00 am to 7:00 pm has been considered. The proposed model also includes traffic on the freeway ramps. As regards off-ramps, the outflows data are approximated using sensors S1-S7 for the first off-ramp and S3-S9 for the second off-ramp. Instead, inflows data are approximated using sensors S2-S8 for the first on-ramp, S9-S10 for the second on-ramp and S10-S4 for the third on-ramp. Since our dataset lacks direct sensor data for on/off ramps, we use the difference between the closest upstream and downstream sensor readings to calculate the traffic flow at these sites. This method is predicated on the idea that the shift in traffic flow between these two locations reflects the principal influence of the on/off ramp on the mainline flow. We utilize exponential smoothing, which incorporates both recent and historical traffic patterns at similar ramp locations, to improve the estimate and remove noise unrelated to ramp operations.
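The ramp-flow approximation can be sketched as follows. This is a simplified reading of the procedure: the sign convention, the clipping of negative differences to zero, and the smoothing factor are assumptions, and the sketch applies plain single-series exponential smoothing without the use of patterns from similar ramp locations mentioned above.

```python
def ramp_flow(upstream, downstream, kind):
    """Approximate ramp flow from mainline flows measured on both sides of it.

    kind="on":  inflow  ~ downstream - upstream
    kind="off": outflow ~ upstream - downstream
    Negative differences are clipped to zero (assumption of this sketch).
    """
    if kind == "on":
        raw = [d - u for u, d in zip(upstream, downstream)]
    elif kind == "off":
        raw = [u - d for u, d in zip(upstream, downstream)]
    else:
        raise ValueError("kind must be 'on' or 'off'")
    return [max(x, 0.0) for x in raw]


def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: s_k = alpha * x_k + (1 - alpha) * s_{k-1}."""
    smoothed = []
    for x in series:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Illustrative 5-minute flows (veh/h) at two mainline sensors around an on-ramp.
upstream = [1000.0, 1100.0, 1050.0]
downstream = [1200.0, 1050.0, 1250.0]
r_raw = ramp_flow(upstream, downstream, kind="on")
r_smooth = exponential_smoothing(r_raw, alpha=0.5)
```

A smaller smoothing factor gives more weight to past values and removes more noise, at the cost of reacting more slowly to genuine changes in ramp demand.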

To provide insight into the considered data, Fig. 2 shows the distributions of speed and flow for the two vehicle classes at sensors S1, S2, and S3 in the five-day data set. The traffic data show a similar distribution for each of the five days, with higher values of traffic flow from 7:00 am to 7:00 pm and rush hours from 3:00 pm to 6:00 pm, when a significant decrease in speed is observed.

FIGURE 2. - Passenger vehicles flow (a), passenger vehicles speed (b), heavy-duty vehicles flow (c), and heavy-duty vehicles speed (d) related to the 5-day dataset at locations S1, S2 and S3.

The training of the hybrid model has been performed using a learning rate of 0.05, and the predicted trends of flow and speed for passenger vehicles and heavy-duty vehicles for 1, 5, 7, and 14 days of training are depicted in Figures 3, 4, 5, 6, respectively. The corresponding errors are plotted in Fig. 7 and the Mean Absolute Percentage Errors (MAPEs) are shown in Tables 1 and 2.
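For reference, the MAPE can be computed as in the short sketch below. It returns a fraction rather than a percentage, in line with the values quoted later in this section; skipping time steps whose measured value is zero is an assumption of the sketch, since the handling of such samples is not specified in the text.

```python
def mape(real, predicted):
    """Mean absolute percentage error, returned as a fraction (0.1 = 10%)."""
    pairs = [(r, p) for r, p in zip(real, predicted) if r != 0]
    if not pairs:
        raise ValueError("no non-zero measurements to compare against")
    return sum(abs((r - p) / r) for r, p in pairs) / len(pairs)


# Illustrative values only.
error = mape([100.0, 200.0], [110.0, 180.0])
```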

TABLE 1 MAPEs for Flow and Speed Prediction of Passenger Vehicles With Different Training Data Sets
TABLE 2 MAPEs for Flow and Speed Prediction Of Heavy-Duty Vehicles With Different Training Data Sets
FIGURE 3. - Comparison of the flow trend of passenger vehicles in S4 obtained with the hybrid multi-class model (green line) with the observed data (red dots), for 1 day (a), 5 days (b), 7 days (c), and 14 days of training (d).

FIGURE 4. - Comparison of the speed trend of passenger vehicles in S4 obtained with the hybrid multi-class model (green line) with the observed data (red dots), for 1 day (a), 5 days (b), 7 days (c), and 14 days of training (d).

FIGURE 5. - Comparison of the flow trend of heavy-duty vehicles in S4 obtained with the hybrid multi-class model (green line) with the observed data (red dots), for 1 day (a), 5 days (b), 7 days (c), and 14 days of training (d).

FIGURE 6. - Comparison of the speed trend of heavy-duty vehicles in S4 obtained with the hybrid multi-class model (green line) with the observed data (red dots), for 1 day (a), 5 days (b), 7 days (c), and 14 days of training (d).

FIGURE 7. - Standard errors for predicted flow and speed for passenger vehicles ((a) and (b)), heavy-duty vehicles ((c) and (d)).

In general, the standard errors displayed in Fig. 7 and the MAPEs shown in Tables 1 and 2 indicate that the predictive capabilities of the hybrid model tend to improve as the size of the dataset increases. Another observation is that the predictions of the model trained on 7 days of data have a higher error than those obtained when 5 days of data are used for training. In our opinion, this is due to the presence of weekend data, when mobility demand is significantly different from that of weekdays, which negatively affects the accuracy of the estimation of traffic conditions. However, this problem is overcome when 14-day data are used for training, for which the best results are obtained. It is worth pointing out that training with the 14-day dataset requires considerable computational time, whereas training with the 5-day data produces similar performance with a significantly shorter computational time. Therefore, the second part of this section, which compares the traffic state estimation obtained with the proposed hybrid model against the pure data-driven model and the multi-class METANET model, is performed with the 5-day training dataset.

B. Comparison with Pure ML and Multi-Class METANET Model

To demonstrate the capability of the proposed hybrid model, we compare it with the multi-class METANET model and the pure ML model. In doing so, we have to consider that the hybrid model uses the section with inflows and outflows in the training phase and can predict the flow and speed values at any other point of the freeway, whereas the multi-class METANET model computes the flow and speed values in the section that has inflows and outflows. This is the reason why we have considered two crossings: the hybrid model uses the first crossing with inflows and outflows for training, but makes predictions at the same point as the multi-class METANET model after the second crossing, at sensor S4. The detectors providing the input data for the hybrid model are described in Section IV-A. The same sensors are used to provide the input data for the pure ML model.

The multi-class METANET model is calibrated using the data from all sensors. The 12.89 km stretch has been divided into 12 sections, and the same calibration parameters have been used for all sections.

1) Results Comparison

We report the speed and flow profiles estimated with the pure GP model, the multi-class METANET model, and the hybrid multi-class traffic model, together with the absolute differences between the predicted values and the real ones. These results are displayed in Figures 8 and 9 for passenger vehicles and in Figures 10 and 11 for heavy-duty vehicles.

FIGURE 8. - Flow and speed predictions for passenger vehicles in S4 with pure ML model ((a) and (b)), multi-class METANET model ((c) and (d)), and hybrid multi-class model with in/out flows ((e) and (f)).

FIGURE 9. - Absolute difference between predicted and real data for flow and speed in S4, for passenger vehicles, using pure ML model ((a) and (b)), multi-class METANET model ((c) and (d)), and hybrid multi-class model with in/out flows ((e) and (f)).

FIGURE 10. - Flow and speed predictions in S4, for heavy-duty vehicles, with pure ML model ((a) and (b)), multi-class METANET model ((c) and (d)) and hybrid multi-class model with in/out flows ((e) and (f)).

FIGURE 11. - Absolute difference between predicted and real data for flow and speed in S4, for heavy-duty vehicles, using pure ML model ((a) and (b)), multi-class METANET model ((c) and (d)) and hybrid multi-class model with in/out flows ((e) and (f)).

Table 3 shows the training time for the pure GP and hybrid models and the time required to calibrate the model parameters for the multi-class METANET model. Table 4 shows the execution time of each model. The hybrid model requires a higher execution time than the pure GP and METANET models, but this time can still be considered fully acceptable for applying the estimation technique in real case studies. The training, calibration and testing of the models, with 500 iterations, were carried out on a laptop with 16 GB of RAM and a 2 GHz, 4-core CPU.

TABLE 3 Calibration and Training Time
TABLE 4 Execution Time

Tables 5 and 6 show the Mean Absolute Percentage Error (MAPE) values for the three considered models.

TABLE 5 MAPEs for Flow and Speed Prediction of Passenger Vehicles With Pure ML Model, Multi-Class METANET and Hybrid Model
TABLE 6 MAPEs for Flow and Speed Prediction of Heavy-Duty Vehicles With Pure ML Model, Multi-Class METANET and Hybrid Model

Figure 10 and Table 6 show that the multi-class METANET model outperforms the hybrid model in speed estimation for heavy-duty vehicles. We believe this is due to the data-driven nature of the hybrid model, which relies on a smaller dataset for heavy-duty vehicles. To address this, we compared both methods with an expanded dataset, including sensor data from 14 days. The results are shown in Figure 12. The MAPE for heavy-duty vehicle speed predictions using the multi-class METANET model is 0.065, while the hybrid multi-class model achieves a lower MAPE of 0.058. The results indicate that increasing the dataset size enables the hybrid model to outperform the multi-class METANET model. However, as discussed in Section IV-A, we selected a 5-day dataset for each sensor to balance accuracy and computational efficiency in our infrastructure.

FIGURE 12. - Speed predictions in S4, for heavy-duty vehicles, with multi-class METANET model (a) and hybrid multi-class model with in/out flows (b) for 14 days of training.

2) Generalizability of the Hybrid Model

In the above section, we have tested our model only at sensor S4 for two main reasons. First, the training data are limited to three sensors, so a reasonable ratio between training and testing data should be kept, with considerably more training data than testing data (usually a ratio of 0.7:0.3 or 0.8:0.2). Second, in future work we aim to design a control scheme, and an on-ramp is present at sensor S4, which allows ramp metering to be actuated.

To assess the generalizability of the proposed hybrid model, we have also tested it separately at two other sensors with different dynamics. The results for sensors S5 and S6, compared with those for sensor S4, are shown in Figure 13 for passenger vehicles flow and speed as well as for heavy-duty vehicles flow and speed.

FIGURE 13. - Real and predicted values with the hybrid model in sensor S4, S5 and S6 for passenger vehicles flow (a), passenger vehicles speed (b), heavy-duty vehicles flow (c), and heavy-duty vehicles speed (d).

Table 7 and Table 8 show the MAPE errors for predicting flow and speed for passenger vehicles and heavy-duty vehicles in three sensors, respectively.

TABLE 7 Passenger Vehicles Predictions Errors for Testing the Hybrid Model in Sensor S4, S5 and S6
TABLE 8 Heavy-Duty Vehicles Predictions Errors for Testing the Hybrid Model in Sensor S4, S5 and S6

To further demonstrate the general applicability of the proposed method, additional experiments were conducted to estimate traffic states upstream of the available measurements, which allows the ability of the hybrid model to capture backward shock waves to be evaluated. The general capability of Gaussian Processes for bidirectional state estimation in dynamical systems has already been demonstrated in [48], which shows the flexibility and suitability of GPs for complex systems where interactions propagate in multiple directions. To evaluate this capability for our model, we have used the traffic data from sensors S2, S3 and S4 to train the model, and we have estimated speed and flow at S1. The obtained results are shown in Figure 14, while the corresponding MAPEs are 0.14 for passenger vehicles flow, 0.08 for passenger vehicles speed, 0.2 for heavy-duty vehicles flow and 0.13 for heavy-duty vehicles speed.

FIGURE 14. - Upstream traffic flow and speed predictions at S1 for Passenger Vehicles ((a) and (b)), heavy-duty vehicles ((c) and (d)).

It is worth noting that the hybrid model is slightly less effective in estimating the speed of passenger and heavy-duty vehicles when congestion begins to propagate backward. In addition, in order to evaluate the backward propagation of shock waves, the hybrid model requires traffic measurements upstream of those used for training, which limits the applicability of the model to specific application scenarios. However, in general, the hybrid model demonstrates its ability to successfully capture the fundamental physical principles governing traffic dynamics, including forward and backward shock wave propagation.

SECTION V.

Conclusion

In this paper, we introduced a novel Physics-Regularized multi-class Machine Learning model for traffic modeling, leveraging the multi-class METANET model and Gaussian Processes. Our approach represents an important advancement in the field, as it provides an effective modeling framework exploiting knowledge about the dynamics of the traffic process and information embedded in real data.

Specifically, to the best of our knowledge and based on an extensive literature review, our work is the first to conduct a comprehensive analysis of dataset size requirements for achieving accurate traffic state predictions. By analyzing the relationship between dataset size and prediction accuracy, our study sheds light on the optimal data requirements necessary to achieve robust performance in traffic forecasting tasks, considering both temporal constraints and resource limitations.

Moreover, the explicit definition of a multi-class model, together with the possibility of including on-ramp flows, is an innovative element of the proposed modeling framework. By incorporating these aspects, we enhance the realism and fidelity of our predictive model, enabling more accurate representations of real-world traffic dynamics.

An extensive numerical campaign has been performed whose results are discussed in this work. The obtained results show that in several scenarios the proposed hybrid model outperforms both pure machine learning techniques and pure multi-class METANET models. However, it is noteworthy that in certain scenarios, such as predicting the speed of trucks, the pure multi-class METANET model exhibits lower error rates. This highlights the importance of considering different vehicle classes and their features when designing predictive models for traffic modeling.

It is essential to acknowledge that direct comparisons with existing models in the literature pose certain challenges due to variations in modeling approaches, target variables, and evaluation metrics. Most prior studies typically focus on continuous first-order traffic models and density estimation, making direct comparisons with our discrete multi-class approach challenging. Nonetheless, our model consistently demonstrates better or comparable performance with respect to related models in the literature across various evaluation metrics.
