Introduction and Motivation
Traffic states such as vehicle velocity
The observation data of traffic states can be harvested through another channel: connected vehicles (CV) [6]. In this approach, traffic conditions are perceived via the ubiquitous onboard sensors of connected vehicles traveling in traffic. CV has the potential to vastly broaden the detection area of traffic states on roadways [7] and to enable the collection of other traffic statistics such as link travel time [8].
However, this application scenario, in reality, is also restricted in several ways. Traffic states obtained through CV need to be broadcast and stored by roadside units or edge devices in a computing network [9]. The current penetration rates of both CV and the compatible communication system are insufficient to make it a reliable practice for procuring data on traffic states. Besides, the communication between CV and data infrastructure generates elevated demand for network resources in pursuit of robust on-site computing capability, and low-latency data exchange [10].
The estimation of traffic states, therefore, becomes an important practice for traffic planners and policy practitioners [11]. It pertains to the inference of traffic states using data that is only partially available, secured either through sensing devices or CV. A variety of traffic operations rely on this assessment of traffic states [12], [13]. Tasks as minute as determining the length of traffic cycles and as substantial as adding lanes on a highway all rely on the outcome of traffic state estimation (TSE).
TSE methods in the literature can be categorized into three groups: model-based, data-based, and streaming-data-based [11]. Model-based approaches [14], [15], [16], [17] rely on a calibrated traffic flow model to estimate traffic states in unobserved areas. As a case in point, an approach that relies on the fundamental relationship between traffic speed and density, paired with a threshold error correction model, has been designed for short-term traffic state prediction [18]. Data-based approaches [19], [20], [21], [22], [23] utilize insights derived from historical data and apply statistical or machine learning techniques to reconstruct and predict traffic states. For instance, attention-driven recurrent imputation has been proposed to estimate missing vehicle velocity data [24]. The third category, streaming-data-based approaches [25], [26], contains methods that do not require a priori knowledge of historical observations and instead take advantage of real-time data, with only a weak assumption about traffic relationships drawn from empirical evidence.
The constraints on obtaining traffic state data call for TSE models that can exploit a limited amount of observed data for training and still produce a precise estimation of traffic states. Among the available approaches, the deep learning (DL) neural network is a powerful machine learning method increasingly used in TSE applications [27], [28], [29]. However, DL neural networks also come with shortcomings, such as high requirements for training data and computing power, over-fitting, and transferability issues, which limit their appeal for time-critical applications. These shortcomings motivate the use of physics to aid the training of a neural network in TSE [30], [31].
Physics-informed deep learning (PIDL), also referred to as physics-informed neural network (PINN), arms a neural network with the governing equations of a physical system [32]. It empowers the deep learning neural network with knowledge of the underlying relationship in observed data to efficiently use the limited data input for estimation and prediction tasks [33]. Since the introduction of its architecture [33], PIDL has been adopted in the field of traffic state estimation (TSE) [34], [35]. Researchers have experimented with PIDL to estimate both the traffic state and the fundamental diagram depicting the relationship between traffic states [28]. Second-order traffic models have also been considered in the application of PIDL for TSE [36].
PIDL has also been applied across a diverse range of mechanical engineering and computational science problems, signifying its advantage in using the governing equation to accurately capture the physical system. In the mathematical domain, the PIDL approach has been adopted for solving free boundary problems [37], high-dimensional PDEs [38], uncertainty quantification [39], and time-dependent stochastic PDEs [40]. In the field of fluid dynamics, PIDL is employed to model velocity and pressure fields [41], vortex-induced vibration [42], and fluid flows without the use of simulation data [43]. On the engineering side, the modeling of cardiovascular flow [44], nano-optics [45], and proxy modeling in solid mechanics [46] have all demonstrated the effectiveness of the PIDL approach.
Variants of PIDL have been proposed to learn the solution of partial differential equations (PDEs). For instance, the Galerkin method-based hp-VPINN was introduced to solve PDEs with non-smooth solutions [47]. A Bayesian approach to PINN has been presented for forward and inverse problems [48], and physics-informed adversarial training has been proposed for solving PDEs [49]. Particle swarm optimization has also been put forward for PIDL training [50].
Among the PIDL applications that have demonstrated encouraging results, the focal point is to use the neural network to learn the solutions of a deterministic PDE governing a physical system. As the PIDL model encodes the underlying relationship between state variables, it incorporates the governing equation as a priori knowledge into the cost calculation. If the PDE of interest is smooth, has a strong solution, and is paired with an adequate number of collocation points at which the physics cost is optimized, then PIDL is capable of achieving good accuracy in learning the solution to the PDE. However, recent research points to the limitations of PIDL for learning certain types of PDEs, such as hyperbolic conservation laws.
Scalar Conservation Laws in Traffic Flow Theory: The Lighthill-Whitham-Richards (LWR) model is a one-dimensional scalar conservation law and a commonly used transportation model. No strong solution exists to LWR in general, given that it is a hyperbolic PDE [1]. Certain classes of partial differential equations, including the LWR model, can be solved by the method of characteristics (details in Section II). In our previous work [34], [35], equipped with a realistic choice of data sampling on the interior traffic data (either Lagrangian data that come from connected vehicles (CV) or Eulerian data from roadside sensors and loop detectors), we have shown that PIDL can successfully and accurately perform the onerous task of reconstructing the LWR PDE. However, no empirical evidence exists that PIDL can learn an LWR PDE with acceptable accuracy given only boundary and initial condition data, which is the default experimental setup for a variety of reconstruction problems in transportation and other domains [28], [32], [33], [36].
Research Goals: The overarching goal of this research is to understand the limitations of PIDL in traffic state estimation applications and initiate a dialogue on possible remedies in light of current literature. We aim to achieve the goal by addressing the following research questions:
Which form of the LWR model, hyperbolic PDE or parabolic PDE, is learned better by the PIDL algorithm?
What are the factors behind the disparity in learning results?
What is the effect of a diffusion term, which transforms the hyperbolic LWR PDE to a parabolic variation, in alleviating the weakness of PIDL?
What are the inherent architectural issues in PIDL that inhibit it from working with hyperbolic conservation law?
The research is divided into the following tasks: (a) exhibiting the contrast in reconstruction accuracy between learning a commonly used conservation law (LWR) in traffic flow theory, which is a first order hyperbolic PDE, and its second order parabolic counterpart by using a synthetic ring road density dataset, (b) further illustrating the limitations of PIDL in TSE with only the initial and boundary observations from realistic field data (NGSIM), (c) understanding the impact of discontinuity in the PDE solution, and the effect of a diffusion term on PIDL learning, (d) discussing the findings from the experimental results in light of existing literature and putting forward suggestions to improve PIDL results in TSE with the LWR PDE.
Key features and contributions:
We design a series of experiments by creating a circular testbed and by using realistic field (NGSIM) data to understand the performance of PIDL in TSE while incorporating the hyperbolic and parabolic versions of the physics.
We showcase the effect of the additional second order diffusion term in the LWR PDE on PIDL learning using only the initial and boundary observations from a ring road testbed.
We further investigate the contrast between PIDL learning the hyperbolic LWR PDE and its parabolic variant using a mix of Eulerian and Lagrangian training inputs (non-boundary data).
Using the realistic NGSIM field data, we shed light on the limitations of PIDL in TSE and demonstrate that, on occasions, a numerical PDE solver outperforms PIDL.
Finally, we discuss the reasoning behind the difficulties encountered when training a PIDL neural network with the LWR conservation law and list suggestions for improving the performance.
Note: The code accompanying this manuscript can be found in the GitHub repository at github.com/Urbanity-Lab/PIDL-Limitations
Outline: The rest of the manuscript is organized as follows: Section II gives the mathematical background of scalar conservation laws, including the LWR model, strong and weak solutions, and the method of characteristics. Section III introduces the preliminaries of physics-informed deep learning (PIDL) for learning conservation law-based traffic flow models. Section IV sets up a case study to reveal the challenges of PIDL with the first order hyperbolic LWR PDE. Section V further illustrates the limitations of PIDL with field data from the NGSIM dataset. Section VI discusses the implications of the results, and Section VII concludes the paper with suggestions for future work.
Scalar Conservation Laws
In one dimension, a general form of the scalar conservation law is described by (1) where \begin{equation*} \partial _{x} f(u) + \partial _{t} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{1}\end{equation*}
The flux function \begin{equation*} \lambda \partial _{x} u + \partial _{t} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{2}\end{equation*}
An initial value problem for the linear advection equation is given by (3) and its solution is exhibited in (4).\begin{align*} \lambda \partial _{x} u + \partial _{t} u=&0, \;\;\; x \in \mathbb {R}, \; t \geq 0 \\ u\left ({x, \; 0}\right )=&u_{0}(x), \;\;\; x \in \mathbb {R}\tag{3}\\ u\left ({x, \; t}\right )=&u_{0}\left ({x - \lambda t}\right ), \;\;\; t \geq 0\tag{4}\end{align*}
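The translation property in (4) can be checked numerically. The following is a minimal sketch, assuming a Gaussian initial profile and a wave speed of 2 (both hypothetical, chosen only for illustration); it evaluates the shifted profile and verifies with central differences that it satisfies the linear advection equation.

```python
import math

def advect(u0, lam, x, t):
    """Exact solution (4): shift the initial profile by lam * t."""
    return u0(x - lam * t)

def gaussian(x):
    """Hypothetical smooth initial profile u0(x), chosen for illustration."""
    return math.exp(-x * x)

def advection_residual(u0, lam, x, t, h=1e-5):
    """lam * u_x + u_t, with derivatives approximated by central differences."""
    u_x = (advect(u0, lam, x + h, t) - advect(u0, lam, x - h, t)) / (2 * h)
    u_t = (advect(u0, lam, x, t + h) - advect(u0, lam, x, t - h)) / (2 * h)
    return lam * u_x + u_t

lam = 2.0
# The peak that starts at x = 0 sits at x = lam * t at time t.
peak_value = advect(gaussian, lam, lam * 1.5, 1.5)
residual = advection_residual(gaussian, lam, 0.7, 0.4)
```

The profile keeps its shape and only moves, which is why a linear advection problem never develops the shocks discussed later for nonlinear fluxes.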
A simple nonlinear partial differential equation is the Burgers’ equation, and it is one of the commonly used models of a scalar conservation law. The classical form of Burgers’ equation is presented in (5) where the \begin{equation*} u \partial _{x} u + \partial _{t} u - \epsilon \partial _{xx} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{5}\end{equation*}
When \begin{align*} \partial _{x} \left ({u^{2} / 2}\right ) + \partial _{t} u=&u \partial _{x} u + \partial _{t} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{6}\\ u\left ({x, \; t}\right )=&u_{0}\left ({x - ut}\right ), \;\;\; t \geq 0\tag{7}\end{align*}
The viscosity term is a diffusion term that flattens discontinuities and ensures a smooth solution.
A. Physics of Traffic Flow
Macroscopically, traffic flow is expressed using variables such as the mean velocity
The LWR conservation law is a continuity equation, which holds for all macroscopic models and formulates the conservation of traffic flow. This equation relates the change of density to the gradient of flow. When the flow is considered as a static function (portrayed by a fundamental diagram), it leads to a first order continuity equation, also referred to as the Lighthill-Whitham-Richards (LWR) model [53], which is a hyperbolic partial differential equation (PDE). Given a location
The formulation of the LWR conservation law in the Eulerian coordinate system is shown in (8). It is worth noting that data provided by connected vehicles (CV), such as vehicle velocity
For \begin{equation*} \frac {\partial q\left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = 0\tag{8}\end{equation*}
For \begin{equation*} \frac {\partial v\left ({n, \; t}\right )}{\partial n} + \frac {\partial s\left ({n, \; t}\right )}{\partial t} = 0\tag{9}\end{equation*}
B. Strong and Weak Solutions
Given the initial value problem (10), \begin{align*} \partial _{x} f(u) + \partial _{t} u=&0, \;\;\; x \in \mathbb {R}, \; t \geq 0 \\ u\left ({x, \; 0}\right )=&u_{0}(x), \;\;\; x \in \mathbb {R}\tag{10}\end{align*}
If \begin{align*} f^{\prime } (u) \partial _{x} u + \partial _{t} u=&0, \;\;\; x \in \mathbb {R}, \; t \geq 0 \\ u\left ({x, \; 0}\right )=&u_{0}(x), \;\;\; x \in \mathbb {R}\tag{11}\end{align*}
In domain
When no strong solution to (10) exists, the smoothness requirement can be relaxed to find weak solutions, even if these solutions are not differentiable or even continuous. Weak solutions eliminate the derivative terms of
Multiplying the scalar conservation law (1) with a function \begin{align*}&\int _{0}^{\infty } \int _{-\infty }^{\infty } \left ({\partial _{x} f(u) + \partial _{t} u}\right )\psi dxdt \\&\;=-\int _{0}^{\infty } \int _{-\infty }^{\infty } \left ({f(u)\psi _{x} + u \psi _{t}}\right )dxdt \\&\;\quad {}- \int _{-\infty }^{\infty } u\left ({x, \; 0}\right )\psi \left ({x, \; 0}\right )dx = 0\tag{12}\end{align*}
Notice in (12), the requirement on smoothness is lessened as there are no derivative terms of
It is worth noting that there is no strong solution to the LWR conservation law. Nevertheless, a diffusive term can be added to avoid breakdown and ensure a strong solution by making the hyperbolic conservation equation become a parabolic PDE. We will further discuss this in Section VI.
C. Method of Characteristics
The method of characteristics is used to solve quasilinear partial differential equations, converting the PDEs into ordinary differential equations (ODEs). Consider (1) and its solution \begin{equation*} \dot {x}(t) = u\left ({x(t), \; t}\right )\tag{13}\end{equation*}
From (13) observe that, \begin{equation*} \frac {d}{dt}u\left ({x(t), \; t}\right ) = \frac {dx}{dt}u_{x} + u_{t}\tag{14}\end{equation*}
Combining (1) and (14), we reach (15) that can propagate the solution \begin{equation*} \frac {du}{dt} = 0, \; \frac {dx}{dt} = u\tag{15}\end{equation*}
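As an illustration of (15), each characteristic of the inviscid Burgers' equation is a straight line that starts at a point of the initial profile and travels at the constant value it carries. The sketch below, which assumes a hypothetical sine-shaped initial profile, estimates the earliest time at which two neighboring characteristics cross. That crossing marks the breakdown of the strong solution.

```python
import math

def characteristic(x0, u0, t):
    """Position at time t of the characteristic that starts at x0, per (15)."""
    return x0 + u0(x0) * t

def breaking_time(u0, x_min, x_max, n=2000):
    """Earliest crossing time of neighboring characteristics on a uniform grid.

    Two characteristics starting at xa < xb meet only if u0(xb) < u0(xa),
    at time -(xb - xa) / (u0(xb) - u0(xa)).
    """
    xs = [x_min + (x_max - x_min) * i / n for i in range(n + 1)]
    t_cross = math.inf
    for xa, xb in zip(xs, xs[1:]):
        du = u0(xb) - u0(xa)
        if du < 0:
            t_cross = min(t_cross, -(xb - xa) / du)
    return t_cross

# Hypothetical initial profile u0(x) = sin(x); its steepest negative slope
# is -1, so the characteristics first cross near t = 1.
t_break = breaking_time(math.sin, 0.0, 2.0 * math.pi)
```

Before the crossing time the solution is smooth and single-valued. After it, only a weak solution with a discontinuity remains.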
A simple discontinuous solution of the conservation law (10) is given by (16).\begin{align*} u\left ({x, \; t}\right ) = \begin{cases} u_{L}, & x < \lambda t, \\ u_{R}, & x \geq \lambda t, \end{cases}\tag{16}\end{align*}
If \begin{align*}&v_{f} \left ({1 - \frac {2\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \frac {\partial \rho \left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = 0 \\&\; \rho \left ({x, \; 0}\right ) = \begin{cases} \rho _{L}, & x < 0, \\ \rho _{R}, & x \geq 0, \end{cases}\tag{17}\end{align*}
The characteristic speed at \begin{equation*} \lambda (\rho ) = f^{\prime }(\rho ) = v_{f} \left ({1 - 2 \frac {\rho }{\rho _{m}}}\right )\tag{18}\end{equation*}
If we modify the problem in (17) with the initial condition that \begin{align*} \rho \left ({x, \; t}\right ) = \begin{cases} \rho _{L}, & \frac {x}{t} < \lambda \left ({\rho _{L}}\right ), \\ \frac {\rho _{m}}{2} \left ({1 - \frac {x}{v_{f} t}}\right ), & \lambda \left ({\rho _{L}}\right ) \leq \frac {x}{t} < \lambda \left ({\rho _{R}}\right ), \\ \rho _{R}, & \lambda \left ({\rho _{R}}\right ) \leq \frac {x}{t} \end{cases}\tag{19}\end{align*}
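The two Riemann cases can be collected in one small function. The sketch below assumes the Greenshields flux with normalized free-flow speed and maximum density (hypothetical values). The shock speed comes from the Rankine-Hugoniot jump condition, a standard result that is not derived in this section. The fan value is obtained by solving the characteristic speed in (18) for the density at a given ratio of position to time.

```python
def flux(rho, v_f, rho_m):
    """Greenshields flow q = rho * v_f * (1 - rho / rho_m)."""
    return rho * v_f * (1.0 - rho / rho_m)

def char_speed(rho, v_f, rho_m):
    """Characteristic speed (18)."""
    return v_f * (1.0 - 2.0 * rho / rho_m)

def riemann_lwr(x, t, rho_l, rho_r, v_f=1.0, rho_m=1.0):
    """Density at (x, t) for the Riemann problem with a jump at x = 0."""
    if t <= 0.0:
        return rho_l if x < 0.0 else rho_r
    if rho_l == rho_r:
        return rho_l
    xi = x / t
    if rho_l < rho_r:
        # Characteristics converge: shock moving at the Rankine-Hugoniot speed.
        s = (flux(rho_l, v_f, rho_m) - flux(rho_r, v_f, rho_m)) / (rho_l - rho_r)
        return rho_l if xi < s else rho_r
    # Characteristics diverge: rarefaction fan between the two edge speeds.
    if xi < char_speed(rho_l, v_f, rho_m):
        return rho_l
    if xi >= char_speed(rho_r, v_f, rho_m):
        return rho_r
    return 0.5 * rho_m * (1.0 - xi / v_f)

# Hypothetical normalized densities: a queue discharging into light traffic.
fan_center = riemann_lwr(0.0, 1.0, rho_l=0.8, rho_r=0.2)
```

With light traffic upstream of dense traffic (for example 0.2 upstream and 0.6 downstream) the same function returns a shock that moves at speed 0.2 in these normalized units.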
Physics-Informed Deep Learning
Development in deep learning (DL) neural networks has made it a suitable tool in the computational modeling of a physical system, which is often governed by complex non-linear functions [55]. When mean square error (MSE) is used as the measurement of cost in a DL neural network, the cost function can be formulated as (20), in which \begin{align*} J=&MSE_{\left ({\hat {\mathbf {u}}\left ({x, ~y, ~z, ~t}\right ), \; \mathbf {u}\left ({x, ~y, ~z, ~t}\right )}\right )} \\=&\frac {1}{N} \sum _{k=1}^{N}\left |{\mathbf {u}\left ({x, ~y, ~z, ~t}\right ) - \hat {\mathbf {u}}\left ({x, ~y, ~z, ~t}\right )}\right |^{2}\tag{20}\end{align*}
Automatic differentiation (AD) computes a state variable’s partial derivatives with respect to its spatial and temporal independent variables [56]. Through the layers of a neural network, an output
To evaluate the outputs from the neural network in terms of compliance with the governing physical laws, the physics cost is computed at a set of spatiotemporal points \begin{align*} \begin{cases} J_{DL} = \frac {1}{N_{u}} \sum _{k=1}^{N_{u}}\left |{\mathbf {u}\left ({x, ~y, ~z, ~t}\right ) - \hat {\mathbf {u}}\left ({x, ~y, ~z, ~t}\right )}\right |^{2} \\ J_{PHY} = \frac {1}{N_{f}} \sum _{k=1}^{N_{f}}\left |{\mathbf {f}\left ({x, ~y, ~z, ~t}\right ) }\right |^{2} \end{cases}\tag{21}\end{align*}
In (21), cost function
A. PIDL for Traffic State Estimation
This section provides details of the PIDL approach for traffic state estimation (TSE) using the LWR PDE and Greenshields’ fundamental diagram. Other physical models, such as second order flow models or discretized first order models, can be handled by following the steps laid out in this section. PIDL empowers a DL neural network with the system’s governing physical laws as a priori knowledge [32]. The fundamental diagram of traffic flow and the conservation law serve as meaningful prior knowledge in training a neural network to recognize the underlying relationship between traffic variables.
Several fundamental diagrams exist and are used depending on the situation. Commonly used ones include the piecewise affine speed-density relationship [15] and the triangular fundamental diagram [59]. Greenshields’ fundamental diagram [60] is one of the simplest and most utilized models in traffic flow theory. It makes the simplifying assumption that the mean velocity has a linear relationship with the density. The relationship between traffic variables is described in (22), where \begin{align*} \begin{cases} q\left ({x, \; t}\right ) = \rho \left ({x, \; t}\right ) \; v_{f} \left ({1 - \frac {\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \\ v\left ({x, \; t}\right ) = v_{f} \left ({1 - \frac {\rho \left ({x, \; t}\right )}{\rho _{m}} }\right ) \end{cases}\tag{22}\end{align*}
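The relationship in (22) translates directly into two functions. The sketch below uses hypothetical parameter values (a free-flow speed of 30 m/s and a maximum density of 0.15 veh/m) purely for illustration. It shows the familiar properties of the parabolic flow curve: zero flow at empty and jammed road, and maximum flow at half the maximum density.

```python
def greenshields_velocity(rho, v_f, rho_m):
    """Mean velocity as a linear function of density, per (22)."""
    return v_f * (1.0 - rho / rho_m)

def greenshields_flow(rho, v_f, rho_m):
    """Flow q = rho * v, per (22)."""
    return rho * greenshields_velocity(rho, v_f, rho_m)

# Hypothetical parameters for illustration only.
v_f = 30.0      # free-flow speed, m/s
rho_m = 0.15    # maximum (jam) density, veh/m

critical_density = rho_m / 2.0
capacity = greenshields_flow(critical_density, v_f, rho_m)  # equals v_f * rho_m / 4
```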
Plugging the relationship between variables
For \begin{equation*} \rho _{m} \left ({1 - \frac {2v\left ({x, \; t}\right )}{v_{f}}}\right ) \frac {\partial v\left ({x, \; t}\right )}{\partial x} - \frac {\rho _{m}}{v_{f}} \frac {\partial v\left ({x, \; t}\right )}{\partial t} = 0\tag{23}\end{equation*}
For \begin{equation*} v_{f} \left ({1 - \frac {2\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \frac {\partial \rho \left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = 0\tag{24}\end{equation*}
Observe that both the equations provide the same physical law - the only difference is their dependent variable. Equation (23) formulates the law in terms of velocity
Both (23) and (24) are hyperbolic PDEs. A second order diffusive term can be added to make the PDE become parabolic and secure a strong solution. For example, (24) will become (25) where
For \begin{equation*} v_{f} \left ({1 - \frac {2\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \frac {\partial \rho \left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = \epsilon \frac {\partial ^{2} \rho \left ({x, \; t}\right )}{\partial x^{2}}\tag{25}\end{equation*}
The second order diffusion term ensures the solution of PDE is continuous and differentiable, avoiding the breakdown and discontinuity in the solution to the PDE.
B. Training Data and Cost Functions
The cost function of a PIDL neural network reconstructing a density-field \begin{align*} \begin{cases} J_{DL} = \frac {1}{N_{o}}\sum _{j=1}^{N_{o}}\left |{\rho \left ({x_{o}^{j}, t_{o}^{j}}\right ) - \hat {\rho }\left ({x_{o}^{j}, t_{o}^{j}}\right )}\right |^{2} \\ J_{PHY} = \frac {1}{N_{c}}\sum _{j=1}^{N_{c}}\Big |v_{f} \left ({1 - \frac {2\hat {\rho }\left ({x_{c}^{j}, t_{c}^{j}}\right )}{\rho _{m}}}\right )\frac {\partial \hat {\rho }\left ({x_{c}^{j}, t_{c}^{j}}\right )}{\partial x} \\ + \frac {\partial \hat {\rho }\left ({x_{c}^{j}, t_{c}^{j}}\right )}{\partial t} \Big |^{2} \end{cases}\tag{26}\end{align*}
Weights can be assigned to the cost terms of \begin{equation*} J = \mu _{1} * J_{DL} + \mu _{2} * J_{PHY}\tag{27}\end{equation*}
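To make the two cost terms concrete, the following sketch evaluates them for a candidate density function. It is not the authors' implementation. The neural network is replaced by a closed-form stand-in (a rarefaction fan, which satisfies the hyperbolic LWR PDE with Greenshields' diagram), automatic differentiation is replaced by central finite differences, and the normalized parameters, sample points, and unit weights are all hypothetical.

```python
V_F, RHO_M = 1.0, 1.0   # hypothetical normalized parameters

def rho_fan(x, t):
    """Stand-in for the network output: a rarefaction fan, valid for |x| < V_F * t."""
    return 0.5 * RHO_M * (1.0 - x / (V_F * t))

def lwr_residual(rho_fn, x, t, h=1e-5):
    """Left-hand side of the hyperbolic LWR PDE; finite differences replace AD here."""
    rho = rho_fn(x, t)
    d_rho_dx = (rho_fn(x + h, t) - rho_fn(x - h, t)) / (2.0 * h)
    d_rho_dt = (rho_fn(x, t + h) - rho_fn(x, t - h)) / (2.0 * h)
    return V_F * (1.0 - 2.0 * rho / RHO_M) * d_rho_dx + d_rho_dt

def pidl_cost(rho_fn, observations, collocation, mu1=1.0, mu2=1.0):
    """Return (J, J_DL, J_PHY) as in (26) and (27)."""
    j_dl = sum((rho - rho_fn(x, t)) ** 2 for x, t, rho in observations) / len(observations)
    j_phy = sum(lwr_residual(rho_fn, x, t) ** 2 for x, t in collocation) / len(collocation)
    return mu1 * j_dl + mu2 * j_phy, j_dl, j_phy

# Hypothetical observation and collocation points inside the fan.
observations = [(x, 1.0, rho_fan(x, 1.0)) for x in (-0.5, 0.0, 0.5)]
collocation = [(0.1 * i, 1.0 + 0.2 * j) for i in range(-5, 6) for j in range(5)]

cost_exact = pidl_cost(rho_fan, observations, collocation)
# A perturbed candidate violates both the data and the physics.
cost_bad = pidl_cost(lambda x, t: rho_fan(x, t) + 0.1 * x, observations, collocation)
```

In training, the optimizer adjusts the network weights to reduce the total cost; the relative size of the two weights decides how strongly the physics term steers the fit when observations are scarce.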
C. Training Approaches
Many optimization algorithms are available to train a neural network. Here we briefly describe the training approaches used in this work.
1) L-BFGS-B
The limited-memory, bound-constrained Broyden-Fletcher-Goldfarb-Shanno (L-BFGS-B) algorithm [61] is one of the default optimizers of
2) Adam
Adaptive moment estimation (Adam) [63] combines the advantages of the Momentum [64] and RMSProp [65] optimization algorithms by tracking running averages of both the gradient and the squared gradient, using \begin{align*} \Delta \theta _{i, t}=&\nabla \mathit {J}\left ({\theta _{i, t}}\right ) \\ \mathit {G}_{i, t}=&\beta _{1} \mathit {G}_{i, t-1} + \left ({1 - \beta _{1}}\right ) \Delta \theta _{i, t} \\ \mathit {E}_{i, t}=&\beta _{2} \mathit {E}_{i, t-1} + \left ({1 - \beta _{2}}\right ) \left ({\Delta \theta _{i, t}}\right )^{2} \\ \theta _{i, t+1}=&\theta _{i, t} - \alpha \frac {\mathit {G}_{i, t}}{\sqrt {\mathit {E}_{i, t}} + \epsilon }\tag{28}\end{align*}
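A scalar version of the Adam step can be sketched in a few lines. The sketch follows the standard form of the update, in which the running average of the gradient is divided by the square root of the running average of the squared gradient. The bias-correction terms of the original Adam algorithm are omitted for brevity, which is an assumption of this sketch. The quadratic test function and all hyperparameter values are hypothetical.

```python
import math

def adam_minimize(grad, theta, alpha=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a scalar function given its gradient, without bias correction."""
    g_avg = 0.0   # G: running average of the gradient
    e_avg = 0.0   # E: running average of the squared gradient
    for _ in range(steps):
        g = grad(theta)
        g_avg = beta1 * g_avg + (1.0 - beta1) * g
        e_avg = beta2 * e_avg + (1.0 - beta2) * g * g
        theta -= alpha * g_avg / (math.sqrt(e_avg) + eps)
    return theta

# Hypothetical cost J(theta) = (theta - 3)^2, with gradient 2 * (theta - 3).
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
```

Dividing by the root of the squared-gradient average makes the step size largely independent of the gradient's scale, which is useful when the data cost and the physics cost produce gradients of very different magnitude.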
D. Error Metric
Relative \begin{align*} \mathcal {L}_{2}^{error}=&\frac {\left \lVert{ \mathbf {P} - \hat {\mathbf {P}}}\right \rVert _{F}}{\left \lVert{ \mathbf {P}}\right \rVert _{F}} \\=&\frac {\sqrt {\sum _{j=1}^{N_{1} \cdot N_{2}} \left |{\hat {\rho }\left ({x^{(j)}, t^{(j)}}\right ) - \rho \left ({x^{(j)}, t^{(j)}}\right )}\right |^{2}}}{\sqrt {\sum _{j=1}^{N_{1} \cdot N_{2}}\left |{\rho \left ({x^{(j)}, t^{(j)}}\right )}\right |^{2}}}\tag{29}\end{align*}
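The metric in (29) can be computed directly from two density fields stored as nested lists. The sketch below is a minimal illustration; the function name and the small example fields are hypothetical.

```python
import math

def relative_l2_error(true_field, estimated_field):
    """Relative L2 error (29): Frobenius norm of the difference over that of the truth."""
    squared_diff = 0.0
    squared_true = 0.0
    for true_row, est_row in zip(true_field, estimated_field):
        for rho, rho_hat in zip(true_row, est_row):
            squared_diff += (rho_hat - rho) ** 2
            squared_true += rho ** 2
    return math.sqrt(squared_diff) / math.sqrt(squared_true)

# Hypothetical 2 x 2 density fields for illustration.
truth = [[0.2, 0.4], [0.6, 0.8]]
perfect = relative_l2_error(truth, truth)
all_zero = relative_l2_error(truth, [[0.0, 0.0], [0.0, 0.0]])
```

A perfect reconstruction gives 0, and an estimate that is zero everywhere gives 1, so values well below 1 indicate that the estimate carries real information about the field.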
Case Study I - Insights From Circular Testbed
In this case study, we compare the PIDL reconstruction accuracy between learning the LWR conservation law (hyperbolic) and its parabolic form. The datasets used in this case study are synthetic vehicle density datasets generated on a ring road (represented by Fig. 4). We configure all the neural networks with the same learning architecture (equal numbers of layers, same number of neurons on each layer, etc.). 10000 collocation points are assigned in the density field to compute the physics-cost. The learning rate of Adam is set to 0.001, and the number of training iterations is set to 8000.
A. Dataset
The datasets for this case study simulate vehicular traffic on a ring road. The location
B. Selection of Learning Data Instances
The sampling of available data for learning is an important aspect of machine learning. Data collection and sensing may introduce potential biases, often due to human factors and sensing limitations. These biases can persist in the models that are trained on the data, highlighting the importance of selecting appropriate subsets of the data to minimize these biases. However, in many instances, the selection of training instances is limited by the availability and positions of the sensors. For example, in traffic sensing, the sensors are either installed on the roadside at fixed intervals (e.g., loop detectors) or move with the traffic stream (e.g., probe vehicles, CVs).
Fig. 6 demonstrates the sampling cases of traffic state data. Numerical PDE solvers such as Lax-Friedrichs’ numerical scheme [66] can be used as a state reconstruction tool; however, it requires the complete information on the initial and boundary conditions as shown in Fig. 6(a) which is not practically feasible. On the other hand, the PIDL approach can utilize any given amount of inputs from the boundaries for training; in Fig. 6(b), 20% of the initial and boundary data are shown as an exemplar training setting. Fig. 6(c) represents the Eulerian traffic data that can be gathered from roadside sensors or loop detectors installed at predetermined locations along the road infrastructure (shown at
Given the sampling choices, we designate two subsets of training data inputs about the vehicle density
Initial condition data are the vehicle density values $\rho (x, \; 0)$ at $t = 0, x \in [{0, 1.0}]$. Boundary condition data include vehicle density values at the first location $\rho (0, \; t), \; t \in [{0, 3.0}]$, and at the last location $\rho (1.0, \; t), \; t \in [{0, 3.0}]$.
Interior data (CV data in the case study) $\rho (x, \; t)$ comes from the CV fleet in the traffic. They can be gathered at any randomly selected location along the vehicle trajectory and reflect the density value $\rho (x, \; t)$, given $x \in [0, \, 1.0]$ and $t \in [0, \, 3.0]$.
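The selection of training points from these subsets can be sketched as follows. The grid resolution is hypothetical, and drawing interior points uniformly at random from the grid is a simplification, since real CV measurements lie along vehicle trajectories. Only the domain limits are taken from the description above.

```python
import random

N_X, N_T = 50, 151          # hypothetical grid resolution
X_MAX, T_MAX = 1.0, 3.0     # domain limits from the case study

xs = [X_MAX * i / (N_X - 1) for i in range(N_X)]
ts = [T_MAX * j / (N_T - 1) for j in range(N_T)]

# Initial condition: every location at t = 0.
initial = [(x, 0.0) for x in xs]
# Boundary conditions: first and last location at every later time step.
boundary = [(0.0, t) for t in ts[1:]] + [(X_MAX, t) for t in ts[1:]]
# Interior points: everything strictly inside the space-time domain.
interior = [(x, t) for x in xs[1:-1] for t in ts[1:]]

def sample_fraction(points, fraction, rng):
    """Draw a given fraction of the points without replacement."""
    count = max(1, round(fraction * len(points)))
    return rng.sample(points, count)

rng = random.Random(0)      # fixed seed so the selection is reproducible
ib_train = sample_fraction(initial + boundary, 0.20, rng)   # 20% initial and boundary
cv_train = sample_fraction(interior, 0.02, rng)             # 2% interior (CV) inputs
```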
The initial condition data on vehicle density can be registered through a still image, recorded by devices such as roadside video cameras or drones. The boundary condition data can be obtained from a stationary detector deployed along a freeway at the start location
C. Reconstruction With Initial and Boundary Inputs
We first evaluate the PIDL results based only on training inputs about the initial and boundary conditions. We select four levels of available training inputs (10%, 20%, 50%, and 90% of the total numbers of initial and boundary data), and use both L-BFGS-B and Adam optimizers to reconstruct the hyperbolic LWR PDE and its parabolic variation with the diffusion term. The results are shown in Table 1 (best results are shown in bold). The reconstructed density fields, trained with 10% initial and boundary inputs, are shown in Fig. 7.
Among all training settings, we observe that PIDL models achieved much higher accuracy (lower relative
We also take several snapshots of the reconstruction at time
D. Reconstruction With Initial, Boundary, and Interior Inputs
We next evaluate the reconstruction accuracy under scenarios in which varying levels of data on the initial and boundary conditions and the interior conditions (CV inputs) are available. We pick two levels of available inputs
Compared with the results in Table 1, we find that including CV inputs in the training inputs of PIDL slightly improves the reconstruction accuracy with the hyperbolic PDE. However, the learning performance of the PIDL architecture with the parabolic PDE is still far superior. With 20% initial and boundary observations, and 2% CV inputs, the PIDL model with L-BFGS-B optimizer achieves a relative
Case Study II - Insights From Field Data
In this section, we shed further light on the topic using field data. We examine the limitations of PIDL and compare it with Lax-Friedrichs’ numerical scheme [67] in learning the traffic density using the “Next Generation SIMulation” (NGSIM) dataset [68].
A. NGSIM Dataset
The NGSIM dataset records traffic conditions using video cameras and processes the traffic state variables such as velocity and vehicle density through vehicle trajectories identified in the video recordings [69]. The vehicle density data used, illustrated in Fig. 9, contains vehicle density for a 45-minute period on a 2060-foot segment of
B. Field Data Reconstruction Using PIDL with Hyperbolic PDE
The density dataset is tabulated with spatiotemporal bins of
The governing physical equation of the PIDL architecture is the hyperbolic LWR conservation law paired with Greenshields’ fundamental diagram. The estimated value of maximum density
The relative
C. Field Data Reconstruction Using PIDL with Parabolic PDE
For NGSIM US-101 data, the diffusion coefficient
Reconstruction with PIDL,
We observe that with the realistic NGSIM dataset, the addition of the diffusion term only slightly improves the accuracy, landing a relative
We also conduct the sensitivity analysis on the diffusion coefficient
D. Reconstruction Using Lax-Friedrichs’ Numerical Scheme
The vehicle density dataset from NGSIM can also be reconstructed by using the Lax-Friedrichs’ differencing method [72] with the complete initial and boundary conditions. The reconstruction is pictured in Fig. 12.
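For reference, one step of the Lax-Friedrichs scheme for the LWR PDE with Greenshields' flux can be sketched as below. The sketch uses periodic boundaries, as on a ring road, together with normalized parameters and a hypothetical block-shaped initial density. For field data such as NGSIM, the boundary cells would instead be set from the measured boundary densities at every step.

```python
def flux(rho, v_f, rho_m):
    """Greenshields flow q = rho * v_f * (1 - rho / rho_m)."""
    return rho * v_f * (1.0 - rho / rho_m)

def lax_friedrichs_step(rho, dx, dt, v_f, rho_m):
    """Advance the cell densities by one time step with periodic boundaries."""
    n = len(rho)
    updated = []
    for i in range(n):
        left = rho[i - 1]             # index -1 wraps around to the last cell
        right = rho[(i + 1) % n]
        average = 0.5 * (left + right)
        flux_diff = flux(right, v_f, rho_m) - flux(left, v_f, rho_m)
        updated.append(average - dt / (2.0 * dx) * flux_diff)
    return updated

def simulate(rho0, dx, t_end, v_f=1.0, rho_m=1.0, cfl=0.9):
    """Run the scheme up to t_end; the time step satisfies the CFL condition."""
    dt = cfl * dx / v_f               # the largest characteristic speed is v_f
    rho = list(rho0)
    for _ in range(int(t_end / dt)):
        rho = lax_friedrichs_step(rho, dx, dt, v_f, rho_m)
    return rho

# Hypothetical initial condition: a dense platoon inside light traffic.
n_cells = 100
rho_initial = [0.8 if 40 <= i < 60 else 0.2 for i in range(n_cells)]
rho_final = simulate(rho_initial, dx=1.0 / n_cells, t_end=0.5)
```

The averaging of neighboring cells introduces numerical diffusion, so the scheme smears shocks over a few cells while keeping the total number of vehicles on the ring unchanged.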
Reconstruction with Lax-Friedrichs’ Numerical Scheme, Relative
Along with a smaller relative
Discussion on Empirical Results
Recent examinations have elucidated the challenges associated with training and the drawbacks of certain data representations in PIDL. In several instances, unstable convergence occurs in the gradient-descent-based PIDL training, especially when the underlying PDE solution has high-frequency features [39]. This pathological behavior observed in PIDL training is due to the multi-scale interactions between the cost terms in optimizing the neural network cost [73]. It leads to stiffness in the gradient flow dynamics, ultimately inducing a severe constraint on the learning rate and harming the stability of the training process. PIDL, which often deploys fully-connected hidden layers, also faces a challenge termed “spectral bias” and cannot reasonably assimilate a nonlinear hyperbolic PDE when its solution involves shocks [74].
Potential mitigation directions: The potential approaches to improve the PIDL paradigm include (a) replacing the hyperbolic physics with its parabolic counterpart by adding the diffusion term, (b) incorporating more observation or collocation points around the shocks, (c) including more interior training instances (e.g., Lagrangian measurements), (d) modifying the fully-connected learning architecture of the neural network. We discuss these approaches below.
For a one-dimensional hyperbolic PDE with a non-convex flux function, its analytical solution can be depicted by a simple piecewise continuous function, and the stability of its solution can be significantly improved by adding a diffusion term to the inherent PDE. With the diffusion smoothing the solution around the shock, the neural network can recover the actual scale and location of the shock, solve the PDE in its parabolic form, and produce precise approximation results [75]. However, common practices assume the coefficient
Increasing the number of observations or collocation points along the shock trajectories in the training of the neural network forms another approach [77]. However, one challenge in this approach is identifying the shock location. As we have observed, PIDL struggles with approximating the vehicle density where localized non-linear discontinuity exists in the data; adding artificial dissipation could improve the learning result of the hyperbolic conservation law [78], [79].
Based on the results in Section V, PIDL with the parabolic variant of LWR PDE cannot overcome the random perturbation in the traffic dataset and accurately estimate the traffic density based on pure observation of the initial and boundary conditions. Our previous work suggests the benefit of including Lagrangian observation in this setting for the task of TSE [34].
Alternatively, recent studies have started modifying the design of the deep learning architecture in PIDL to circumvent the issue of learning underlying hyperbolic PDEs with discontinuities and no strong solution. Attention-based recurrent neural networks have been introduced to capture the localized shock waves in the nonsmooth solution to governing equations [77]. This approach mitigates the challenges by substituting the conventional fully connected feedforward architecture in PIDL with recurrent neural networks and attention mechanisms. Additionally, convolutional neural network architectures have demonstrated efficiency in assimilating data with nonlinear shock features or high-frequency components in the PDE solution [80], [81]. It has also been pointed out that optimizing a PIDL neural network to learn a hyperbolic PDE can be a futile process because the pointwise residual blows up while approximating the exact solution; an alternative optimization approach replaces the pointwise residual with a residual related to the Kružkov entropy condition [82].
In practice, the task of TSE involves realistic traffic data in which a diffusion term is inherent: drivers gradually slow down their vehicles in anticipation of congestion or when a slowdown is visually perceivable. Therefore, when adopting physics-informed deep learning for TSE, this smoothness around shockwaves should be considered part of the “physics”, which describes the underlying relationship between traffic states. Recall that the diffusion term in the parabolic form of the LWR PDE is weighted by the parameter
Conclusion
In this work, we exhibited the difficulties of training a physics-informed deep learning (PIDL) neural network to reconstruct a certain type of partial differential equation (PDE): the hyperbolic PDE for which a strong solution cannot be obtained. The non-smooth weak solution to conservation law-based traffic flow models (such as the LWR PDE) causes PIDL to fail in capturing the scale and location of the discontinuity. Through the case study, we showcased the stark differences between the learning result using PIDL with the first order hyperbolic LWR PDE and its parabolic counterpart, in which the additional diffusion term secures a strong solution and leads to an accurate approximation. When the PDE solution contains multi-scale features of high-frequency terms, it often causes severe difficulties in calculating the gradients in the fully connected learning structure of a PIDL neural network [74], making the optimization process unstable and ultimately leading to inaccurate predictions [39], [83]. We observe that the deep learning neural network fails to approximate the nonlinear relationship of a hyperbolic PDE in areas where shockwaves are present, whereas the diffusion term in the parabolic PDE ensures improved data estimation in these areas and thus improves the reconstruction result. Besides reconstructing using only the initial and boundary conditions, we list the possible sampling choices of traffic flow and use a diverse combination of Eulerian and Lagrangian data to ensure the reliability and validity of the results. Moreover, with the field data from the NGSIM traffic dataset, we further highlight the limitation of PIDL in the presence of shockwaves. Future work includes analysis of the cost evolution (over time) in reconstructing hyperbolic and parabolic PDEs to understand the interaction between the cost terms of the PIDL neural network and interpret its pathological behavior in learning conservation law-based models.
ACKNOWLEDGMENT
The authors wish to thank Dr. Rongye Shi at Columbia University for providing the ring road data for the case study, Dr. Animesh Biswas at the University of Nebraska at Lincoln for the LWR reconstruction of the data, and Dr. Pushkin Kachroo at the University of Nevada Las Vegas for the suggestions and discussions on the topic.