On the Limitations of Physics-Informed Deep Learning: Illustrations Using First-Order Hyperbolic Conservation Law-Based Traffic Flow Models

Abstract:

Since its introduction in 2017, physics-informed deep learning (PIDL) has garnered growing popularity in understanding the systems governed by physical laws in terms of partial differential equations (PDEs). However, empirical evidence points to the limitations of PIDL for learning certain types of PDEs. In this paper, we (a) present the challenges in training PIDL architecture, (b) contrast the performance of PIDL architecture in learning a first order scalar hyperbolic conservation law and its parabolic counterpart, (c) investigate the effect of training data sampling, which corresponds to various sensing scenarios in traffic networks, (d) comment on the implications of PIDL limitations for traffic flow estimation and prediction in practice. Case studies present the contrast in PIDL results between learning the traffic flow model (LWR PDE) and its diffusive variation. The outcome indicates that PIDL experiences significant challenges in learning the hyperbolic LWR equation due to the non-smoothness of its solution. Conversely, the architecture with parabolic PDE, augmented with the diffusion term, leads to the successful reassembly of the density data even with the shockwaves present. The paper concludes by providing a discussion on recent assessments of reasons behind the challenge PIDL encounters with hyperbolic PDEs and the corresponding mitigation strategies.

Published in: IEEE Open Journal of Intelligent Transportation Systems (Volume: 4)

Page(s): 279 - 293

Date of Publication: 19 April 2023

Electronic ISSN: 2687-7813

DOI: 10.1109/OJITS.2023.3268026

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction and Motivation

Traffic states such as vehicle velocity $v$, density $\rho $, and flow $f$ depict the condition of traffic operations on road infrastructure. An important task in transportation management is accurately capturing current traffic state conditions in order to implement control measures such as ramp metering and tidal lanes during rush hours [1], [2]. The practice of obtaining traffic state measurements is bounded by several constraints [3], [4]. Firstly, sensing traffic states requires the installation of equipment along the roadway to detect the presence of vehicles. The costs of installation, calibration, and maintenance limit this sensing application to a few selected locations in a complex road network. Secondly, inaccuracy and measurement noise are inherent in the process of acquiring traffic states through sensing devices. Processing traffic state measurements therefore commonly requires assessing the reliability of the observed data and handling missing values [5].

The observation data of traffic states can also be harvested through another channel: connected vehicles (CV) [6]. In this approach, traffic conditions are perceived via the ubiquitous onboard sensors of connected vehicles traveling through traffic. CV has the potential to vastly broaden the detection area of traffic states on roadways [7] and to enable the collection of other traffic statistics such as link travel time [8].

However, this application scenario, in reality, is also restricted in several ways. Traffic states obtained through CV need to be broadcast and stored by roadside units or edge devices in a computing network [9]. The current penetration rates of both CV and the compatible communication system are insufficient to make it a reliable practice for procuring data on traffic states. Besides, the communication between CV and data infrastructure generates elevated demand for network resources in pursuit of robust on-site computing capability, and low-latency data exchange [10].

The estimation of traffic states, therefore, becomes an important practice for traffic planners and policy practitioners [11]. It pertains to the inference of traffic states using partially available data, secured either through sensing devices or CV. A variety of traffic operations rely on this crucial assessment of traffic states [12], [13]. Tasks as minute as determining the length of traffic cycles and as substantial as adding lanes on a highway all rely on the outcome of traffic state estimation (TSE). A variety of TSE methods exist in the literature and can be categorized into three groups: model-based, data-based, and streaming-data-based [11]. Model-based approaches [14], [15], [16], [17] rely on a calibrated traffic flow model to estimate traffic states in unobserved areas. As a case in point, an approach that relies on the fundamental relationship between traffic speed and density, paired with a threshold error correction model, has been designed for short-term traffic state prediction [18]. Data-based approaches [19], [20], [21], [22], [23] utilize the insights derived from historical data and apply statistical or machine learning techniques to reconstruct and predict traffic states. For instance, attention-driven recurrent imputation has been proposed to estimate missing vehicle velocity data [24]. The third category, streaming-data-based approaches [25], [26], contains methods that do not require a priori knowledge of historical observations and instead take advantage of real-time data, with only a weak assumption about traffic relationships drawn from empirical evidence.

The constraints on obtaining traffic state data call for TSE models that can exploit a limited amount of observed data for training and still produce a precise estimation of traffic states. Among the available approaches, the deep learning (DL) neural network is a powerful machine learning method increasingly used in many TSE applications [27], [28], [29]. However, DL neural networks also come with shortcomings, such as high requirements for training data and computing power, over-fitting, and transferability issues, which limit their appeal for time-critical applications. These shortcomings motivate the use of physics to aid the training process of a neural network in TSE [30], [31].

Physics-informed deep learning (PIDL), also referred to as physics-informed neural network (PINN), arms a neural network with the governing equations of a physical system [32]. It empowers the deep learning neural network with knowledge of the underlying relationship in observed data to efficiently use the limited data input for estimation and prediction tasks [33]. Since the introduction of its architecture [33], PIDL has been adopted in the field of traffic state estimation (TSE) [34], [35]. Researchers have experimented with PIDL to estimate both the traffic state and the fundamental diagram depicting the relationship between traffic states [28]. Second-order traffic models have also been considered in the application of PIDL for TSE [36].

A diverse range of mechanical engineering and computational science applications have also been proposed, signifying its advantage in utilizing the governing equation to accurately capture the physical system. Among its diverse implementations in the mathematical domain, the PIDL approach has been adopted for solving the free boundary problems [37], high-dimensional PDE [38], uncertainty quantification [39], and time-dependent stochastic PDE [40]. In the field of fluid dynamics, PIDL is employed to model the velocity and pressure fields [41], vortex-induced vibration [42], and fluid flows without the use of simulation data [43]. And on the engineering side, the modeling of cardiovascular flow [44], nano-optics [45], and proxy modeling in solid mechanics [46] have all witnessed the effectiveness of the PIDL approach.

Variants of PIDL have been proposed to learn the solution of partial differential equations (PDEs). For instance, the Galerkin method-based hp-VPINN was introduced to solve PDEs with non-smooth solutions [47]. A Bayesian approach to PINN has been presented for forward and inverse problems [48], and physics-informed adversarial training has been proposed for solving PDEs [49]. Particle swarm optimization has also been put forward for PIDL training [50].

Among the PIDL applications that have demonstrated encouraging results, the focal point is using the neural network to learn the solution of a deterministic PDE governing a physical system. As the PIDL model encodes the underlying relationship between state variables, it incorporates the governing equation as a priori knowledge into the cost calculation. If the PDE of interest has a smooth, strong solution and is paired with an adequate number of collocation points where the physics cost is optimized, then PIDL is capable of achieving good accuracy in learning the solution to the PDE. However, recent research points to the limitations of PIDL for learning certain types of PDEs, such as hyperbolic conservation laws.

Scalar Conservation Laws in Traffic Flow Theory: The Lighthill-Whitham-Richards (LWR) model is a one-dimensional scalar conservation law and a commonly used transportation model. No strong solution to LWR exists, given that it is a hyperbolic PDE [1]. Certain classes of partial differential equations, including the LWR model, can be solved by the method of characteristics (details in Section II). In our previous work [34], [35], equipped with a realistic choice of data sampling on the interior traffic data (either Lagrangian data that come from connected vehicles (CV) or Eulerian data from roadside sensors and loop detectors), we have shown that PIDL can successfully and accurately accomplish the onerous task of reconstructing the solution of the LWR PDE. However, no empirical evidence exists that PIDL can learn an LWR PDE with acceptable accuracy given only boundary and initial condition data, which is the default experimental setup for a variety of reconstruction problems in transportation and other domains [28], [32], [33], [36].

Research Goals: The overarching goal of this research is to understand the limitations of PIDL in traffic state estimation applications and initiate a dialogue on possible remedies in light of current literature. We aim to achieve the goal by addressing the following research questions:

Which form of the LWR model, hyperbolic PDE or parabolic PDE, is learned better by the PIDL algorithm?
What are the factors behind the disparity in learning results?
What is the effect of a diffusion term, which transforms the hyperbolic LWR PDE to a parabolic variation, in alleviating the weakness of PIDL?
What are the inherent architectural issues in PIDL that inhibit it from working with hyperbolic conservation law?

The research is divided into the following tasks: (a) exhibiting the contrast in reconstruction accuracy between learning a commonly used conservation law (LWR) in traffic flow theory, which is a first order hyperbolic PDE, and its second order parabolic counterpart, using a synthetic ring road density dataset, (b) further illustrating the limitations of PIDL in TSE with only the initial and boundary observations from realistic field data (NGSIM), (c) understanding the impact of discontinuity in the PDE solution and the effect of a diffusion term on PIDL learning, and (d) discussing the findings from the experimental results in light of existing literature and putting forward suggestions to improve PIDL results in TSE with the LWR PDE.

Key features and contributions:

We design a series of experiments by creating a circular testbed and by using realistic field (NGSIM) data to understand the performance of PIDL in TSE while incorporating the hyperbolic and parabolic versions of the physics.
We showcase the effect of the additional second order diffusion term in the LWR PDE on PIDL learning using only the initial and boundary observations from a ring road testbed.
We further investigate the contrast between PIDL learning the hyperbolic LWR PDE and its parabolic variant using a mix of Eulerian and Lagrangian training inputs (non-boundary data).
Using the realistic NGSIM field data, we shed light on the limitations of PIDL in TSE and demonstrate that, on occasions, a numerical PDE solver outperforms PIDL.
Finally, we discuss the reasoning behind the difficulties encountered when training a PIDL neural network with the LWR conservation law and list suggestions for improving the performance.

Note: The code accompanying this manuscript can be found in the GitHub repository at github.com/Urbanity-Lab/PIDL-Limitations

Outline: The rest of the manuscript is organized as follows: Section II gives the mathematical background of the scalar conservation laws, including the LWR model, strong and weak solutions, and method of characteristics. Section III introduces the preliminaries of physics-informed deep learning (PIDL) for learning the conservation law-based traffic flow models. Section IV sets up a case study to reveal the challenges of PIDL with the first order hyperbolic LWR PDE. Section V further illustrates the limitations of PIDL with field data from the NGSIM dataset. Section VI discusses the implications from results and Section VII concludes the paper with suggestions on future work.

SECTION II.

Scalar Conservation Laws

In one dimension, a general form of the scalar conservation law is described by (1), where $f(u)$ is the flux function and the conserved variable $u = u(x, \; t)$ depends on the location $x$ and time $t$. Equation (1) becomes a Cauchy problem when the initial condition $u(x, \; 0) = u_{0}(x)$ is provided [51].\begin{equation*} \partial _{x} f(u) + \partial _{t} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{1}\end{equation*}

The flux function $f(u)$ can take many forms. The linear advection equation is a basic example of the scalar conservation law, in which the flux function takes the form $f(u) = \lambda u$ with a constant $\lambda $, as shown in (2).\begin{equation*} \lambda \partial _{x} u + \partial _{t} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{2}\end{equation*}

An initial value problem for the linear advection equation is given by (3) and its solution is exhibited in (4).\begin{align*} \partial _{x} f(u) + \partial _{t} u=&0, \;\;\; x \in \mathbb {R}, \; t \geq 0 \\ u\left ({x, \; 0}\right ) = u_{0}(x)=&f\left ({x_{0}}\right ), \;\;\; x \in \mathbb {R}\tag{3}\\ u\left ({x, \; t}\right )=&u_{0}\left ({x - \lambda t}\right ), \;\;\; t \geq 0\tag{4}\end{align*}
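The shifted-profile solution in (4) can be checked numerically. The following is a minimal sketch, assuming a hypothetical Gaussian initial profile `u0` and using central finite differences to approximate the derivatives in (2); none of these names come from the manuscript.

```python
import math


def u0(x):
    """Hypothetical smooth initial profile (a Gaussian bump)."""
    return math.exp(-x * x)


def u(x, t, lam):
    """Solution (4): the initial profile shifted by lam * t."""
    return u0(x - lam * t)


def advection_residual(x, t, lam, h=1e-5):
    """Central-difference estimate of lam * u_x + u_t, the left side of (2)."""
    u_x = (u(x + h, t, lam) - u(x - h, t, lam)) / (2 * h)
    u_t = (u(x, t + h, lam) - u(x, t - h, lam)) / (2 * h)
    return lam * u_x + u_t


# The residual should vanish up to finite-difference error.
residual = advection_residual(x=0.3, t=1.2, lam=0.5)
```

The residual is zero only up to discretization and rounding error, so any check on it should use a tolerance rather than exact equality.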

A simple nonlinear partial differential equation is Burgers’ equation, one of the most commonly used scalar conservation law models. The classical form of Burgers’ equation is presented in (5), where $\epsilon \partial _{xx} u$ is the viscosity term.\begin{equation*} u \partial _{x} u + \partial _{t} u - \epsilon \partial _{xx} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{5}\end{equation*}

When $\epsilon = 0$, (5) becomes the inviscid Burgers’ equation, and the flux function takes the form of $f(u) = u^{2} / 2$, shown in (6). Plugging $\lambda = u$ into (4) we get the solution of the inviscid Burgers’ equation in (7).\begin{align*} \partial _{x} \left ({u^{2} / 2}\right ) + \partial _{t} u=&u \partial _{x} u + \partial _{t} u = 0, \;\;\; x \in \mathbb {R}, \; t \geq 0\tag{6}\\ u\left ({x, \; t}\right )=&u_{0}\left ({x - ut}\right ), \;\;\; t \geq 0\tag{7}\end{align*}

The viscosity term is a diffusion term that flattens discontinuities and ensures a smooth solution.

A. Physics of Traffic Flow

Macroscopically, traffic flow is expressed using variables such as the mean velocity $v(x, \; t)$ , a function of location $x$ and time $t$ , traffic flow $q(x, \; t)$ , and vehicle density $\rho (x, \; t)$ . Spacing $s$ is the headway distance, which is the inverse of density $\rho $ . Traffic flow is considered a continuum flow with a density profile associated with a compressible liquid [52]. The traffic flow velocity is related to the density profile, time, and location.

The LWR conservation law is a continuity equation, which holds for all macroscopic models and formulates the conservation of traffic flow. This equation relates the change of density to the gradient of flow. When the flow is considered as a static function of density (portrayed by a fundamental diagram), it leads to a first order continuity equation, also referred to as the Lighthill-Whitham-Richards (LWR) model [53], which is a hyperbolic partial differential equation (PDE). Given a location $x$ between the start point $x_{0}$ and endpoint $x_{n}$, and a timestamp $t$ between the initial time $t_{0}$ and the end time $t_{m}$, the cumulative flow $N(x, \; t)$ depicts the number of vehicles that have reached location $x$ at time $t$. The flow $q(x, \; t)$ is the partial derivative of $N(x, \; t)$ with respect to time $t$. Likewise, density $\rho (x, \; t)$ is the negative of the partial derivative of $N(x, \; t)$ with respect to location $x$, which is the sign required for $N$ to be consistent with (8).

The formulation of the LWR conservation law in the Eulerian coordinate system is shown in (8). It is worth noting that data provided by connected vehicles (CV), such as vehicle velocity $v(n, \; t)$ and spacing $s(n, \; t)$ with respect to the vehicle identifier $n$ and time $t$ , would be in Lagrangian coordinates [54]. The Lagrangian formulation of the conservation law is shown in (9).

For $(x, \; t) \in [ \, x_{0}, \; x_{n} ] \, \times [ \, t_{0}, \; t_{m} ], \;\;\;$ \begin{equation*} \frac {\partial q\left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = 0\tag{8}\end{equation*}

For $(n, \; t) \in [ \, n_{0}, \; n_{i} ] \, \times [ \, t_{0}, \; t_{m} ], \;\;\;$ \begin{equation*} \frac {\partial v\left ({n, \; t}\right )}{\partial n} + \frac {\partial s\left ({n, \; t}\right )}{\partial t} = 0\tag{9}\end{equation*}

B. Strong and Weak Solutions

Given the initial value problem (10), \begin{align*} \partial _{x} f(u) + \partial _{t} u=&0, \;\;\; x \in \mathbb {R}, \; t \geq 0 \\ u\left ({x, \; 0}\right )=&u_{0}(x), \;\;\; x \in \mathbb {R}\tag{10}\end{align*}

If $u_{0}(x) \in C^{1}(\mathbb {R})$, then the initial condition $u_{0}(x)$ is continuously differentiable and (10) becomes (11) by virtue of the chain rule.\begin{align*} f^{\prime } (u) \partial _{x} u + \partial _{t} u=&0, \;\;\; x \in \mathbb {R}, \; t \geq 0 \\ u\left ({x, \; 0}\right )=&u_{0}(x), \;\;\; x \in \mathbb {R}\tag{11}\end{align*}

In a domain $\Omega \subset \mathbb {R}$, a solution to (11) is a strong solution (also referred to as a “classical solution”) if it satisfies (10) and is continuously differentiable on the domain $\Omega $.

When no strong solution to (10) exists, the smoothness requirement can be relaxed to find weak solutions, even if these solutions are not differentiable or even continuous. Weak solutions eliminate the derivative terms of $u$ and $f(u)$ to ease the smoothness requirement.

Multiplying the scalar conservation law (1) with a function $\psi : \mathbb {R} \times \mathbb {R}^{+} \to \mathbb {R}$, and given the initial condition $u(x, \; 0) = u_{0}(x)$, we have (12).\begin{align*}&\int _{0}^{\infty } \int _{-\infty }^{\infty } \left ({\partial _{x} f(u) + \partial _{t} u}\right )\psi dxdt \\&\;=\int _{0}^{\infty } \int _{-\infty }^{\infty } \left ({f(u)\psi _{x} + u \psi _{t}}\right )dxdt \\&\;\quad {}+ \int _{-\infty }^{\infty } u\left ({x, \; 0}\right )\psi (x)dx = 0\tag{12}\end{align*}

Notice that in (12), the requirement on smoothness is lessened, as there are no derivative terms of $u$ and $f$. If (12) holds for every test function $\psi $, then $u(x, t)$ is a weak solution of (10).
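The step behind (12) is an integration by parts in both $x$ and $t$. As a sketch, assuming the test function $\psi $ is continuously differentiable with compact support (an assumption the text leaves implicit), the boundary terms at infinity vanish and only the term at $t = 0$ survives:

\begin{align*} \int _{0}^{\infty } \int _{-\infty }^{\infty } \left ({\partial _{x} f(u) + \partial _{t} u}\right )\psi \, dx \, dt =&-\int _{0}^{\infty } \int _{-\infty }^{\infty } \left ({f(u)\psi _{x} + u \psi _{t}}\right ) dx \, dt \\&- \int _{-\infty }^{\infty } u_{0}(x) \psi \left ({x, \; 0}\right ) dx \end{align*}

Since the left side is zero for a solution of (1), multiplying through by $-1$ recovers the sign convention used in (12).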

It is worth noting that there is no strong solution to the LWR conservation law. Nevertheless, a diffusive term can be added to avoid breakdown and ensure a strong solution by making the hyperbolic conservation equation become a parabolic PDE. We will further discuss this in Section VI.

C. Method of Characteristics

The method of characteristics is used to solve quasilinear partial differential equations, converting the PDEs into ordinary differential equations (ODEs). Consider (1) and its solution $f(u) = u(x, \; t)$, let $x = x(t)$ solve the ODE in (13):\begin{equation*} \dot {x}(t) = u\left ({x(t), \; t}\right )\tag{13}\end{equation*}

From (13) observe that, \begin{equation*} \frac {d}{dt}u\left ({x(t), \; t}\right ) = \frac {dx}{dt}u_{x} + u_{t}\tag{14}\end{equation*}

Combining (1) and (14), we reach (15) that can propagate the solution $u(x(t), \; t)$ with the initial condition $u_{0}(x)$.\begin{equation*} \frac {du}{dt} = 0, \; \frac {dx}{dt} = u\tag{15}\end{equation*}

A simple discontinuous solution of the conservation law (10) is given by (16).\begin{align*} u\left ({x, \; t}\right ) = \begin{cases} u_{L}, & x < \lambda t, \\ u_{R}, & x \geq \lambda t, \end{cases}\tag{16}\end{align*}

If $u_{L} \neq u_{R}$, (16) is termed a shock wave. With the shock speed $\lambda $, it connects $u_{L}$ to the value of $u_{R}$. Consider the following scalar Riemann problem (17) where $\rho _{L} < \rho _{R}$:\begin{align*}&v_{f} \left ({1 - \frac {\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \frac {\partial \rho \left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = 0 \\&\; \rho \left ({x, \; 0}\right ) = \begin{cases} \rho _{L}, & x < 0, \\ \rho _{R}, & x \geq 0, \end{cases}\tag{17}\end{align*}

The characteristic speed at $t = 0$ is given in (18). As $\rho _{L} < \rho _{R}$, the characteristic speed $\lambda (\rho _{L})$ on the left is greater than the characteristic speed $\lambda (\rho _{R})$ on the right, and a shock curve develops, as shown in Fig. 1.\begin{equation*} \lambda (\rho ) = f^{\prime }(\rho ) = v_{f} \left ({1 - 2 \frac {\rho }{\rho _{m}}}\right )\tag{18}\end{equation*}

FIGURE 1. Shockwave solution.
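The comparison of characteristic speeds above can be sketched in a few lines. The snippet below evaluates (18) on both sides of the jump and labels the resulting wave. For the shock case it also computes a speed from the Rankine-Hugoniot jump condition, which is a standard result that this section does not state, so it should be read as an added assumption; the function names and the normalized defaults $v_{f} = \rho _{m} = 1$ are likewise illustrative.

```python
def flux(rho, v_f=1.0, rho_m=1.0):
    """Greenshields-type flux q = rho * v_f * (1 - rho / rho_m)."""
    return rho * v_f * (1.0 - rho / rho_m)


def char_speed(rho, v_f=1.0, rho_m=1.0):
    """Characteristic speed from (18)."""
    return v_f * (1.0 - 2.0 * rho / rho_m)


def riemann_wave(rho_l, rho_r, v_f=1.0, rho_m=1.0):
    """Classify the wave emerging from a density jump at x = 0."""
    lam_l = char_speed(rho_l, v_f, rho_m)
    lam_r = char_speed(rho_r, v_f, rho_m)
    if rho_l < rho_r:
        # Left characteristics are faster and run into the right ones: shock.
        # Shock speed from the Rankine-Hugoniot condition (assumption, see lead-in).
        speed = (flux(rho_r, v_f, rho_m) - flux(rho_l, v_f, rho_m)) / (rho_r - rho_l)
        return "shock", speed
    if rho_l > rho_r:
        # Characteristics spread apart: rarefaction fan between the two speeds.
        return "rarefaction", (lam_l, lam_r)
    return "constant", (lam_l, lam_r)


shock_kind, shock_speed = riemann_wave(0.25, 0.5)
fan_kind, fan_edges = riemann_wave(0.75, 0.25)
```

Returning the fan edges rather than a full profile keeps the sketch limited to what (18) provides directly.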

If we modify the problem in (17) with the initial condition that $\rho _{L} > \rho _{R}$, the value of the characteristic speed on the left $\lambda (\rho _{L})$ will be smaller than the value of the speed on the right $\lambda (\rho _{R})$. One of the possible solutions to (17) with $\rho _{L} > \rho _{R}$ is the symmetry rarefaction solution, which is stable to perturbation to the initial data. The solution is given in (19), and shown in Fig. 2.\begin{align*} \rho \left ({x, \; t}\right ) = \begin{cases} \rho _{L}, & \frac {x}{t} < \lambda \left ({\rho _{L}}\right ), \\ \frac {\lambda \left ({\rho _{R}}\right ) - \lambda \left ({\rho _{L}}\right )}{\rho _{R} - \rho _{L}} \frac {x}{t}, & \lambda \left ({\rho _{L}}\right ) \leq \frac {x}{t} < \lambda \left ({\rho _{R}}\right ), \\ \rho _{R}, & \lambda \left ({\rho _{R}}\right ) \leq \frac {x}{t} \\ \end{cases}\tag{19}\end{align*}

FIGURE 2. Rarefaction solution.

SECTION III.

Physics-Informed Deep Learning

Development in deep learning (DL) neural networks has made it a suitable tool in the computational modeling of a physical system, which is often governed by complex non-linear functions [55]. When mean square error (MSE) is used as the measurement of cost in a DL neural network, the cost function can be formulated as (20), in which $N$ is the number of outputs, and $\hat {\mathbf {u}}(x, ~y, ~z, ~t)$ is the prediction of the variable ${\mathbf {u}}(x, ~y, ~z, ~t)$.\begin{align*} J=&MSE_{\left ({\hat {\mathbf {u}}\left ({x, ~y, ~z, ~t}\right ), \; \mathbf {u}\left ({x, ~y, ~z, ~t}\right )}\right )} \\=&\frac {1}{N} \sum _{k=1}^{N}\left |{\mathbf {u}\left ({x, ~y, ~z, ~t}\right ) - \hat {\mathbf {u}}\left ({x, ~y, ~z, ~t}\right )}\right |^{2}\tag{20}\end{align*}

Automatic differentiation (AD) computes a state variable’s partial derivatives with respect to its spatial and temporal independent variables [56]. Through the layers of a neural network, an output $\mathbf {u}(x, ~y, ~z, ~t)$ can be presented as a nested function of input variables $x, y, z$ , and $t$ . By applying the chain rule, automatic differentiation yields accurate derivatives of the training cost with respect to the parameters of the neural network [57].

To evaluate the outputs from the neural network in terms of compliance with the governing physical laws, the physics cost is computed at a set of spatiotemporal points $(x, ~y, ~z, ~t)$, termed collocation points, which can be chosen by Latin hypercube sampling (LHS) [58]. To differentiate the deep learning cost (DL-cost) in (20) from the physics-cost, we use $J_{DL}$ to denote the DL-cost, and $J_{PHY}$ to denote the penalty for non-compliance with the physics. $J_{DL}$ and $J_{PHY}$ are computed individually in (21). $N_{u}$ symbolizes the number of training samples, and $N_{f}$ is the number of collocation points.\begin{align*} \begin{cases} J_{DL} = \frac {1}{N_{u}} \sum _{k=1}^{N_{u}}\left |{\mathbf {u}\left ({x, ~y, ~z, ~t}\right ) - \hat {\mathbf {u}}\left ({x, ~y, ~z, ~t}\right )}\right |^{2} \\ J_{PHY} = \frac {1}{N_{f}} \sum _{k=1}^{N_{f}}\left |{\mathbf {f}\left ({x, ~y, ~z, ~t}\right ) }\right |^{2} \end{cases}\tag{21}\end{align*}

In (21), cost function $\mathbf {f}(x, ~y, ~z, ~t)$ is configured to quantify the non-compliance of the physics. As in the exemplary Fig. 3, which outlines the architecture of a PIDL neural network, the cost function of $J_{PHY}$ is given as $\mathbf {f} = \lambda _{1}u + \lambda _{2}u_{x} + \lambda _{3}u_{xy} + \lambda _{4}u_{zz} + \lambda _{5}u_{t}$ . The cost terms $u_{x}, u_{xy}, u_{zz}$ , and $u_{t}$ are partial derivatives of the output $\hat {\mathbf {u}}(x, ~y, ~z, ~t)$ with respect to a few combinations of input coordinates $(x, ~y, ~z, ~t)$ . The $\lambda $ parameters represent the weights of cost terms.

FIGURE 3. Architecture of a physics-informed deep learning (PIDL) neural network.

A. PIDL for Traffic State Estimation

This section provides details of the PIDL approach for traffic state estimation (TSE) using the LWR PDE and Greenshields’ fundamental diagram. Other physical models, such as second order flow models or discretized first order models, can be obtained by following the steps laid out in this section. PIDL empowers a DL neural network with the system’s governing physical laws as a priori knowledge [32]. The fundamental diagram of traffic flow and the conservation law serve as meaningful know-how in training a neural network to recognize the underlying relationship between traffic variables.

Several fundamental diagrams exist and are used depending on the situation. Commonly used ones include the piece-wise affine speed-density relationship [15] and the triangular fundamental diagram [59]. Greenshields’ fundamental diagram [60] is one of the simplest and most widely used models in traffic flow theory. It makes the simplifying assumption that the mean velocity has a linear relationship with the density. The relationship between traffic variables is described in (22), where $\rho _{m}$ is the maximum density, $s_{m}$ is the minimum spacing, and $v_{f}$ is the free-flow speed.\begin{align*} \begin{cases} q\left ({x, \; t}\right ) = \rho \left ({x, \; t}\right ) \; v_{f} \left ({1 - \frac {\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \\ v\left ({x, \; t}\right ) = v_{f} \left ({1 - \frac {\rho \left ({x, \; t}\right )}{\rho _{m}} }\right ) \end{cases}\tag{22}\end{align*}

Plugging the relationship between variables $\rho $ , $v$ , and $q$ from (22) into the Eulerian formulation of conservation law in (8), the physical law can be written as (23) and (24).

For $(x, \; t) \in [ \, x_{0}, \; x_{n} ] \, \times [ \, t_{0}, \; t_{m} ], \;\;\;$ \begin{equation*} \rho _{m} \left ({1 - \frac {2v\left ({x, \; t}\right )}{v_{f}}}\right ) \frac {\partial v\left ({x, \; t}\right )}{\partial x} - \frac {\rho _{m}}{v_{f}} \frac {\partial v\left ({x, \; t}\right )}{\partial t} = 0\tag{23}\end{equation*}

For $(x, \; t) \in [ \, x_{0}, \; x_{n} ] \, \times [ \, t_{0}, \; t_{m} ], \;\;\;$ \begin{equation*} v_{f} \left ({1 - \frac {2\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \frac {\partial \rho \left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = 0\tag{24}\end{equation*}

Observe that both the equations provide the same physical law - the only difference is their dependent variable. Equation (23) formulates the law in terms of velocity $v(x, \; t)$ , whereas (24) formulates it in terms of density $\rho (x, \; t)$ .
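The substitution that leads from (8) and (22) to (23) and (24) can be written out as a short worked step. This is a sketch of the intermediate algebra only, using the symbols already defined above. For the density form, expanding the flow in (22) and differentiating with respect to $x$ gives

\begin{align*} q =&v_{f} \left ({\rho - \frac {\rho ^{2}}{\rho _{m}}}\right ) \\ \frac {\partial q}{\partial x} =&v_{f} \left ({1 - \frac {2\rho }{\rho _{m}}}\right ) \frac {\partial \rho }{\partial x} \end{align*}

which, inserted into (8), yields (24). For the velocity form, inverting the second relation in (22) gives $\rho = \rho _{m} (1 - v / v_{f})$, so that

\begin{align*} q = \rho v =&\rho _{m} \left ({v - \frac {v^{2}}{v_{f}}}\right ) \\ \frac {\partial q}{\partial x} =&\rho _{m} \left ({1 - \frac {2v}{v_{f}}}\right ) \frac {\partial v}{\partial x}, \;\;\; \frac {\partial \rho }{\partial t} = - \frac {\rho _{m}}{v_{f}} \frac {\partial v}{\partial t} \end{align*}

and summing the two derivatives as in (8) yields (23).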

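Both (23) and (24) are hyperbolic PDEs. A second order diffusive term can be added to make the PDE parabolic and secure a strong solution. For example, (24) becomes (25), where $\epsilon $ is a constant of a small value.

For $(x, \; t) \in [ \, x_{0}, \; x_{n} ] \, \times [ \, t_{0}, \; t_{m} ], \;\;\;$ \begin{equation*} v_{f} \left ({1 - \frac {2\rho \left ({x, \; t}\right )}{\rho _{m}}}\right ) \frac {\partial \rho \left ({x, \; t}\right )}{\partial x} + \frac {\partial \rho \left ({x, \; t}\right )}{\partial t} = \epsilon \frac {\partial ^{2} \rho \left ({x, \; t}\right )}{\partial x^{2}}\tag{25}\end{equation*}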

The second order diffusion term ensures the solution of PDE is continuous and differentiable, avoiding the breakdown and discontinuity in the solution to the PDE.

B. Training Data and Cost Functions

The cost function of a PIDL neural network reconstructing a density-field $\rho (x, \; t)$ is written as (26), where $J_{DL}$ is computed at observation points $\mathcal {O} = \{(x_{o}^{j}, t_{o}^{j}) | j = 1, 2,\ldots , N_{o}\}$, and $J_{PHY}$ is obtained at the collocation points $\mathcal {C} = \{(x_{c}^{j}, t_{c}^{j}) | j = 1, 2,\ldots , N_{c}\}$.\begin{align*} \begin{cases} J_{DL} = \frac {1}{N_{o}}\sum _{j=1}^{N_{o}}\left |{\rho \left ({x_{o}^{j}, t_{o}^{j}}\right ) - \hat {\rho }\left ({x_{o}^{j}, t_{o}^{j}}\right )}\right |^{2} \\ J_{PHY} = \frac {1}{N_{c}}\sum _{j=1}^{N_{c}}\Big |v_{f} \left ({1 - \frac {2\hat {\rho }\left ({x_{c}^{j}, t_{c}^{j}}\right )}{\rho _{m}}}\right )\frac {\partial \hat {\rho }\left ({x_{c}^{j}, t_{c}^{j}}\right )}{\partial x} \\ + \frac {\partial \hat {\rho }\left ({x_{c}^{j}, t_{c}^{j}}\right )}{\partial t} \Big |^{2} \end{cases}\tag{26}\end{align*}

Weights can be assigned to the cost terms of $J_{DL}$ and $J_{PHY}$ in constituting the cost function of PIDL, as composed in (27),\begin{equation*} J = \mu _{1} * J_{DL} + \mu _{2} * J_{PHY}\tag{27}\end{equation*} where weight parameters $\mu _{1}$ and $\mu _{2}$ adjust the scales of the DL-cost and the physics-cost.
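As a rough illustration of how (26) and (27) fit together, the sketch below evaluates both cost terms for a candidate density function. In an actual PIDL network the derivatives of $\hat {\rho }$ come from automatic differentiation; this sketch swaps in central finite differences so that it runs with only the Python standard library. The callable `rho_hat`, the sample observations, the collocation grid, and the normalized parameters are all hypothetical stand-ins, not the manuscript's implementation.

```python
import math

V_F, RHO_M = 1.0, 1.0  # normalized Greenshields parameters (assumed)


def lwr_residual(rho_hat, x, t, h=1e-5):
    """Left side of (24) at (x, t); central differences stand in for AD."""
    rho = rho_hat(x, t)
    rho_x = (rho_hat(x + h, t) - rho_hat(x - h, t)) / (2 * h)
    rho_t = (rho_hat(x, t + h) - rho_hat(x, t - h)) / (2 * h)
    return V_F * (1.0 - 2.0 * rho / RHO_M) * rho_x + rho_t


def pidl_cost(rho_hat, observations, collocation, mu1=1.0, mu2=1.0):
    """Return (J, J_DL, J_PHY) following (26) and (27)."""
    j_dl = sum((rho - rho_hat(x, t)) ** 2 for x, t, rho in observations) / len(observations)
    j_phy = sum(lwr_residual(rho_hat, x, t) ** 2 for x, t in collocation) / len(collocation)
    return mu1 * j_dl + mu2 * j_phy, j_dl, j_phy


# Hypothetical data: observations of a uniform density 0.3 on the initial line.
observations = [(x, 0.0, 0.3) for x in (0.0, 0.25, 0.5, 0.75)]
collocation = [(i / 10, j / 10) for i in range(1, 10) for j in range(1, 10)]


def constant(x, t):
    """Candidate that matches the data and satisfies the physics."""
    return 0.3


def wavy(x, t):
    """Candidate that is static in time and therefore violates (24)."""
    return 0.3 + 0.1 * math.sin(2 * math.pi * x)


cost_constant = pidl_cost(constant, observations, collocation)
cost_wavy = pidl_cost(wavy, observations, collocation)
```

The uniform candidate drives both terms to zero, while the static sinusoid is penalized by both the data term and the physics term, which is the behavior the weighted sum in (27) is meant to capture.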

C. Training Approaches

There are many optimization algorithms for training a neural network. Here we provide brief information on the training approaches used in this work.

1) L-BFGS-B

The limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints (L-BFGS-B) [61] is one of the optimizers available through scipy.optimize.minimize in the scientific computing library SciPy [62]. The optimization process terminates when the relative reduction of the cost between iterations falls below the tolerance $ftol$, set to 2.22e-16.

2) Adam

Adaptive moment estimation (Adam) [63] combines the advantages of the Momentum [64] and RMSProp [65] optimization algorithms by tracking exponentially decaying averages of both the gradient, $G$, and the squared gradient, $E$, using $\beta _{1}$ and $\beta _{2}$ as the decay terms, $\alpha $ as the learning rate, and $\epsilon $ as a small constant that prevents division by zero, as shown in (28). The bias-correction factors of [63] are omitted for brevity.\begin{align*} \Delta \theta _{i, t}=&\nabla \mathit {J}\left ({\theta _{i, t}}\right ) \\ \mathit {G}_{i, t}=&\beta _{1} \mathit {G}_{i, t-1} + \left ({1 - \beta _{1}}\right ) \Delta \theta _{i, t} \\ \mathit {E}_{i, t}=&\beta _{2} \mathit {E}_{i, t-1} + \left ({1 - \beta _{2}}\right ) \left ({\Delta \theta _{i, t}}\right )^{2} \\ \theta _{i, t+1}=&\theta _{i, t} - \alpha \frac {\mathit {G}_{i, t}}{\sqrt {\mathit {E}_{i, t}} + \epsilon }\tag{28}\end{align*}
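For readers who want to see the update in action, the following is a minimal sketch of the standard Adam iteration from [63], in which the averaged gradient is divided by the square root of the averaged squared gradient and both averages are bias-corrected. The decay rates and $\epsilon $ are the commonly used defaults and are assumptions here, and the quadratic test cost is purely hypothetical. The learning rate of 0.001 and the 8000 iterations mirror the values quoted for the case study in Section IV.

```python
import math


def adam_minimize(grad, theta, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=8000):
    """Run Adam on a list of parameters, given a gradient callable."""
    theta = list(theta)
    g_avg = [0.0] * len(theta)  # decaying average of the gradient (G)
    e_avg = [0.0] * len(theta)  # decaying average of the squared gradient (E)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            g_avg[i] = beta1 * g_avg[i] + (1 - beta1) * g[i]
            e_avg[i] = beta2 * e_avg[i] + (1 - beta2) * g[i] ** 2
            # Bias correction compensates for the zero initialization.
            g_hat = g_avg[i] / (1 - beta1 ** t)
            e_hat = e_avg[i] / (1 - beta2 ** t)
            theta[i] -= alpha * g_hat / (math.sqrt(e_hat) + eps)
    return theta


def quadratic_grad(theta):
    """Gradient of the hypothetical cost (theta0 - 1)^2 + (theta1 + 2)^2."""
    return [2.0 * (theta[0] - 1.0), 2.0 * (theta[1] + 2.0)]


theta_final = adam_minimize(quadratic_grad, [0.0, 0.0])
```

Because each step is normalized by the gradient magnitude, the parameters move by roughly $\alpha $ per iteration early on, which is why the iteration budget matters as much as the learning rate.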

D. Error Metric

Relative $\mathcal {L}_{2}$ Error - To quantify the error of a trained neural network, we use the relative $\mathcal {L}_{2}$ error, defined in (29), with the vehicle density $\rho (x, \; t)$ as the exemplary dependent variable.\begin{align*} \mathcal {L}_{2}^{error}=&\frac {\left \lVert{ \mathbf {P} - \hat {\mathbf {P}}}\right \rVert _{F}}{\left \lVert{ \mathbf {P}}\right \rVert _{F}} \\=&\frac {\sqrt {\sum _{j=1}^{N_{1} \cdot N_{2}} \left |{\hat {\rho }\left ({x^{(j)}, t^{(j)}}\right ) - \rho \left ({x^{(j)}, t^{(j)}}\right )}\right |^{2}}}{\sqrt {\sum _{j=1}^{N_{1} \cdot N_{2}}\left |{\rho \left ({x^{(j)}, t^{(j)}}\right )}\right |^{2}}}\tag{29}\end{align*}where $\mathbf {P}$ is the matrix form of $\rho (x, \; t)$ , and $\hat {\mathbf {P}}$ is the neural network’s estimate of $\mathbf {P}$ . After discretizing the density field $\rho (x, \; t)$ in time and space, $N_{1}$ is the number of temporal bins and $N_{2}$ is the number of spatial bins.
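The metric in (29) amounts to a ratio of Frobenius norms, which can be sketched in a few lines. The function name `relative_l2_error` and the synthetic arrays are illustrative assumptions.

```python
import numpy as np

def relative_l2_error(P, P_hat):
    """Relative L2 error of (29): ||P - P_hat||_F / ||P||_F."""
    P = np.asarray(P, dtype=float)
    P_hat = np.asarray(P_hat, dtype=float)
    # np.linalg.norm of a 2-D array defaults to the Frobenius norm
    return np.linalg.norm(P - P_hat) / np.linalg.norm(P)

# Usage: N_1 = 960 temporal bins by N_2 = 240 spatial bins,
# with an estimate that is uniformly 10% too high.
P = np.full((960, 240), 0.5)
err = relative_l2_error(P, 1.1 * P)
```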

SECTION IV.

Case Study I - Insights From Circular Testbed

In this case study, we compare the PIDL reconstruction accuracy between learning the LWR conservation law (hyperbolic) and its parabolic form. The datasets used in this case study are synthetic vehicle density datasets generated on a ring road (represented by Fig. 4). We configure all the neural networks with the same learning architecture (equal numbers of layers, same number of neurons on each layer, etc.). 10000 collocation points are assigned in the density field to compute the physics-cost. The learning rate of Adam is set to 0.001, and the number of training iterations is set to 8000.

FIGURE 4.

Ring road configuration of the datasets.


A. Dataset

The datasets for this case study simulate vehicular traffic on a ring road. The location $x$ and time $t$ are normalized so that $(x, \; t) \in [0, \, 1.0] \, \times [0, \, 3.0]$ . The road is evenly divided into 240 spatial units with $\Delta x = 1 / 240$ , and time is similarly divided into 960 steps, each representing $\Delta t = 1 / 320$ . The traffic state variable of interest in this case study is the vehicle density $\rho (x, \; t)$ . The assumed physical model is the LWR conservation law, paired with Greenshield’s fundamental diagram (FD) formulated in (22). The free-flow speed $v_{f}$ and the maximum density $\rho _{m}$ are also normalized, with both set to 1. We first generate the dataset shown in Fig. 5(a) using the LWR conservation law (hyperbolic PDE) in (24). Using the same initial and boundary conditions, and adding only a diffusion term to the LWR PDE to make it parabolic, as explained in (25), we generate the dataset illustrated in Fig. 5(b). The relative mean squared difference between the two datasets in Fig. 5 is 0.35%, indicating nearly identical density values.

FIGURE 5.

Ring road vehicle density experimental data.


B. Selection of Learning Data Instances

The sampling of available data for learning is an important aspect of machine learning. Data collection and sensing may introduce biases, often due to human factors and sensing limitations. These biases can persist in the models trained on the data, which makes it important to select appropriate subsets of the data to minimize them. However, in many instances, the selection of training instances is limited by the availability and positions of the sensors. For example, in traffic sensing, the sensors are either installed on the roadside at fixed intervals (e.g., loop detectors) or move with the traffic stream (probe vehicles, CVs).

Fig. 6 demonstrates the sampling cases of traffic state data. Numerical PDE solvers such as Lax-Friedrichs’ numerical scheme [66] can be used as a state reconstruction tool; however, they require complete information on the initial and boundary conditions, as shown in Fig. 6(a), which is not practically feasible. On the other hand, the PIDL approach can utilize any given amount of inputs from the boundaries for training; in Fig. 6(b), 20% of the initial and boundary data are shown as an exemplar training setting. Fig. 6(c) represents the Eulerian traffic data that can be gathered from roadside sensors or loop detectors installed at predetermined locations along the road infrastructure (shown at $x = 0, 0.25, 0.5, 0.75, 1.0$ in this instance). Finally, Fig. 6(d) exhibits the Lagrangian traffic data that can be collected by connected vehicles, which gather traffic state information at various locations along the vehicle trajectories. Notice the gaps in the Eulerian data in Fig. 6(c); these represent occasional sensor failures or malfunctions, resulting in data loss in this scenario.

FIGURE 6.

Selection of learning data instances.


Given the sampling choices, we designate two subsets of training data inputs for the vehicle density $\rho (x, t)$ : (1) initial and boundary condition data inputs and (2) interior data inputs. The interior data of the density field can be collected by either roadside detectors (Eulerian) or connected vehicles (Lagrangian). In the case study, we select the Lagrangian data from CVs as the interior data.

Initial condition data are the vehicle density values $\rho (x, \; 0)$ at $t = 0, x \in [{0, 1.0}]$ . Boundary condition data include vehicle density values at the first location $\rho (0, \; t), \; t \in [{0, 3.0}]$ , and at the last location $\rho (1.0, \; t), \; t \in [{0, 3.0}]$ .
Interior data (CV data in the case study) $\rho (x, t)$ come from the CV fleet in the traffic stream. They can be gathered at any randomly selected location along a vehicle trajectory and reflect the density value $\rho (x, \; t)$ , given $x \in [0, \, 1.0]$ and $t \in [0, \, 3.0]$ .

The initial condition data on vehicle density can be registered through a still image, recorded by devices such as roadside video cameras or drones. The boundary condition data can be obtained from a stationary detector deployed along a freeway at the start location $x = 0$ , and the end location $x = 1.0$ (normalized).
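As a hedged sketch of how a given percentage of initial and boundary inputs could be drawn from the discretized density field, the snippet below samples grid indices at random. The function name `sample_initial_boundary`, the uniform random selection, and the pool of 240 + 2 × 960 = 2160 points (with the two corner points at $t = 0$ appearing in both the initial and boundary sets) are assumptions; the pool size is chosen because it reproduces the counts of 108 and 432 inputs quoted for 5% and 20% in Section IV-D.

```python
import numpy as np

def sample_initial_boundary(n_x=240, n_t=960, fraction=0.1, seed=0):
    """Randomly pick a fraction of the initial and boundary grid points.

    Returns an (N, 2) integer array of (time index, space index) pairs.
    Assumption: the pool is the t = 0 row plus the x = 0 and x = 1.0
    columns, counted as n_x + 2 * n_t points.
    """
    rng = np.random.default_rng(seed)
    initial = np.column_stack([np.zeros(n_x, dtype=int), np.arange(n_x)])
    left = np.column_stack([np.arange(n_t), np.zeros(n_t, dtype=int)])
    right = np.column_stack([np.arange(n_t), np.full(n_t, n_x - 1)])
    pool = np.vstack([initial, left, right])
    n_pick = int(round(fraction * len(pool)))
    chosen = rng.choice(len(pool), size=n_pick, replace=False)
    return pool[chosen]

# Usage: 5% and 20% of the initial and boundary data
idx_5 = sample_initial_boundary(fraction=0.05)
idx_20 = sample_initial_boundary(fraction=0.20)
```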

C. Reconstruction With Initial and Boundary Inputs

We first evaluate the PIDL results based only on training inputs about the initial and boundary conditions. We select four levels of available training inputs (10%, 20%, 50%, and 90% of the total numbers of initial and boundary data), and use both L-BFGS-B and Adam optimizers to reconstruct the hyperbolic LWR PDE and its parabolic variation with the diffusion term. The results are shown in Table 1 (best results are shown in bold). The reconstructed density fields, trained with 10% initial and boundary inputs, are shown in Fig. 7.

TABLE 1 Traffic Density Reconstruction Results (With Initial and Boundary Inputs)

FIGURE 7.

Density reconstruction, trained with 10% initial and boundary inputs.


Across all training settings, we observe that PIDL models achieve much higher accuracy (lower relative $\mathcal {L}_{2}$ error) in reconstructing the parabolic PDE with the diffusion term than in reconstructing the hyperbolic LWR PDE. Trained with the L-BFGS-B optimizer and 10% of the boundary and initial observations, the relative $\mathcal {L}_{2}$ error of the PIDL reconstruction with the parabolic PDE is 0.0164, which is merely 5.3% of the 0.309 error obtained with the hyperbolic LWR PDE.

We also take several snapshots of the reconstruction at time $t = 0, 0.5, 1.0, 1.5, 2.0, 2.5$ for comparison. From Fig. 7(b), we notice that PIDL learning the hyperbolic PDE encountered difficulties in reconstructing the density state at locations where the discontinuity of density data occurs. However, in Fig. 7(a), equipped with the diffusion term in the parabolic PDE, the reconstruction result is smoothed and closely aligned with the ground truth.

D. Reconstruction With Initial, Boundary, and Interior Inputs

Next, we evaluate the reconstruction accuracy under scenarios in which varying levels of data on the initial and boundary conditions and the interior conditions (CV inputs) are available. We pick two levels of available inputs $N_{o1}$ on the initial and boundary conditions: (1) $N_{o1} = 108$ inputs, representing 5% of initial and boundary data; and (2) $N_{o1} = 432$ inputs, representing 20% of the available data. For the number of CV inputs $N_{o2}$ , we also choose two settings: $N_{o2} = 1146$ and $N_{o2} = 4584$ , accounting for 0.5% and 2% of the interior of the density field, respectively. The reconstruction results, measured by the relative $\mathcal {L}_{2}$ error defined in (29), are shown in Table 2 (again, best results are in bold). The reconstructed density fields with 20% initial and boundary inputs $(N_{o1} = 432)$ and 2% CV inputs $(N_{o2} = 4584)$ are shown in Fig. 8.

TABLE 2 Density Reconstruction Results (With Initial and Boundary Inputs & CV Inputs)

FIGURE 8.

Density reconstruction, 20% initial and boundary inputs & 2% CV inputs.


Compared with the results in Table 1, the inclusion of CV inputs in the training inputs of PIDL slightly improves the reconstruction accuracy with the hyperbolic PDE. However, the learning performance of the PIDL architecture with the parabolic PDE is still far superior. With 20% initial and boundary observations and 2% CV inputs, the PIDL model with the L-BFGS-B optimizer achieves a relative $\mathcal {L}_{2}$ error of 0.00388 for reconstructing the parabolic PDE, which is 1.72% of the relative $\mathcal {L}_{2}$ error of 0.225 obtained with the hyperbolic PDE.

SECTION V.

Case Study II - Insights From Field Data

In this section, we shed further light on the topic using field data. We examine the limitations of PIDL and compare it with Lax-Friedrichs’ numerical scheme [67] in learning the traffic density from the “Next Generation SIMulation” (NGSIM) dataset [68].

A. NGSIM Dataset

The NGSIM dataset records traffic conditions using video cameras and derives traffic state variables such as velocity and vehicle density from the vehicle trajectories identified in the video recordings [69]. The data used here, illustrated in Fig. 9, contain the vehicle density for 45 minutes on a 2060-foot segment of the US-101 freeway. Shockwaves of vehicles stopping due to traffic congestion, which propagate backward in space and forward in time, can be observed in the plot of vehicle density. This freeway segment has five lanes, one on-ramp, and one off-ramp. An additional lane is attached to the freeway between the locations of the on-ramp and the off-ramp. The data were collected from 7:50 a.m. to 8:35 a.m. in Los Angeles, California, on June 15th, 2005. During the first 12 minutes of the data, a free-flow zone is observed in the post-off-ramp area, while the areas before the off-ramp experience stop-and-go waves [70].

FIGURE 9.

Vehicle density on US-101 highway segment, between 7:50 am and 8:35 am, NGSIM.


B. Field Data Reconstruction Using PIDL with Hyperbolic PDE

The density dataset is tabulated with spatiotemporal bins of $\Delta x = 20$ ft and $\Delta t = 5$ s. At $t = 0$ , the initial condition contains 104 data points along the 2060-ft road segment $(x = 0, 20,\ldots , 2060)$ . The lower boundary condition at $x = 0$ and the upper boundary condition at $x = 2060$ each have 540 boundary condition points $(t = 0, 5,\ldots , 2695)$ . Together, the initial and boundary condition data of vehicle density $\rho (x, \; t)$ are all used as training inputs of PIDL with the L-BFGS-B optimizer.

The governing physical equation of the PIDL architecture is the hyperbolic LWR conservation law paired with Greenshield’s fundamental diagram. The estimated value of maximum density $\rho _{m}$ is 0.12 vehicles per foot (sum of all traffic lanes), and the free-flow speed $v_{f}$ is estimated at 80 feet per second (54.54 miles per hour). The reconstruction result with PIDL is shown in Fig. 10.
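To make the role of these two parameters concrete, the sketch below writes out the linear speed-density relation commonly associated with Greenshield’s fundamental diagram, which is consistent with the characteristic speed $v_{f}(1 - 2\rho /\rho _{m})$ appearing in (26). The function names are illustrative assumptions.

```python
def greenshields_speed(rho, v_f=80.0, rho_m=0.12):
    """Speed (ft/s) at density rho (veh/ft): v = v_f * (1 - rho / rho_m)."""
    return v_f * (1.0 - rho / rho_m)

def greenshields_flow(rho, v_f=80.0, rho_m=0.12):
    """Flow (veh/s, all lanes): q = rho * v."""
    return rho * greenshields_speed(rho, v_f, rho_m)

def characteristic_speed(rho, v_f=80.0, rho_m=0.12):
    """dq/drho = v_f * (1 - 2 * rho / rho_m), the coefficient in (26)."""
    return v_f * (1.0 - 2.0 * rho / rho_m)

# Unit check for the quoted free-flow speed: 80 ft/s in miles per hour
FT_PER_S_TO_MPH = 3600.0 / 5280.0
v_f_mph = 80.0 * FT_PER_S_TO_MPH  # about 54.5 mph

# Flow peaks at half the maximum density, where the characteristic speed is zero
q_max = greenshields_flow(0.06)
```

Under this relation, densities above $\rho _{m}/2$ have negative characteristic speed, which is why congestion waves travel upstream.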


FIGURE 10.

Reconstruction with PIDL, Relative $\mathcal{L}_{2}$ Error: 0.345.


The relative $\mathcal {L}_{2}$ error of PIDL reconstruction is 0.345. The snapshots at $ t = 0, 450, 900, 1350, 1800, 2250$ show that PIDL tries to mimic the evolution of the density field between the lower boundary $x = 0$ and the upper boundary $x = 2060$ ; however, it cannot overcome the challenge of learning the stochastic perturbation of the density state. It is also evident in the reconstruction plot of $\rho (x, \; t)$ that the PIDL architecture overly generalizes the output and fails to capture traffic patterns such as the shockwaves present in the dataset.

C. Field Data Reconstruction Using PIDL with Parabolic PDE

For NGSIM US-101 data, the diffusion coefficient $\epsilon $ in Eqn. (25) is estimated to be 0.13 for regular motor vehicles [71]. We select a range of values $\{0.05, 0.1, 0.13, 0.15, 0.20\}$ for $\epsilon $ and reconstruct the density dataset using the LWR PDE with the addition of the diffusion term as the physics of PIDL. The reconstruction result with $\epsilon = 0.13$ is shown in Fig. 11.
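For reference, and assuming that (25) takes the usual viscous form of the LWR equation with Greenshield’s fundamental diagram (the sign convention and the symbol $J_{PHY}^{\epsilon }$ are assumptions made here for illustration), the physics cost in this setting differs from $J_{PHY}$ in (26) only by a second-order term:\begin{align*} \frac {\partial \hat {\rho }}{\partial t} + v_{f}\left ({1 - \frac {2\hat {\rho }}{\rho _{m}}}\right )\frac {\partial \hat {\rho }}{\partial x}=&\epsilon \frac {\partial ^{2} \hat {\rho }}{\partial x^{2}} \\ J_{PHY}^{\epsilon }=&\frac {1}{N_{c}}\sum _{j=1}^{N_{c}}\left |{v_{f}\left ({1 - \frac {2\hat {\rho }}{\rho _{m}}}\right )\frac {\partial \hat {\rho }}{\partial x} + \frac {\partial \hat {\rho }}{\partial t} - \epsilon \frac {\partial ^{2} \hat {\rho }}{\partial x^{2}}}\right |^{2}\end{align*}with every term evaluated at the collocation points $(x_{c}^{j}, t_{c}^{j})$ . Setting $\epsilon = 0$ recovers the hyperbolic residual of (26).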


FIGURE 11.

Reconstruction with PIDL, $\epsilon= 0.13$ , Relative $\mathcal{L}_{2}$ Error: 0.319.


We observe that with the realistic NGSIM dataset, the addition of the diffusion term only slightly improves the accuracy, yielding a relative $\mathcal {L}_{2}$ error of 0.319. As with the PIDL neural network with the hyperbolic LWR PDE, the stochastic nature of traffic disturbances overwhelms the ability of PIDL to accurately capture the traffic state with only boundary and initial observations. Our previous work [34] demonstrates that the inclusion of Lagrangian observations, such as CV data, significantly improves PIDL performance in this case.

We also conduct the sensitivity analysis on the diffusion coefficient $\epsilon $ , and the results are presented in Table 3. With varying values of the diffusion coefficient $\epsilon $ , the conclusion from this analysis is unwavering: although the diffusion term smooths out the reconstruction in areas where state discontinuity is present, it cannot accurately estimate the traffic state with only initial and boundary observations.

TABLE 3 Relative $\mathcal{L}_{2}$ Error With Choices of Diffusion Coefficient $\epsilon$


D. Reconstruction Using Lax-Friedrichs’ Numerical Scheme

The vehicle density dataset from NGSIM can also be reconstructed by using the Lax-Friedrichs’ differencing method [72] with the complete initial and boundary conditions. The reconstruction is pictured in Fig. 12.
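For readers unfamiliar with the scheme, a minimal sketch of the Lax-Friedrichs update for the LWR equation with Greenshield’s flux is given below. It uses the normalized grid of Section IV ($\Delta x = 1/240$, $\Delta t = 1/320$, $v_{f} = \rho _{m} = 1$) and, as a simplifying assumption, a periodic boundary suited to a ring road; the NGSIM reconstruction instead prescribes measured boundary data at $x = 0$ and $x = 2060$. The function name and the step-shaped initial profile are illustrative.

```python
import numpy as np

def lax_friedrichs_lwr(rho0, n_steps, dx, dt, v_f=1.0, rho_m=1.0):
    """Lax-Friedrichs scheme for rho_t + (rho * v_f * (1 - rho / rho_m))_x = 0.

    Assumption: periodic boundary (ring road), implemented with np.roll.
    Returns an array of shape (n_steps + 1, len(rho0)).
    """
    def flux(rho):
        return rho * v_f * (1.0 - rho / rho_m)

    rho = np.asarray(rho0, dtype=float).copy()
    history = [rho.copy()]
    for _ in range(n_steps):
        right = np.roll(rho, -1)  # rho_{i+1}
        left = np.roll(rho, 1)    # rho_{i-1}
        rho = 0.5 * (left + right) - dt / (2.0 * dx) * (flux(right) - flux(left))
        history.append(rho.copy())
    return np.array(history)

# Usage: a congested platoon (density 0.8) in otherwise light traffic (0.2)
dx, dt = 1.0 / 240, 1.0 / 320
x = np.arange(240) * dx
rho0 = np.where((x > 0.4) & (x < 0.6), 0.8, 0.2)
history = lax_friedrichs_lwr(rho0, n_steps=960, dx=dx, dt=dt)
```

With these values the ratio $v_{f}\Delta t/\Delta x$ is 0.75, so the scheme satisfies the usual stability condition, and the periodic update conserves the total number of vehicles on the ring.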


FIGURE 12.

Reconstruction with Lax-Friedrichs’ Numerical Scheme, Relative $\mathcal{L}_{2}$ Error: 0.231.


Along with a smaller relative $\mathcal {L}_{2}$ error, the most significant improvement in the reconstruction result by the Lax-Friedrichs’ method is its capability to capture and rebuild the pattern of shockwaves in the dataset, based only on the inputs from the initial and boundary conditions. The advantage of the numerical scheme resides in its ability to propagate the discontinuity in the density field based on the method of characteristics, while the PIDL architecture struggles with the reconstruction task.

SECTION VI.

Discussion on Empirical Results

Recent examinations have elucidated the challenges associated with training PIDL and the drawbacks of certain data representations. In several instances, unstable convergence occurs in gradient-descent-based PIDL training, especially when the underlying PDE solution has high-frequency features [39]. This pathological behavior is attributed to the multi-scale interactions between the cost terms in optimizing the neural network cost [73]. These interactions lead to stiffness in the gradient flow dynamics, which severely constrains the learning rate and undermines the stability of the training process. In addition, PIDL, which often deploys fully-connected hidden layers, faces a challenge termed “spectral bias”, and as a result cannot reasonably assimilate a nonlinear hyperbolic PDE when its solution involves shocks [74].

Potential mitigation directions: The potential approaches to improve the PIDL paradigm include (a) replacing the hyperbolic physics with its parabolic counterpart by adding the diffusion term, (b) incorporating more observation or collocation points around the shocks, (c) including more interior training instances (e.g., Lagrangian measurements), and (d) modifying the fully-connected learning architecture of the neural network. We discuss these approaches below.

For a one-dimensional hyperbolic PDE with a non-convex flux function, the analytical solution can be depicted by a simple piecewise continuous function, and the stability of the solution can be significantly improved by adding a diffusion term to the PDE. With the diffusion smoothing the solution around the shock, the neural network can recover the actual scale and location of the shock, solve the PDE in its parabolic form, and produce precise approximation results [75]. However, common practice assumes the coefficient $\epsilon $ to be zero when fitting a realistic traffic dataset with the LWR conservation law [28], [76], leaving out the diffusion effect. As Fig. 7(a) and Fig. 7(b) in Section IV show, the inclusion of the diffusion term significantly improves the reconstruction accuracy of a PIDL neural network. The deficiency of PIDL with the hyperbolic LWR PDE is not rooted in the learning architecture of the neural network or in the hyperparameter selection [75]. Rather, the diffusion term enhances the stability of the gradient optimization process in training the neural network around the areas with state discontinuity and shockwaves.

Increasing the number of observations or collocation points along the shock trajectories in the training of the neural network forms another approach [77]. However, one challenge in this approach is identifying the shock location. As we have observed, PIDL struggles to approximate the vehicle density where localized non-linear discontinuities exist in the data; adding artificial dissipation could improve the learning result for the hyperbolic conservation law [78], [79].
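One simple way to concentrate collocation points near shocks can be sketched as follows. This is an illustrative heuristic and not the method of [77]: it assumes that a rough density estimate `rho_coarse` is already available (which sidesteps the shock-identification challenge noted above) and draws points with probability that grows with the spatial gradient magnitude. The function name and the `floor` parameter, which keeps some coverage of smooth regions, are assumptions.

```python
import numpy as np

def sample_collocation_near_shocks(rho_coarse, n_points, floor=0.1, seed=0):
    """Draw (time index, space index) pairs, favoring steep spatial gradients.

    rho_coarse : (n_t, n_x) rough density estimate (assumed available)
    floor      : baseline weight so that smooth regions are still sampled
    """
    rng = np.random.default_rng(seed)
    grad = np.abs(np.gradient(rho_coarse, axis=1))  # |d rho / dx| per cell
    weight = floor + grad / (grad.max() + 1e-12)
    prob = (weight / weight.sum()).ravel()
    flat = rng.choice(rho_coarse.size, size=n_points, replace=True, p=prob)
    t_idx, x_idx = np.unravel_index(flat, rho_coarse.shape)
    return np.column_stack([t_idx, x_idx])

# Usage: a stationary density jump between spatial cells 49 and 50
rho_coarse = np.full((20, 100), 0.2)
rho_coarse[:, 50:] = 0.8
points = sample_collocation_near_shocks(rho_coarse, n_points=2000)
near_shock = np.isin(points[:, 1], [49, 50]).mean()
```

In this toy field, the two cells adjacent to the jump make up 2% of the grid but receive a much larger share of the samples.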

Based on the results in Section V, PIDL with the parabolic variant of LWR PDE cannot overcome the random perturbation in the traffic dataset and accurately estimate the traffic density based on pure observation of the initial and boundary conditions. Our previous work suggests the benefit of including Lagrangian observation in this setting for the task of TSE [34].

Alternatively, recent studies have started adapting the design of the deep learning architecture in PIDL to circumvent the issue of learning underlying hyperbolic PDEs with discontinuities and no strong solution. Attention-based recurrent neural networks have been introduced to capture the localized shock waves in the nonsmooth solution to the governing equations [77]. This approach mitigates the challenges by substituting the conventional fully connected feedforward architecture in PIDL with recurrent neural networks and attention mechanisms. Additionally, convolutional neural network architectures have demonstrated efficiency in assimilating data with nonlinear shock features or high-frequency components in the PDE solution [80], [81]. It has also been pointed out that optimizing a PIDL neural network to learn a hyperbolic PDE can be a futile process because the pointwise residual blows up when approximating the exact solution; an alternative optimization approach replaces the pointwise residual with a residual related to the Kružkov entropy condition [82].

In practice, the task of TSE involves realistic traffic data in which a diffusion term is inherent - drivers gradually slow their vehicles in anticipation of congestion or when a slowdown is visually perceivable. Therefore, when adopting physics-informed deep learning for TSE, this smoothness around shockwaves should be considered part of the “physics”, which describes the underlying relationship between traffic states. Recall that the diffusion term in the parabolic form of the LWR PDE is weighted by the parameter $\epsilon $ . The value of $\epsilon $ is associated with drivers’ behavior (reaction time to press the brake, for example) and should be tuned according to the traffic dataset before being plugged into the conservation equation for the realistic reconstruction of the traffic data. Other approaches that transportation planners and agency practitioners can adopt are to include Lagrangian sensing data [34], [35], to increase the number of observation samples in the shockwave areas [77], and to use domain decomposition and projection onto the space of high-order polynomials [47].

SECTION VII.

Conclusion

In this work, we exhibited the difficulties of training a physics-informed deep learning (PIDL) neural network to reconstruct a certain type of partial differential equation (PDE) - the hyperbolic PDE for which a strong solution cannot be obtained. The non-smooth weak solution to conservation law-based traffic flow models (such as the LWR PDE) causes PIDL to fail in capturing the scale and location of the discontinuity. Through the case studies, we showcased the stark differences between the learning results of PIDL with the first order hyperbolic LWR PDE and with its parabolic counterpart, in which the additional diffusion term secures a strong solution and leads to an accurate approximation. When the PDE solution contains multi-scale features with high-frequency terms, computing the gradients in the fully connected learning structure of a PIDL neural network often becomes difficult [74], making the optimization process unstable and ultimately leading to inaccurate predictions [39], [83]. We observe that the deep learning neural network fails to approximate the nonlinear relationship of a hyperbolic PDE in areas where shockwaves are present, whereas the diffusion term in the parabolic PDE improves the estimation in these areas and thus the overall reconstruction result. Besides reconstructing using only the initial and boundary conditions, we list the possible sampling choices of traffic flow data and use a diverse combination of Eulerian and Lagrangian data to ensure the reliability and validity of the results. Moreover, with the field data from the NGSIM traffic dataset, we further highlight the limitation of PIDL in the presence of shockwaves. Future work includes analysis of the cost evolution (over time) in reconstructing hyperbolic and parabolic PDEs to understand the interaction between the cost terms of the PIDL neural network and interpret its pathological behavior in learning conservation law-based models.

ACKNOWLEDGMENT

The authors wish to thank Dr. Rongye Shi at Columbia University for providing the ring road data for the case study, Dr. Animesh Biswas at the University of Nebraska at Lincoln for the LWR reconstruction of the data, and Dr. Pushkin Kachroo at the University of Nevada Las Vegas for the suggestions and discussions on the topic.

