Introduction
Automated vehicles (AVs) have recently drawn widespread public attention because they are expected to have a transformative impact on road transportation, for instance by addressing critical traffic issues such as energy consumption and capacity shortage. AVs are vehicles equipped with on-board units (OBUs) such as radar or LIDAR that acquire real-time driving information about the leading vehicle through a detection process [1], [2]. Among vehicle automation functions, longitudinal control such as adaptive cruise control (ACC) plays an essential role: it automatically adjusts the vehicle's speed to maintain a desired distance from the preceding vehicle and avoid rear-end collisions, thereby assisting drivers in the task of longitudinal control during motorway driving [3], [4]. With fast-developing communication technologies, cooperative ACC has attracted considerable interest recently [5], [6] because it enhances vehicle performance and situational awareness. However, cooperative ACC can be unreliable due to the immaturity of vehicle-to-vehicle communication [7]. Hence, ACC is still worth investigating.
A number of ACC car-following optimal control algorithms and strategies have been proposed during the past decades. For example, two well-known models, the optimal velocity model (OVM) [8] and the intelligent driver model (IDM) [9], [10], initially designed to mimic the car-following behavior of human-driven vehicles (HVs), have also been applied to describe ACC vehicle behavior. The OVM regulates the vehicle towards an optimal speed defined as a function of the time headway; however, it does not have a collision-free property. The IDM addresses the safety problem by introducing extra parameters, such as a braking term that constrains acceleration. Nevertheless, these models do not consider human driver characteristics and cannot capture the essential difference between HV and AV car-following behavior.
Beyond directly applying or modifying existing HV car-following models, a large number of control-theory-based ACC algorithms have been developed, which can generally be divided into three main categories: (i) optimal control with explicit objectives and hard constraints [11], [12], (ii) linear controllers [5], [13], and (iii) nonlinear controllers [14], [15]. The first type is usually implemented in a model predictive control (MPC) fashion [14], [16]–[20], which represents a family of control algorithms that use explicit process models to predict future responses to given inputs. The MPC approach is attractive because of its flexibility in modelling objectives and constraints. However, constrained MPC in particular does not guarantee feasible solutions and requires an efficient algorithm to solve the constrained optimization problem. Compared with MPC, unconstrained nonlinear and linear controllers have more fixed objectives and lack constraints. Linear ACC controllers usually design the acceleration to be proportional to the spacing deviation and the relative speed with respect to the predecessor; they are fast to compute and easier to apply in practice. Nonlinear controllers can depict some scenarios more precisely, such as a mixed traffic environment consisting of AVs, HVs, and semi-autonomous vehicles, at the cost of increased computational complexity [15]. Although these three categories of methods each have strengths and weaknesses, personalized preferences are barely considered in previous controller designs, and AVs are usually treated as simply homogeneous.
Even though AVs can satisfy driving automation requirements, personalized automation, such as certain driving styles, driver-based preferences, and driving patterns, has rarely been considered in ACC design. As far as the authors know, current ACC control algorithms cannot really incorporate users' driving preferences [21]. Besides, as AV market penetration grows, the number of user-preferred settings will naturally diversify [22], [23]. Among all personal settings, risk-taking preference, also known as risk sensitivity, is an extremely important characteristic because it is closely related to the safety of car-following behavior and is heterogeneous across drivers, depending on their states and the current driving situation. Hence, it is necessary to fill the gap of interpreting risk-sensitive personalization in AV control to better satisfy drivers' expectations.
Additionally, comfort in the AV control process is also critical. To avoid sudden starts and stops, an expensive control strategy [23], which places an extra penalty on large control inputs (accelerations), can help smooth car-following behavior according to different drivers' preferences. Hence, to increase the diversity of ACC controller behaviors, we present a personalized stochastic optimal control algorithm for AVs. The developed ACC algorithm sheds light on diversifying drivers' choices to satisfy personal requirements, with two detailed contributions. Firstly, to realize personalization of risk-sensitive preference, the control framework applies the linear exponential-of-quadratic Gaussian (LEQG) (or risk-sensitive linear quadratic) mechanism, which extends LQG by designing the cost function in an exponential form and introducing a risk-sensitive parameter that interprets different degrees of willingness to bear additional risk under mixed uncertainties [25]–[28]. As a result, the controller generates various driving trajectories and behaviors depending on the setting of the risk-sensitive parameter. Secondly, a comfort requirement is incorporated into the control process via the relative magnitudes of the weight matrices, i.e., by switching between an expensive control mode and a non-expensive control mode. These contributions to ACC design are innovative and important steps towards practical functions in future ACC industrialization. For verification, the presented controller is evaluated through several simulation experiments. The results show that the proposed controller provides effective and convergent control and generates different trajectories for AVs with different risk preferences. The control framework has overall satisfying performance.
The remainder of the paper is organized as follows: Section II presents the continuous and discretized system state-space formulations and introduces the control design. Section III describes the validation of the proposed control strategy. Finally, Section IV summarizes the key findings and provides future research directions.
State-Space Formulation and Control Design
This section first presents the system state-space formulation in both continuous and discretized form and then introduces the proposed LEQG stochastic optimal control problem. For simplicity, some preliminaries are stated first. We postulate that every ACC vehicle can receive real-time information (e.g., velocity, acceleration) about its direct predecessor detected by the on-board sensors. We also assume all discretized uncertainties are zero-mean additive white Gaussian noises. Besides, communication delay and vehicular actuation delay are not considered here since they are small compared to the control sampling period [5].
A. Continuous State-Space Formulation
Accordingly, this contribution adopts the state-space formulation for the ACC system proposed by Zhou et al. [18]. The well-known constant time headway (CTH) policy is used in this paper; thus, the target equilibrium spacing at time $t$ is \begin{equation*} s^{\ast }(t)=v(t)\times t^{\ast }+s_{0}\tag{1}\end{equation*} where $v(t)$ is the subject vehicle's speed, $t^{\ast }$ is the desired time headway, and $s_{0}$ is the standstill spacing.
The speed difference is defined as \begin{equation*} \Delta v(t)=v^{\ast }(t)-v(t)\tag{2}\end{equation*}
Using the above-mentioned variables, the system state is defined as \begin{equation*} x(t)=(s(t)-s^{\ast }(t),\Delta v(t))^{T}\tag{3}\end{equation*}
The continuous-time car-following dynamics can then be written in state-space form as \begin{align*} \dot {x}(t)=&Ax(t)+Bu(t)+V(t) \tag{4}\\ A=&\left({\begin{array}{cc} 0 &\quad 1 \\ 0 &\quad 0 \end{array}}\right),\quad B=\left({\begin{array}{c} -t^{\ast } \\ -1 \end{array}}\right)\tag{5}\end{align*} where $u(t)$ is the control input (the acceleration of the subject vehicle) and $V(t)$ is the system disturbance.
B. State-Space Formulation Discretization
Though a continuous system is usually more desirable, in practice systems operate at a certain control frequency. Therefore, to be more realistic, we discretize the continuous ACC system in Eq. (4) by assuming the control input is held constant over each sampling interval of length $t_{s}$ (zero-order hold), which yields \begin{equation*} x_{t+1} =A_{d} x_{t} +B_{d} u_{t} +V_{t}\tag{6}\end{equation*}
where \begin{align*} A_{d}=&e^{At_{s}} \tag{7}\\ B_{d}=&\left({\int _{0}^{t_{s}} {e^{A\tau }} d\tau }\right)B \tag{8}\\ x_{0}=&\mu _{0}\tag{9}\end{align*}
Note that zero-order hold means that within each sampling interval, the control input is time-invariant [29].
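For concreteness, the zero-order-hold discretization of Eqs. (6)-(8) can be computed numerically as in the following Python sketch. The numerical values of the time headway $t^{\ast }$ and the sampling period $t_{s}$ are illustrative assumptions, not the settings used in the later experiments.

```python
import numpy as np
from scipy.linalg import expm

def discretize(A, B, ts):
    """Zero-order-hold discretization of dx/dt = A x + B u (Eqs. 6-8).

    Uses the augmented-matrix-exponential identity, so that
    A_d = exp(A*ts) and B_d = (integral_0^ts exp(A*tau) dtau) B.
    """
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    aug_d = expm(aug * ts)
    return aug_d[:n, :n], aug_d[:n, n:]

# Illustrative values (assumptions, not the paper's parameter table).
t_star, ts = 1.5, 0.1                       # desired time headway [s], sampling period [s]
A = np.array([[0.0, 1.0], [0.0, 0.0]])      # Eq. (5)
B = np.array([[-t_star], [-1.0]])           # Eq. (5)
A_d, B_d = discretize(A, B, ts)             # Eqs. (7)-(8)
```

For the matrices in Eq. (5) this reproduces the closed-form result $A_{d}=\bigl(\begin{smallmatrix}1 & t_{s}\\ 0 & 1\end{smallmatrix}\bigr)$ and $B_{d}=(-t^{\ast }t_{s}-t_{s}^{2}/2,\,-t_{s})^{T}$.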
C. State Feedback Control Strategy
Our design of the stochastic optimal longitudinal control for the ACC system applies the linear exponential-of-quadratic Gaussian control framework. The LEQG control problem, which draws on risk-sensitive control and differential game theory, is an important generalization of the linear quadratic Gaussian (LQG) control problem. LEQG replaces the quadratic cost function of LQG with an exponential function of a quadratic form of the state and the input, hence the name "LEQG". Furthermore, a risk-sensitive parameter is introduced in the LEQG cost function. Beyond being part of the control framework, in this paper the parameter has the realistic meaning of depicting different levels of risk preference for AVs when facing oscillations.
Firstly, the state feedback control strategy is discussed, which is a widely applied technique for adaptive cruise controllers. In this strategy, the observation of the system is assumed to be perfect, and the control input is a linear function of the state, i.e., the state multiplied by a feedback gain. The running cost at each time point $t$ is \begin{equation*} \Upsilon _{t} ={x}'_{t} M_{t} x_{t} +{u}'_{t} N_{t} u_{t}\tag{10}\end{equation*}
where \begin{align*} M_{t} =\left({\begin{array}{cc} \beta _{1,t} &\quad 0 \\ 0 &\quad \beta _{2,t} \end{array}}\right),\quad N_{t} =\omega _{t}\tag{11}\end{align*} with $\beta _{1,t}$, $\beta _{2,t}$, and $\omega _{t}$ being positive weights on the spacing deviation, the speed difference, and the control input, respectively.
The risk-sensitive cost function is then defined as \begin{equation*} J_{\theta } (x_{t},u_{t})=2\theta ^{-1}\log E\left({e^{\frac {1}{2}\theta G}}\right)\tag{12}\end{equation*}
with the cumulative running cost \begin{equation*} G=\sum \nolimits _{t=1}^{T} {\Upsilon _{t}}\tag{13}\end{equation*}
Note that $\theta \ne 0$ is the risk-sensitive parameter and $E(\cdot)$ denotes the expectation over the mixed uncertainties.
The objective of the proposed ACC controller is to minimize the cost function defined by Eq. (12) and Eq. (13), which amounts to regulating the vehicle's velocity and spacing towards their equilibrium values in terms of the running cost, i.e., to keep the system as close as possible to the equilibrium state: \begin{equation*} u^{\ast }=\arg \min \left \{{{J_{\theta } (x_{t},u_{t})} }\right \}\tag{14}\end{equation*}
The value of $\theta$ determines the controller's attitude towards the randomness of the cumulative cost $G$: a second-order expansion of Eq. (12) gives $J_{\theta }\approx E(G)+\frac {\theta }{4}\mathrm {Var}(G)$, so the criterion penalizes or rewards cost variability depending on the sign of $\theta$. In the case $\theta >0$, high-cost realizations are weighted more heavily and the controller behaves in a risk-averse (pessimistic) manner. Whereas in the case $\theta <0$, high-cost realizations are discounted and the controller behaves in a risk-preferring (optimistic) manner.
In particular, the developed control problem reduces to a conventional LQG problem as $\theta \to 0$, in which case the cost function becomes the expected cumulative cost \begin{equation*} J\left ({{x_{t},u_{t}} }\right)=E\left(\sum \nolimits _{t=1}^{T} x'_{t} M_{t} x_{t} +{u}'_{t} N_{t} u_{t}\right)=E(G)\tag{15}\end{equation*}
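To make the role of $\theta$ concrete, the short Python sketch below estimates the criterion of Eq. (12) by Monte Carlo from a set of cumulative-cost samples $G$; the sample distribution is purely hypothetical and only serves to illustrate that $\theta >0$ up-weights high-cost realizations, $\theta <0$ down-weights them, and $\theta \to 0$ recovers the expected cost of Eq. (15).

```python
import numpy as np
from scipy.special import logsumexp

def leqg_cost(G_samples, theta):
    """Monte Carlo estimate of Eq. (12): J_theta = (2/theta) log E[exp(0.5 theta G)]."""
    G = np.asarray(G_samples, dtype=float)
    if abs(theta) < 1e-9:                       # risk-neutral limit, Eq. (15)
        return G.mean()
    # log-mean-exp for numerical stability
    return (2.0 / theta) * (logsumexp(0.5 * theta * G) - np.log(G.size))

# Hypothetical samples of the cumulative cost G (Eq. 13), e.g. from repeated runs.
rng = np.random.default_rng(0)
G = rng.gamma(shape=5.0, scale=2.0, size=100_000)
print(leqg_cost(G, -0.05), leqg_cost(G, 0.0), leqg_cost(G, 0.05))
# Approximately E[G] - Var[G]/80, E[G], E[G] + Var[G]/80, consistent with
# J_theta ~ E[G] + (theta/4) Var[G] for small theta.
```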
Remark 1:
According to [26], the sign of the risk-sensitive parameter $\theta$ determines which of the following two deterministic problems the stochastic LEQG problem is equivalent to.
Scenario 1 ($\theta <0$), in which the disturbance acts cooperatively: \begin{equation*} \mathop {\textrm {minimize}}\limits _{\left \{{{u_{t}} }\right \},\left \{{{V_{t}} }\right \}} \sum \nolimits _{t=1}^{T} \left({x}'_{t} M_{t} x_{t} +{u}'_{t} N_{t} u_{t} + {V}'_{t} Q_{t} V_{t}\right)\tag{16}\end{equation*}
Scenario 2 ($\theta >0$), in which the disturbance acts adversarially: \begin{equation*} \mathop {\min }\limits _{\left \{{{u_{t}} }\right \}} \mathop {\max }\limits _{\left \{{{V_{t}} }\right \}} \sum \nolimits _{t=1}^{T} \left({x}'_{t} M_{t} x_{t} +{u}'_{t} N_{t} u_{t} + {V}'_{t} Q_{t} V_{t}\right)\tag{17}\end{equation*}
Based on Remark 1, we can have a more comprehensive understanding of the function of the risk-sensitive parameter in LEQG.
Correspondingly, when $\theta <0$, the LEQG controller behaves as in Scenario 1: it implicitly assumes that the disturbance helps to reduce the cost, reflecting an optimistic, risk-preferring attitude towards uncertainty. When $\theta >0$, the controller behaves as in Scenario 2: it plans against a disturbance that maximizes the cost, reflecting a pessimistic, risk-averse attitude.
For the LEQG state feedback control problem defined by Eq. (6), Eq. (12) and Eq. (13), the optimal control takes the form [25], [27]:\begin{equation*} u_{\theta,t}^{\ast } =K_{\theta,t} x_{t}\tag{18}\end{equation*}
where the feedback gain is \begin{equation*} K_{\theta,t} =-N_{t}^{-1} {B}'_{d} \times (B_{d} N_{t}^{-1} {B}'_{d} +\tilde {P}_{\theta,t+1}^{-1})^{-1}\times A_{d}\tag{19}\end{equation*}
with the backward recursion \begin{align*} P_{\theta,t}=&M_{t} +A_{d}^{\prime }\times (B_{d} N_{t}^{-1} {B}'_{d} +\tilde {P}_{\theta,t+1}^{-1})^{-1}\times A_{d} \tag{20}\\ \tilde {P}_{\theta,t+1}=&(P_{\theta,t+1}^{-1} -\theta \xi)^{-1},\quad t=0,1,\ldots,T-1\tag{21}\end{align*} where $\xi$ denotes the covariance of the system disturbance.
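A minimal Python sketch of the backward recursion in Eqs. (19)-(21) is given below. The terminal condition $P_{\theta,T}=M_{T}$ and all numerical values are assumptions for illustration; they are not the settings of the experiments in Section III.

```python
import numpy as np

def leqg_state_feedback_gains(A_d, B_d, M, N, Xi, theta, T):
    """Backward recursion for the LEQG state-feedback gains (Eqs. 18-21).

    M, N are the state and input weights (time-invariant here), Xi is the
    covariance of the system disturbance. Terminal condition P_T = M is assumed.
    """
    P = M.copy()
    gains = [None] * T
    N_inv = np.linalg.inv(N)
    for t in range(T - 1, -1, -1):
        P_tilde = np.linalg.inv(np.linalg.inv(P) - theta * Xi)              # Eq. (21)
        core = np.linalg.inv(B_d @ N_inv @ B_d.T + np.linalg.inv(P_tilde))
        gains[t] = -N_inv @ B_d.T @ core @ A_d                              # Eq. (19)
        P = M + A_d.T @ core @ A_d                                          # Eq. (20)
    return gains

# Illustrative usage with the discretized model (closed form for A in Eq. (5)).
ts, t_star = 0.1, 1.5
A_d = np.array([[1.0, ts], [0.0, 1.0]])
B_d = np.array([[-t_star * ts - 0.5 * ts**2], [-ts]])
M = np.diag([1.0, 1.0])            # beta_{1,t}, beta_{2,t} in Eq. (11), assumed
N = np.array([[1.0]])              # omega_t in Eq. (11), assumed
Xi = 0.01 * np.eye(2)              # system-disturbance covariance, assumed
K = leqg_state_feedback_gains(A_d, B_d, M, N, Xi, theta=-0.1, T=300)
u0 = K[0] @ np.array([5.0, -1.0])  # u*_{theta,0} = K_{theta,0} x_0 (Eq. 18)
```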
D. Output Feedback Control Strategy
The precondition for applying the aforementioned state feedback control strategy is that the system has perfect measurements. This is, however, unrealistic in practice due to inevitable measurement uncertainties. Here we introduce the measurement equation considering sensor disturbance as below: \begin{equation*} y(t)=Cx(t)+W(t)\tag{22}\end{equation*}
with \begin{align*} C=\left({\begin{array}{cc} 1 &\quad 0 \\ 0 &\quad 1 \end{array}}\right)\tag{23}\end{align*}
Eq. (22) describes the measurement (observation) equation, in which $y(t)$ is the measurement vector, $C$ is the measurement matrix, and $W(t)$ is the measurement disturbance.
The joint covariance matrix of the two disturbances $V(t)$ and $W(t)$ is \begin{align*} \Delta =\left({\begin{array}{cc} \xi &\quad \gamma \\ {\gamma '} &\quad \Gamma \end{array}}\right)\tag{24}\end{align*}
where \begin{equation*} \xi =H(t),\quad \Gamma =S(t),~\gamma =\text {cov}(V(t),W(t))\tag{25}\end{equation*}
Similarly, we discretize Eq. (22) using the same method as before and obtain: \begin{equation*} y_{t} =C_{d} x_{t} +W_{t}\tag{26}\end{equation*}
with the discretized disturbance covariances \begin{align*} H_{t}=&\int _{0}^{t_{s}} {e^{A\tau }} M(t)d\tau \tag{27}\\ S_{t}=&\frac {t_{s}}{N(t)}\tag{28}\end{align*}
For the LEQG problem defined by Eq. (6), Eq. (12), Eq. (13) and Eq. (26), the optimal output feedback control can be obtained using the separation principle [30], which states that control and estimation can be designed as two independent processes. Therefore, based on this principle, the output feedback control retains the original control framework while replacing the unmeasurable state in the objective function with its estimate. The detailed optimal solution follows: \begin{equation*} u_{\theta,t}^{\ast } =K_{\theta,t} (I-\theta R_{\theta,t} P_{\theta,t})^{-1}\mu _{\theta,t}\tag{29}\end{equation*} where $\mu _{\theta,t}$ denotes the estimated state.
Further, the matrix $R_{\theta,t}$ is computed recursively by \begin{align*} R_{\theta,t+1} =&\,\xi +A_{d} \tilde {R}_{\theta,t+1} A'_{d} -(\gamma +A_{d} \tilde {R}_{\theta,t+1} {C}'_{d}) \\ &\times (\Gamma +C_{d} \tilde {R}_{\theta,t+1} {C}'_{d})^{-1}\times (\gamma +A_{d} \tilde {R}_{\theta,t+1} {C}'_{d})'\tag{30}\end{align*}
where \begin{equation*} \tilde {R}_{\theta,t} =(R_{\theta,t}^{-1} -\theta M_{t})^{-1},\quad t=0,1,\ldots,T-1,\quad R_{\theta,0} =R_{0}\tag{31}\end{equation*}
For state estimation, the Kalman filter, which consists of a prediction stage and an update stage, is applied [31]. We use the superscripts − and + to denote predicted and updated estimates, respectively. Firstly, the predicted state estimate is propagated as \begin{align*} \tilde {\mu }_{\theta,t+1}^{-}=&A_{d} \tilde {\mu }_{\theta,t}^{+} +B_{d} u_{t} \tag{32}\\ \tilde {\mu }_{\theta,t}^{+}=&\tilde {R}_{\theta,t} \times R_{\theta,t}^{-1} \tilde {\mu }_{\theta,t}^{-}\tag{33}\end{align*}
Note that we use a different symbol, $\tilde {\mu }$, for the estimated state to distinguish it from the true state $x_{t}$.
In the update stage, the measurement residual is computed as \begin{equation*} \tilde {z}_{t+1} =y_{t+1} -C_{d} \tilde {\mu }_{\theta,t+1}^{-}\tag{34}\end{equation*}
That is, the filter predicts the measurement as the product of the discretized measurement matrix and the predicted state estimate. The residual is then weighted by the Kalman gain and added to the predicted state estimate to obtain the updated estimate: \begin{equation*} \tilde {\mu }_{\theta,t+1}^{+} =\tilde {\mu }_{\theta,t+1}^{-} +L_{\theta,t+1} \tilde {z}_{t+1}\tag{35}\end{equation*}
where the Kalman gain is \begin{equation*} L_{\theta,t+1} =({\gamma }'+C_{d} \tilde {R}_{\theta,t} {A}'_{d})'\times (\Gamma +C_{d} \tilde {R}_{\theta,t} {C}'_{d})^{-1}\tag{36}\end{equation*}
Note that the output feedback strategy follows the same logic as the state feedback strategy in interpreting the functionality of the risk-sensitive parameter $\theta$.
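The sketch below assembles Eqs. (29)-(36) into one prediction/update cycle. The interleaving of the risk-sensitive adjustment of Eq. (33) with the measurement update, as well as the time indexing of $\tilde {R}_{\theta,t}$, follow one plausible reading of the recursions and should be treated as an assumption rather than the paper's exact implementation.

```python
import numpy as np

def output_feedback_step(mu_plus, R, y_next, u, K, P, A_d, B_d, C_d,
                         M, Xi, Gamma, gamma, theta):
    """One cycle of the output-feedback LEQG controller (Eqs. 29-36), sketched.

    mu_plus : current updated state estimate (column vector)
    R       : current R_{theta,t} matrix; M is the state weight of Eq. (11)
    Xi, Gamma, gamma : disturbance covariances from Eqs. (24)-(25)
    """
    R_tilde = np.linalg.inv(np.linalg.inv(R) - theta * M)                 # Eq. (31)
    mu_tilt = R_tilde @ np.linalg.solve(R, mu_plus)                       # Eq. (33)
    mu_minus = A_d @ mu_tilt + B_d @ u                                    # Eq. (32)
    S = Gamma + C_d @ R_tilde @ C_d.T
    L = (gamma.T + C_d @ R_tilde @ A_d.T).T @ np.linalg.inv(S)            # Eq. (36)
    mu_plus_next = mu_minus + L @ (y_next - C_d @ mu_minus)               # Eqs. (34)-(35)
    G = gamma + A_d @ R_tilde @ C_d.T
    R_next = Xi + A_d @ R_tilde @ A_d.T - G @ np.linalg.inv(S) @ G.T      # Eq. (30)
    n = R.shape[0]
    u_next = K @ np.linalg.solve(np.eye(n) - theta * R_next @ P, mu_plus_next)  # Eq. (29)
    return mu_plus_next, R_next, u_next
```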
E. Expensive and Non-Expensive Control Mode
To better accommodate personalized comfort requirements, two control modes are proposed below.
1) Expensive Control Mode
Controlling a time-invariant dynamic system under a heavy penalty on the input amplitude is known as expensive control, since the control cost of the input dominates the objective. In our setting, this corresponds to choosing the input weight $N_{t}$ to be large relative to the state weight $M_{t}$, which penalizes large accelerations and thus avoids sudden starts and stops.
2) Non-Expensive Control Mode
When the input weight $N_{t}$ is instead set relatively small compared with the state weight $M_{t}$, large accelerations are only mildly penalized and the controller regulates the spacing and speed deviations more aggressively; we refer to this as the non-expensive control mode.
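A minimal configuration of the two modes could look as follows; the numerical weights are illustrative assumptions, and only their relative magnitudes matter for switching between comfort-oriented (expensive) and regulation-oriented (non-expensive) behavior.

```python
import numpy as np

# State weight M_t (Eq. 11); values assumed for illustration.
M = np.diag([1.0, 1.0])

# Input weight N_t (Eq. 11): large -> expensive mode, small -> non-expensive mode.
N_expensive = np.array([[10.0]])       # heavy penalty on acceleration, smoother ride
N_non_expensive = np.array([[0.1]])    # mild penalty, faster regulation to equilibrium
```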
Based on the methodology given above, and apart from the distinction between the state feedback and output feedback cases, the proposed ACC framework can represent six different AV driving behaviors, as shown in Fig. 1. In Fig. 1, the first-level branch corresponds to the control mode and the second-level branch distinguishes the categories of risk sensitivity. For example, risk-preferring driving behavior under the expensive control mode implies that the ACC controller takes an optimistic attitude towards uncertainties while the magnitude of the acceleration actions is limited during the control period.
Experiments and Results Analysis
To validate the performance of the proposed stochastic linear optimal control method, numerical simulation experiments have been conducted, since field tests are expensive and beyond the scope of this paper. This section includes four parts. We first describe the experiment set-up, initializing the conditions and the default parameter values, in Section III.A. Based on that, we systematically design multiple scenarios with different control parameters to show that the proposed framework can produce heterogeneous driving behaviors. Specifically, we conduct a sensitivity analysis to compare the performance of the state feedback case and the output feedback case under different risk-sensitivity parameters in Section III.B. We then analyze the heterogeneous driving behaviors caused by the joint impact of risk sensitivity and disturbance magnitude in Section III.C. Finally, the two control modes, adopting the output feedback strategy with varied values of risk sensitivity, are discussed in Section III.D.
A. Experiment Set-Up
As mentioned before, we use simulation experiments to validate the proposed algorithm. Table 1 gives the default values of the corresponding parameters, including the simulation study period.
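For reference, a closed-loop simulation of the discretized system in Eq. (6) under the state-feedback law of Eq. (18) can be organized as in the sketch below; the initial condition and noise level in the example call are illustrative assumptions and do not reproduce Table 1.

```python
import numpy as np

def simulate(A_d, B_d, gains, x0, Xi, rng):
    """Closed-loop simulation of Eq. (6) under u_t = K_{theta,t} x_t (Eq. 18)."""
    x = np.array(x0, dtype=float)
    xs, us = [x.copy()], []
    for K_t in gains:
        u = float(K_t @ x)                                        # Eq. (18)
        v = rng.multivariate_normal(np.zeros(x.size), Xi)         # zero-mean Gaussian V_t
        x = A_d @ x + B_d.flatten() * u + v                       # Eq. (6)
        xs.append(x.copy())
        us.append(u)
    return np.array(xs), np.array(us)

# Example call, reusing A_d, B_d, Xi and the gain sequence K from Section II:
# rng = np.random.default_rng(1)
# traj, inputs = simulate(A_d, B_d, K, x0=[5.0, -1.0], Xi=Xi, rng=rng)
```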
B. State Feedback Strategy and Output Feedback Strategy Performance Comparison Under Same Intensities of Mixed Uncertainties
We first conducted a sensitivity analysis comparing the control performance of the output feedback case and the state feedback case under varied variances of both the system disturbance and the measurement disturbance, to investigate the joint impact of risk preference and disturbance intensity on control performance. The test range of the measurement disturbance is set according to an authoritative report by the National Highway Traffic Safety Administration [32], which documents the accuracy ranges of radar measurements of distance and velocity.
The relative change percentage of the total cost is chosen as an indicator to compare the performance with and without considering measurement noise, calculated as follows: \begin{equation*} \text {Relative change percentage} =\frac {J_{\theta,\text {State}} -J_{\theta,\text {Output}}}{J_{\theta,\text {Output}}} \times 100\%\tag{37}\end{equation*}
Figure: Sensitivity analysis of disturbances.
Figure: Performance comparison for the state feedback case and the output feedback case.
C. Heterogeneous Driving Behavior Caused by Joint Impact of Risk Sensitivity and Magnitudes of Disturbances
Since vehicles with different risk sensitivities react heterogeneously to different magnitudes of disturbances, here we investigate the heterogeneous driving behavior caused by the joint impact of risk sensitivity and disturbance magnitude. Since the experiments in the previous part verified that the output feedback case outperforms the state feedback case, we no longer discuss the latter in the following experiments. Further, we extend the behavior analysis using the default values given in Table 1 but varying the intensity of the system disturbance. The cumulative state deviation and the cumulative control input over the study period are used as performance indicators: \begin{align*} C_{\theta,\textrm {State deviation}}=&\sum \nolimits _{t=1}^{T} {x_{t,\theta }} {x}'_{t,\theta } \tag{38}\\ C_{\theta,\textrm {Control input}}=&\sum \nolimits _{t=1}^{T} {u_{t,\theta }} {u}'_{t,\theta }\tag{39}\end{align*}
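The two indicators of Eqs. (38)-(39) can be computed directly from a simulated trajectory, for example as in the following sketch (a scalar control input is assumed):

```python
import numpy as np

def cumulative_indicators(xs, us):
    """Cumulative state deviation and control input, Eqs. (38)-(39)."""
    C_state = sum(np.outer(x, x) for x in xs)   # Eq. (38): sum of x_t x_t'
    C_input = float(sum(u * u for u in us))     # Eq. (39): scalar-input case
    return C_state, C_input
```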
Figure: Performance analysis for the joint impact of risk sensitivity and magnitudes of system disturbances.
Fig. 4 shows that the system states converge from the initial condition to the equilibrium state.
Therefore, for a more in-depth analysis, we study the asymmetric controller response to the risk-sensitive parameter $\theta$.
Figure: Asymmetric controller response to the risk-sensitive parameter.
D. Heterogeneous Driving Behavior Caused by Joint Impact of Risk Sensitivity and Expensive/Non-Expensive Control Mode
The expensive and non-expensive control modes are realized by changing the values of the predetermined weight matrices $M_{t}$ and $N_{t}$ in the running cost of Eq. (10).
Figure: Performance comparison for the non-expensive and expensive control modes with different risk-sensitive parameters.
Comparing horizontally, the spacing converges to the equilibrium value faster in the non-expensive control mode than in the expensive mode. Meanwhile, the relative speed error fluctuates within a smaller range in the non-expensive mode than in the expensive mode.
This is mainly because the selectable range of acceleration in the expensive control mode shrinks due to the larger input weight and the objective of minimizing the cost function. Hence, the system approaches the equilibrium state more slowly, and the smaller rate of speed change makes speed oscillations more difficult to recover from. Studying the figure vertically gives more insight into the impact of the risk parameter on control smoothness after convergence. The best smoothness performance occurs in the risk-preferring case.
Conclusion
Diverse driving preferences regarding risk call for the design of personalized ACC controllers that account for the various risk sensitivities of human drivers. In this paper, an optimal control algorithm for ACC that depicts the driver's risk-resistance characteristics is presented based on the LEQG control framework. The proposed algorithm can qualify and quantify an AV's risk-sensitivity preference under mixed disturbances as well as incorporate driving comfort. With different settings of the risk-sensitive parameter and control mode, six categories of heterogeneous AV driving behavior when facing disturbances can be interpreted. To validate this contribution, a sensitivity analysis and several tests are conducted. Overall, the control performance of the proposed algorithm is satisfying. According to the simulation results, risk sensitivity, disturbance magnitude, and control mode are all effective factors in trajectory generation. Future work will extend the current framework to the CACC case by allowing cooperation among multiple vehicles (such as [33]–[35]).