Introduction
Designing data-based or learning-enabled controllers that satisfy system specifications (e.g., safety, stability, and performance) has recently emerged as a critical concern in control engineering [1]. As a prominent design paradigm, data-driven control involves learning controllers for an unknown system based solely on measurements obtained from the system and some prior information about its characteristics [2]. A common approach is to first identify a system model from input-output measurements and then use the identified model to design a model-based controller [3]; this is called indirect data-driven control. As an alternative, there has recently been a surge of interest in designing control systems directly from process data, bypassing the intermediary system identification stage [1], [2], [3], [4]; this is called direct data-driven control. In many situations, direct data-driven approaches can be advantageous over indirect ones for several reasons, some of which are listed in [5].
In many control systems, safety is a critical concern, and it is of vital importance to design controllers that respect it. Safety requirements are typically imposed on the system as constraints that must be satisfied at all times. Safety, however, is only the bare minimum requirement of safety-critical systems, and it is desirable to design safe controllers with performance guarantees as well. To provide such guarantees, optimal control design aims at finding the best controller from a set of admissible controllers based on a performance criterion, such as maximizing profit, minimizing cost, or maximizing efficiency. Both safety and optimality are important in the control of autonomous systems, and balancing them is essential to achieving acceptable outcomes.
To ensure safety, the concept of forward set invariance has been widely utilized, which requires the system's state to remain within a safe set once it starts from that set [6]. In most existing approaches, control barrier functions (CBFs) are employed to guarantee the forward invariance of the safe set. A CBF is a continuously differentiable function, known as a barrier function, that maps the system's state to a scalar value and establishes a linear inequality condition on the system's input that ensures forward invariance of the safe set. In many CBF-based methods, safety and performance requirements are combined through a quadratic programming (QP) formulation: the intervention with a nominal or optimal controller is minimized while point-wise control Lyapunov function (CLF) and CBF conditions are imposed as soft and hard inequality constraints, respectively [7]. This approach is reactive in the sense that the safety of the system is certified myopically at every time instant. Other reactive safety certificates are presented in [8], [9]. This myopic intervention can also lead to convergence to undesired stable equilibrium points on the boundary of the safe set, as shown in [10]. Besides being myopic, these existing safety certificates require complete and accurate knowledge of the system model, and the nominal or optimal controller is assumed to be given a priori. A fundamental challenge is therefore to learn optimal and safe controllers using only a single trajectory of input-state data and then combine them to manage conflicts proactively.
To reduce the complete reliance on the model, a data-driven approach to safe control design is presented in [11] using the concept of contraction sets [12]. In addition, a data-driven safe controller is designed in [4] using control barrier certificates. One drawback of these approaches, however, is that they require the state derivatives to be measured or approximated, which can be costly or introduce noise into the measurements, significantly deteriorating performance and potentially jeopardizing the system's safety.
Besides data-driven safe control design without state-derivative requirements, sample-efficient algorithms for learning optimal controllers for continuous-time (CT) systems are also surprisingly lacking. Reinforcement learning (RL) algorithms such as policy iteration and policy gradient methods [13] have been widely utilized to learn optimal control policies. However, the iterative nature of these algorithms makes them data-hungry, and the need for online data to evaluate control policies or cost gradients can be costly and risky. It is therefore highly desirable to develop one-shot learning algorithms that learn an optimal control policy using only a single data trajectory. Finally, it is desirable to combine the two control policies (i.e., safe control and optimal control) to avoid convergence to an undesired equilibrium. This stands in sharp contrast to existing myopic approaches that assume an optimal controller is given and merely certify the safety of its actions using CBFs.
This paper presents a sample-efficient, data-based safe and optimal controller for CT linear quadratic regulator (LQR) problems with safety constraints. The presented approach 1) learns both safe and optimal control policies in one shot via convex optimization formulations using only input-state data, and 2) combines the two policies using a computationally efficient interpolation technique. A convex optimization formulation of the LQR presented in [14] is leveraged to develop the one-shot optimization for optimality. Then, to design both safe and optimal controllers, the closed-loop system is represented in terms of data and incorporated into the optimization frameworks, turning them into data-based convex programs. The safety of the resulting controller is guaranteed, as is convergence to the equilibrium. Importantly, our approach neither relies on knowledge of the system dynamics nor requires measuring or approximating the state derivative from sampled data. It is also shown that the safe controller predominantly contributes to the overall controller near safety boundaries, ensuring that safety takes precedence, whereas the optimal controller takes over as the system trajectories move away from the safety boundaries. The feasibility and stability of the controller are demonstrated, and simulation results illustrate the effectiveness of the method.
A. Notations
In this paper,
Problem Formulation and Preliminaries
A. Problem Formulation: Safe Optimal Control
This subsection formalizes the safe optimal control problem and provides a background on its challenges.
Consider a continuous-time linear system described by
\begin{align*}
\dot{x}(t) = Ax(t) + Bu(t), \tag{1}
\end{align*}
Our main objective is to design a control input
To take into account optimality, the objective function for system (1) is considered as
\begin{align*}
{J (x(t),u(t))} = \int_{t}^{+\infty } \left( {x^{\prime }}(\tau)Qx(\tau) + {u^{\prime }}(\tau)Ru(\tau) \right) d\tau, \tag{2}
\end{align*}
Definition 1
(Control Barrier Functions): Suppose
\begin{align*}
\mathop {\sup }\limits _{u \in {\mathbb {U}}} [\dot{h}(x) + \gamma (h(x))]\geq 0.
\end{align*}
Based on Definition 1, upon existence of a CBF
\begin{align*}
{K_{cbf}}(x) = \lbrace u \in {\mathbb {U}}|\,\,\dot{h}(x) + \gamma (h(x)) \geq 0\rbrace. \tag{3}
\end{align*}
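As a simple illustration (ours, not taken from the paper), if the state must remain outside a ball of radius $r$ centered at $x_{o}$, a natural candidate barrier function is
\begin{align*}
h(x) = (x - x_{o})^{\prime }(x - x_{o}) - r^{2},
\end{align*}
which is nonnegative exactly on the safe set and whose gradient $2(x - x_{o})^{\prime }$ enters the linear input condition defining (3).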
We now formalize the safe optimal control problem based on the performance (2) and the safety requirement (3) by the following optimization problem.
Problem 1 (Safe Optimal Control):
Find a control policy
\begin{align*}
&{\min \limits _{u\in \mathbb {U} }J(x,u) }\tag{4a}\\
\text{s.t.}\quad \dot{h}&(x) + \gamma (h(x)) \geq 0,\tag{4b}\\
&{\dot{x}=Ax+Bu}, \tag{4c}
\end{align*}
In this optimization, the cost
While reinforcement learning (RL) algorithms have been developed to solve the optimal LQR problem without requiring knowledge of the system dynamics, the following challenges remain: 1) RL algorithms such as policy iteration [13] solve the LQR problem iteratively and may need many iterations to learn an optimal control policy; they can also be data-intensive, especially if new data must be generated at every iteration to evaluate a new policy. 2) Safe control design approaches are typically model-based, which undermines the model-free nature of the RL algorithms, as the overall safe optimal control design still requires the system model. To observe the model dependence of CBF-based safe control design, since the safety constraints depend on a general class
\begin{align*}
\frac{{\partial h(x)}}{{\partial x}}(Ax + Bu) + \gamma h(x) \geq 0 \tag{5}
\end{align*}
B. Background on State-Input Representation of the Closed-Loop System
The input and state measurements collected from the system are organized as follows
\begin{align*}
{U_{0,T}} &= [\begin{array}{cccc} u(t_{0}) & u(t_{0} + \tau) & \cdots & u(t_{0} + (T - 1)\tau) \end{array}],\tag{6a}\\
{X_{0,T}} &= [\begin{array}{cccc} x(t_{0}) & x(t_{0} + \tau) & \cdots & x(t_{0} + (T - 1)\tau) \end{array}], \tag{6b}
\end{align*}
\begin{align*}
{X_{1,T}} = [\begin{array}{cccc} \dot{x}(t_{0}) & \dot{x}(t_{0} + \tau) & \cdots & \dot{x}(t_{0} + (T - 1)\tau) \end{array}].
\end{align*}
Assumption 1:
The matrix
For Assumption 1 to hold,
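For concreteness, the following minimal sketch (the array names, shapes, and rank check are illustrative assumptions, not taken from the paper) assembles the matrices in (6) from sampled input/state data and checks the full-row-rank richness condition commonly imposed on the stacked data matrix in this literature:

```python
import numpy as np

def build_data_matrices(u_samples, x_samples):
    """Assemble U_{0,T} and X_{0,T} as in (6).

    u_samples: list of T input vectors u(t_0 + k*tau), k = 0, ..., T-1
    x_samples: list of T state vectors x(t_0 + k*tau), k = 0, ..., T-1
    """
    U0 = np.column_stack(u_samples)   # m x T matrix, cf. (6a)
    X0 = np.column_stack(x_samples)   # n x T matrix, cf. (6b)
    return U0, X0

def data_is_rich(U0, X0):
    """Check that [U0; X0] has full row rank n + m (a common form of
    Assumption 1 in the data-driven control literature; an assumption here)."""
    stacked = np.vstack([U0, X0])
    return np.linalg.matrix_rank(stacked) == stacked.shape[0]
```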
The following results from the literature provide a data-based representation of the closed-loop system while leveraging the state-input-state derivative data.
Lemma 1 ([4]):
Consider the system (1) under the nonlinear control input
\begin{align*}
{{\mathbb {I}}_{n}} = {X_{0,T}}W(x).
\end{align*}
If
\begin{align*}
\dot{x} = {X_{1,T}}W(x)x, \tag{7}
\end{align*}
\begin{align*}
A + BF(x) = {X_{1,T}}W(x), \tag{8}
\end{align*}
Remark 1:
Equation (8) provides a data-based representation of the closed-loop dynamics and can replace the constraint (5) to find the safe controller in a data-driven manner. However, the drawback of the data-driven approaches presented in [3], [4] is that the state derivatives of the system are needed; these are typically unavailable as direct measurements and have the following representation [1]
\begin{align*}
{X_{1,T}} = A{X_{0,T}} + B{U_{0,T}}= [\begin{array}{cc}B&A \end{array}]\left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]. \tag{9}
\end{align*}
To address the challenges outlined above, this paper presents a new learning-enabled safe optimal control design approach that solves the optimization problem (4) in a data-driven manner. Addressing safety and optimality simultaneously presents its own set of challenges, particularly when the problem is formulated solely from data, which adds further complexity. The proposed approach tackles these issues through the construction of a data-driven safe control input and the development of a one-shot optimization that finds the optimal control input. This is achieved while bypassing the need for explicit knowledge of the system dynamics and eliminating the requirement to obtain state derivatives from sampled data; instead, the method relies solely on measured input/state data to trade off safety and optimality.
Data-Driven System Representation
To design a safe controller when the system dynamics are unknown and the state derivative is not available, it is desirable to write constraint (5) in terms of only input and state measurements. Moreover, to design an optimal controller in a one-shot and data-efficient manner, a data-based representation of the closed-loop system is required. This section provides a data-based representation of open-loop systems under polynomial controllers and a data-based closed-loop representation under linear controllers.
To ensure that the safe control input covers the entire safe region, the following general format is considered for the safe controller:
\begin{align*}
{u^{s}}=F(x)x. \tag{10}
\end{align*}
\begin{align*}
\dot{x} = \left({A + BF(x)} \right)\ x. \tag{11}
\end{align*}
To obviate the need for the state derivative in the proposed approach, the following theorem is presented.
Theorem 1:
Consider the system (1). Let
\begin{align*}
[\begin{array}{lc}B&A \end{array}] = ({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}}, \tag{12}
\end{align*}
\begin{align*}
{{\mathcal H}_{T}}\left({x(t)} \right) = \left[ {\begin{array}{cccc} x(t) & x\left({t + T} \right) & \cdots & x\left({t + (n - 1)T} \right) \end{array}} \right], \tag{13}
\end{align*}
\begin{align*}
D = \left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{X^{\prime }_{0,T}}{({X_{0,T}}{X^{\prime }_{0,T}})^{ - 1}}\left(\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau \right). \tag{14}
\end{align*}
Proof:
Equation (13) represents a time-varying matrix defined on the interval
\begin{align*}
{{\mathcal H}_{T}}(\dot{x}(t)) = {{\mathcal H}_{T}}((A + BF(x))x(t)). \tag{15}
\end{align*}
Based on Lemma 1, in
\begin{align*}
{{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0)) = [\begin{array}{cc}B&A \end{array}]\left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right] \\
\times {X^{\prime }_{0,T}}{({X_{0,T}}{X^{\prime }_{0,T}})^{ - 1}}\left(\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau \right). \tag{16}
\end{align*}
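To make the construction of Theorem 1 concrete, the following is a minimal numerical sketch of (12)-(14); the function and variable names are ours, the integral $\int_0^T {\mathcal H}_T(x)\,d\tau$ is assumed to be pre-computed (e.g., by a trapezoidal rule over the sampled trajectory), and a pseudo-inverse is used for numerical robustness:

```python
import numpy as np

def estimate_BA(U0, X0, HT_0, HT_T, HT_int):
    """Data-based estimate of [B  A] following (12)-(14).

    U0, X0 : data matrices from (6), of sizes m x T and n x T
    HT_0   : H_T(x(0)), the matrix in (13) evaluated at t = 0
    HT_T   : H_T(x(T)), the same matrix evaluated at t = T
    HT_int : numerical approximation of the integral of H_T(x) over [0, T]
    """
    # D from (14): [U0; X0] X0' (X0 X0')^{-1} (integral of H_T)
    right_inv = X0.T @ np.linalg.inv(X0 @ X0.T)
    D = np.vstack([U0, X0]) @ right_inv @ HT_int
    # [B  A] from (12); pinv(D) equals D'(DD')^{-1} when DD' is invertible
    BA = (HT_T - HT_0) @ np.linalg.pinv(D)
    return BA   # first m columns estimate B, remaining n columns estimate A
```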
By considering (12) in constraint (5), we can find the safe control policy using only input and state measurements. Optimal LQR control, however, requires a linear controller, which can further reduce the data requirements. The following corollary shows that for linear controllers, as a special case of polynomial controllers, the entire closed-loop system can be represented by data. Learning the closed-loop system directly to satisfy safety or optimality requires less data than first learning the open-loop dynamics and then designing a safe controller.
Corollary 1:
Consider the system (1) under the linear controller
\begin{align*}
{{\mathcal H}_{T}}(\dot{x}) = (A + BK){{\mathcal H}_{T}}(x), \tag{17}
\end{align*}
\begin{align*}
\left({A + BK} \right) = ({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) ^{ - 1}}. \tag{18}
\end{align*}
Moreover, this closed-loop representation can be leveraged to design safe or optimal controllers if the rank of
Proof:
Using
\begin{align*}
\dot{x} = (A + BK)x. \tag{19}
\end{align*}
Applying (13) to (19) results in (17). Integrating this equation yields
\begin{align*}
{{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0)) = \left({A + BK} \right)\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau.
\end{align*}
Since only the closed-loop system with the size
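A corresponding sketch of the closed-loop estimate (18), which uses state data only (names as in the previous sketch):

```python
import numpy as np

def estimate_closed_loop(HT_0, HT_T, HT_int):
    """Estimate A + BK from (18); HT_int is assumed invertible."""
    # (A + BK) = (H_T(x(T)) - H_T(x(0))) (integral of H_T over [0, T])^{-1}
    return (HT_T - HT_0) @ np.linalg.inv(HT_int)
```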
Data-Driven Optimal Control Design
Finding the optimal control input that minimizes (2) for the linear system (1) leads to solving the algebraic Riccati equation [15], which requires complete knowledge of the system dynamics. To obviate this requirement, iterative RL-based approaches [13], [16], [17] have been presented to learn the optimal control policy. Despite their advantages, iterative RL algorithms have drawbacks compared to one-shot optimization methods; a significant one is their demanding computational requirements. In contrast, one-shot strategies solve an optimization problem directly to find the best solution at once, offering potentially greater computational and data efficiency. Although one-shot learning of the LQR has been considered for discrete-time systems [1], it is lacking for continuous-time systems.
To this end, in this paper, we present a one-shot learning approach for solving the LQR problem for continuous-time systems (i.e., learning the optimal controller
Lemma 2 ([14]):
Consider the linear system (1) with the quadratic cost function (2). Then, the control gain that optimizes the cost is obtained by
\begin{align*}
\mathop {\min }\limits _{P,Y} (Trace(QP) &+ Trace({R^{\frac{1}{2}}}Y{P^{ - 1}}{Y^{\prime }}{R^{\frac{1}{2}}}))\tag{20a}\\
\text{s.t.}\; AP + P{A^{\prime }} &- BY - {Y^{\prime }}{B^{\prime }} + I \prec 0,\tag{20b}\\
&P = {P^{\prime }} \succ 0, \tag{20c}
\end{align*}
According to [14], [18], the objective function in (20) comprises the sum of two components, and the second term can be expressed in the following epigraph form
\begin{align*}
\phi (P,Y) =\min (Trace(X))\\
\text{s.t.}\;\left[ {\begin{array}{lc}X&{{R^{\frac{1}{2}}}Y}\\
{{Y^{\prime }}{R^{\frac{1}{2}}}}&P \end{array}} \right]\succ 0 .
\end{align*}
Consequently, the optimization (20) can be recast as
\begin{align*}
&\min \beta \tag{21a}\\
\text{s.t.}\; C(\beta,&P,Y,X) \succ 0, \tag{21b}
\end{align*}
\begin{align*}
C(\beta, P,Y,X) &= diag({C_{1}},{C_{2}},{C_{3}}), \tag{22a}\\
{C_{1}}(\beta,P,Y,X) &= - Trace({QP}) - Trace(X) + \beta, \tag{22b}\\
{C_{2}}(\beta, P,Y,X) &= - AP - P{A^{\prime }} + BY + {Y^{\prime }}{B^{\prime }} - I,\tag{22c}\\
{C_{3}}(\beta, P,Y,X) &= \left[ {\begin{array}{lc}X&{{R^{\frac{1}{2}}}Y}\\
{{Y^{\prime }}{R^{\frac{1}{2}}}}&P \end{array}} \right]. \tag{22d}
\end{align*}
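As a reference point, the following is a minimal CVXPY sketch of the model-based program (21)-(22). The strictness margin eps, the solver choice, and the sign convention for recovering the gain (we read (20b) as implying the control $u = -YP^{-1}x$) are our assumptions, not statements from the paper:

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import sqrtm

def lqr_one_shot(A, B, Q, R, eps=1e-6):
    """Sketch of the convex LQR program (21)-(22), model-based form."""
    n, m = B.shape
    Rh = np.real(sqrtm(R))                        # R^{1/2}
    P = cp.Variable((n, n), symmetric=True)
    Y = cp.Variable((m, n))
    X = cp.Variable((m, m), symmetric=True)
    beta = cp.Variable()
    C2 = -(A @ P + P @ A.T) + B @ Y + Y.T @ B.T - np.eye(n)   # cf. (22c)
    constraints = [
        beta - cp.trace(Q @ P) - cp.trace(X) >= eps,          # C1 > 0, cf. (22b)
        C2 >> eps * np.eye(n),
        cp.bmat([[X, Rh @ Y], [Y.T @ Rh, P]]) >> eps * np.eye(m + n),  # C3, cf. (22d)
        P >> eps * np.eye(n),
    ]
    cp.Problem(cp.Minimize(beta), constraints).solve()  # needs an SDP solver, e.g., SCS
    return -Y.value @ np.linalg.inv(P.value)      # gain K with u = Kx (assumed sign)
```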
Theorem 2:
Consider the system (1) and the optimization problem in (21). Let
\begin{align*}
&\min \beta \tag{23a}\\
\text{s.t.}\; C(\beta, P,Y,X)=&diag({C_{1}},{C_{2}},{C_{3}}) \succ 0, \tag{23b}
\end{align*}
where
\begin{align*}
{C_{1}} &= - Trace(QP) - Trace(X) + \beta,\tag{24a}\\
{C_{2}} &= \left({({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{{(D{D^{\prime }})}^{ - 1}}} \right)\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right] \\
&\quad + {\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right]^{\prime }}{\left({({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{{(D{D^{\prime }})}^{ - 1}}} \right)^{\prime }} - I,\tag{24b}\\
{C_{3}} &= \left[ {\begin{array}{cc}X&{{R^{\frac{1}{2}}}Y}\\
{{Y^{\prime }}{R^{\frac{1}{2}}}}&P \end{array}} \right]. \tag{24c}
\end{align*}
Proof:
For system (1), if
\begin{align*}
\dot{x} = (A + BK)x.
\end{align*}
\begin{align*}
{{\mathcal H}_{T}}(\dot{x}) = (A + BK){{\mathcal H}_{T}}(x). \tag{25}
\end{align*}
\begin{align*}
{{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0)) = \left({A + BK} \right)\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau.
\end{align*}
\begin{align*}
\left[ {\begin{array}{cc}B&A \end{array}} \right]\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right] + {\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right]^{\prime }}{\left[ {\begin{array}{cc}B&A \end{array}} \right]^{\prime }} - I \succ 0. \tag{26}
\end{align*}
In (24),
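In code, the only change relative to the model-based sketch after (22) is that constraint $C_2$ is built from the data-based estimate of $[B\;A]$ in Theorem 1; a fragment under the same naming assumptions:

```python
# BA_hat = estimate_BA(U0, X0, HT_0, HT_T, HT_int), cf. Theorem 1
M = BA_hat @ cp.vstack([Y, -P])    # = BY - AP in (24b)
C2 = M + M.T - np.eye(n)           # impose C2 >> eps * I as before
```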
Remark 2:
The approaches presented in Sections III and IV yield, respectively, a data-driven safe controller and a data-driven one-shot optimal controller. However, it is desirable to account for safety and optimality simultaneously in the control design. To this end, the next section presents a new approach that proactively combines the safe and optimal control inputs.
Data-Driven Initiative Controller
To consider both safety and optimality in the control design, an initiative control input
\begin{align*}
u = \alpha (t) {u^{L}} + (1 - \alpha (t)) {u^{s}}, \tag{27}
\end{align*}
In the control policy (27), two special scenarios can arise: 1) the system trajectories are far from the safety boundary. Therefore,
The goal is formulated as the following maximization problem
\begin{align*}
\max \alpha (t)&\tag{28a}\\
\text{s.t.}\;\ \ \frac{{\partial h}}{{\partial x}}(Ax + Bu) &+\gamma h \geq 0. \tag{28b}
\end{align*}
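Whether the constraint is written in the model-based form (28b) or the data-based forms derived below, it is scalar and linear in $\alpha(t)$ once all other quantities are evaluated at the current state, so the maximization admits a closed-form solution. A hedged sketch (the split into $c_0$ and $c_1$ follows the rearrangement in (31); the names are ours):

```python
import numpy as np

def max_alpha(c0, c1):
    """Largest alpha in [0, 1] satisfying c0 + alpha * c1 >= 0.

    c0: value of (dh/dx)(Ax + B u_s) + gamma*h at the current state;
        nonnegative whenever u_s satisfies the CBF condition (3).
    c1: value of (dh/dx) B (u_L - u_s), the coefficient of alpha in (31).
    """
    if c1 >= 0.0:
        return 1.0                          # the optimal input does not fight safety
    return float(np.clip(c0 / -c1, 0.0, 1.0))

# Blended input, cf. (27):  u = alpha * u_L + (1 - alpha) * u_s
```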
Theorem 3:
Consider the system (1) and the maximization problem in (28), and consider the collected data in (6). The data-based representation of the maximization problem in (28) with its constraint can be written as
\begin{gather*}
\max \alpha (t) \tag{29a}\\
\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}} \\
\times\left[ {\begin{array}{l}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x \\
- \alpha (t) {(RK^{L}{P^{ - 1}})^{\prime }}({u^{L}} - {U_{0,T}}{({X_{0,T}})^\dag }x)) + \gamma h \geq 0. \tag{29b}
\end{gather*}
Proof:
For system (1), by substituting the proposed initiative control input (27) in the constraint of (28), one has
\begin{align*}
\frac{{\partial h}}{{\partial x}}(Ax + B(\alpha (t) {u^{L}} + (1 - \alpha (t)){u^{s}})) + \gamma h \geq 0. \tag{30}
\end{align*}
\begin{align*}
\frac{{\partial h}}{{\partial x}}((Ax + B{u^{s}}) + B\alpha (t) ({u^{L}} - {u^{s}})) + \gamma h \geq 0. \tag{31}
\end{align*}
\begin{align*}
\frac{{\partial h}}{{\partial x}}((Ax + BF(x)x) + B\alpha (t) ({u^{L}} - F(x)x)) + \gamma h \geq 0. \tag{32}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}([\begin{array}{lc}B&A \end{array}]\left[ {\begin{array}{l}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x +B\alpha (t) ({u^{L}} - F(x)x)) \\
&\qquad + \gamma h \geq 0. \tag{33}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}} \\
&\quad \times \left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x + B\alpha (t) ({u^{L}} - F(x)x)) + \gamma h \geq 0.
\end{align*}
Since
\begin{align*}
\max &\alpha (t) \tag{34a}\\
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}} \\
\quad \times&\left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x \\
+& B\alpha (t) ({u^{L}} - {U_{0,T}}(X_{0,T})^\dag x)) + \gamma h \geq 0. \tag{34b}
\end{align*}
As can be seen, only the
\begin{align*}
{u^{L}} = -{R^{ - 1}}{B^{\prime }}Px,
\end{align*}
\begin{align*}
K^{L} &= -{R^{ - 1}}{B^{\prime }}P \Rightarrow RK^{L} = -{B^{\prime }}P,\tag{35a}\\
RK^{L}{P^{ - 1}} &= -{B^{\prime }} \Rightarrow B = -{(RK^{L}{P^{ - 1}})^{\prime }}. \tag{35b}
\end{align*}
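For instance, once $K^{L}$ and $P$ are available from the data-driven LQR solution, (35b) recovers $B$ without any model knowledge; as a one-line sketch (variable names ours):

```python
# B = -(R K_L P^{-1})', cf. (35b); K_L and P come from the learned LQR solution
B_hat = -(R @ K_L @ np.linalg.inv(P)).T
```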
\begin{align*}
V = {x^{\prime }}Px,
\end{align*}
Substituting matrix
Corollary 2:
Consider the system (1). Assume the linear controller
\begin{gather*}
\max \alpha (t) \tag{36a}\\
\frac{{\partial h}}{{\partial x}}({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){\left({\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau } \right)^{\prime }} \\
{\left({(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) {{(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) }^{\prime }}} \right)^{ - 1}}x \\
-\alpha {(RK^{L}{P^{ - 1}})^{\prime }}({u^{L}} - {U_{0,T}}{({X_{0,T}})^\dag }x)) + \gamma h \geq 0. \tag{36b}
\end{gather*}
Proof:
If we consider
\begin{align*}
\frac{{\partial h}}{{\partial x}}(((A + BK^{s})x) + B\alpha (t) ({u^{L}} - K^{s}x)) + \gamma h \geq 0. \tag{37}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){\left({\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau } \right)^{\prime }} \\
&\quad \times{\left({(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) {{(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) }^{\prime }}} \right)^{ - 1}}x \\
&\quad + B\alpha ({u^{L}} - K^{s}x)) + \gamma h \geq 0. \tag{38}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){\left({\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau } \right)^{\prime }} \\
&\quad \times{\left({(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) {{(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) }^{\prime }}} \right)^{ - 1}}x \\
&\quad + B\alpha ({u^{L}} - {U_{0,T}}{({X_{0,T}})^\dag }x)) + \gamma h \geq 0. \tag{39}
\end{align*}
By considering (39) and (35) instead of the constraint in (28), we get (36), and this completes the proof.
Note that by considering a linear safe control structure
Stability and Feasibility Analysis
This section provides the proofs of recursive feasibility and asymptotic stability of the system, which play a prominent role in the proposed approach. Since these proofs are inspired by the interpolating control method, interpolating control is reviewed first.
Interpolating control relies on a vertex representation approach, which smoothly interpolates between a controller located at the vertex of a feasible set and a high-gain feedback controller that prioritizes safety and robustness. The interpolation is designed so that the resulting control input stays within the feasible set while still achieving the desired performance. Here, based on interpolating control theory, an interpolation coefficient is presented, which is
\begin{align*}
x(t)={\alpha (t)}{x}^{L}(t) + \left({1 - {\alpha (t)}} \right){x}^{s}(t), \tag{40}
\end{align*}
In the concept of interpolating control, the following constraint should be satisfied for the controlled invariant set
\begin{align*}
{C_{\bar{t}}} = \left\lbrace {x \in {\mathbb{R}^{n}}:{F_{\bar{t}}}x \leq {g_{\bar{t}}}} \right\rbrace
\end{align*}
Theorem 4:
Recursive feasibility: The control method described by (27) and (40), which interpolates at regular intervals, achieves a feasible solution for system (1) for all initial states
Proof:
In terms of recursive feasibility, it has to be proven that
\begin{align*}
F_{u}u(t) &= {F_{u}}(\alpha (t){u^{L}}(t) + (1 - \alpha (t)){u^{s}}(t))\\
&= \alpha (t){F_{u}}{u^{L}}(t) + (1 - \alpha (t)){F_{u}}{u^{s}}(t)\\
&\leq (\alpha (t){g_{u}} + (1 - \alpha (t)){g_{u}}) = {g_{u}},
\end{align*}
\begin{align*}
\dot{x}(t) &= Ax(t) + Bu(t)\\
&= A(\alpha (t){x^{L}}(t) + (1 - \alpha (t)){x^{s}}(t)\,) + B(\alpha (t){u^{L}}(t) \\
&\quad +(1 - \alpha (t)){u^{s}}(t)\,)\\
&\!=\! \alpha (t) (A{x^{L}}(t) \!+ B{u^{L}}(t)) \!+ (1 - \alpha (t)) (A{x^{s}}(t) \!+ B{u^{s}}(t)).
\end{align*}
Since
Theorem 5:
Asymptotic stability: Considering system (1), the control law (27) guarantees asymptotic stability for all initial states
Proof:
It has to be proven that all solutions starting in
\begin{align*}
x(t)&={\alpha ^*(t)}{x^*}^{L}(t) + \left({1 - {\alpha ^*(t)}} \right){x^*}^{s}(t),\\
u(t)&={\alpha ^*(t)}{u}^{L}(t) + \left({1 - {\alpha ^*(t)}} \right){u}^{s}(t).
\end{align*}
\begin{align*}
\dot{x}(t) &= Ax(t) + Bu(t)\\
&={\alpha ^*(t)}{\dot{x}}^{L}(t) + \left({1 - {\alpha ^*(t)}} \right){\dot{x}}^{s}(t),
\end{align*}
\begin{align*}
{\dot{x}}^{L}(t)&=A{x^*}^{L}(t) + B{u}^{L}(t)\in {\Omega _{\max }},\\
{\dot{x}}^{s}(t)&=A{x^*}^{s}(t) + B{u}^{s}(t) \in {C_{\bar{t}}}.
\end{align*}
\begin{align*}
{{\alpha }^*}(t)={{\alpha }^*}(x(t))&= \max \limits _{\alpha,r^{L}} \alpha \tag{41a}\\
\qquad \text{s.t.}\quad F_{0} r^{L} &\leq \alpha g_{0},\tag{41b}\\
F_{N} (x(t)-r^{L}) &\leq (1-\alpha)g_{N},\tag{41c}\\
0 \leq \alpha &\leq 1, \tag{41d}
\end{align*}
\begin{align*}
{{\dot{\alpha }}^*}(t)={{\alpha }^*}(\dot{x}(t))&= \max \limits _{\alpha,r^{L}} \alpha \tag{42a}\\
\qquad \qquad \text{s.t.}\;\ F_{0} r^{L} &\leq \alpha g_{0},\tag{42b}\\
F_{N} (\dot{x}(t)-r^{L}) &\leq (1-\alpha)g_{N},\tag{42c}\\
0 &\leq \alpha \leq 1. \tag{42d}
\end{align*}
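A minimal sketch of the interpolation program (41) as a linear program in $(\alpha, r^{L})$; the matrix names follow (41), while the solver choice and vectorization are ours:

```python
import numpy as np
from scipy.optimize import linprog

def interpolation_coefficient(x, F0, g0, FN, gN):
    """Solve (41): max alpha s.t. F0 r <= alpha g0,
    FN (x - r) <= (1 - alpha) gN, 0 <= alpha <= 1, over z = [alpha, r]."""
    n = x.size
    c = np.concatenate(([-1.0], np.zeros(n)))      # minimize -alpha
    # (41b):  F0 r - alpha g0 <= 0
    A1 = np.hstack([-g0.reshape(-1, 1), F0])
    b1 = np.zeros(g0.size)
    # (41c):  alpha gN - FN r <= gN - FN x
    A2 = np.hstack([gN.reshape(-1, 1), -FN])
    b2 = gN - FN @ x
    res = linprog(c, A_ub=np.vstack([A1, A2]),
                  b_ub=np.concatenate([b1, b2]),
                  bounds=[(0.0, 1.0)] + [(None, None)] * n)
    return res.x[0], res.x[1:]                     # alpha*(t) and r^L
```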
The vertex control law ensures recursive feasibility, meaning that feasible solutions can be found repeatedly over time. Furthermore, it guarantees asymptotic stability: the system eventually converges to a stable equilibrium, and this stability persists under minor variations in the initial conditions.
Simulation Results
In this section, two numerical examples are presented to verify the effectiveness of the proposed approach.
A. Example 1:
Consider the double integrator, a canonical example of a second-order control system, whose state-space model is
\begin{align*}
{{\dot{x}}_{1}} &= {x_{2}},\\
{{\dot{x}}_{2}} &= u.
\end{align*}
The matrices of the cost function in (2) are considered as follows:
\begin{align*}
Q &= \left[ {\begin{array}{cc}1&0\\
0&1 \end{array}} \right],\\
R &= 1.
\end{align*}
The initial condition has been set as
As can be seen in Fig. 1, the unsafe set is shown as a red ellipse and the system trajectory is shown by the blue line; the trajectory converges to the equilibrium point. Fig. 2 shows the control input of the data-driven initiative controller, together with the safe and optimal control inputs. In Fig. 3, the trajectory of
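To reproduce the data-collection stage of this example, a minimal simulation sketch follows; the sampling step, horizon, excitation, and initial state are illustrative choices, not values reported in the paper:

```python
import numpy as np

# Double integrator of Example 1: x1' = x2, x2' = u
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])

tau, T = 0.01, 200                         # illustrative step size and length
rng = np.random.default_rng(0)
x = np.array([3.0, -2.0])                  # hypothetical initial state

U_list, X_list = [], []
for _ in range(T):
    u = rng.uniform(-1.0, 1.0)             # exciting input for data collection
    U_list.append([u]); X_list.append(x.copy())
    x = x + tau * (A @ x + (B * u).ravel())  # forward-Euler step of (1)

U0, X0 = np.array(U_list).T, np.array(X_list).T   # data matrices in (6)
assert np.linalg.matrix_rank(np.vstack([U0, X0])) == 3  # full row rank n + m
```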
B. Example 2:
To better demonstrate our approach on a realistic example, the lane-keeping problem for an autonomous vehicle is considered as a second numerical example. The state-space model of the lane-keeping problem is given as:
\begin{align*}
\begin{bmatrix}\dot{y} \\
\dot{v} \\
\dot{\phi } \\
\dot{\psi } \end{bmatrix} \!=\! \begin{bmatrix}0 \!&\! 1 \!&\! V_{0} \!&\! 0 \\
0 \!&\! -\frac{C_{f} + C_{r}}{M V_{0}} \!&\! 0 \!&\! \!-\frac{b C_{r} - a C_{f}}{M V_{0}} \!- V_{0} \\
0 \!&\! 0 \!&\! 0 \!&\! 1 \\
0 \!&\! \!-\frac{b C_{r} \!- a C_{f}}{I_{z} V_{0}} \!&\! 0 \!&\! 0 \end{bmatrix} \begin{bmatrix}y \\
v \\
\phi \\
\psi \end{bmatrix} \!+\! \begin{bmatrix}0 \\
\frac{C_{f}}{M} \\
0 \\
\frac{a C_{f}}{I_{z}} \!\end{bmatrix} u, \tag{43}
\end{align*}
The goal is to maintain the vehicle's position at the center of the driving lane. The details of this system model can be found in [22]. The states of the system are denoted as:
\begin{align*}
\mathbf {x} = \begin{bmatrix}x_{1} \\
x_{2} \\
x_{3} \\
x_{4} \end{bmatrix} = \begin{bmatrix}y \\
v \\
\phi \\
\psi \end{bmatrix}
\end{align*}
\begin{align*}
Q &= 10 \times I_{4\times 4},\\
R &= 1.
\end{align*}
Conclusion
This article has developed a novel control law that merges safety and optimality. The key advantage of the proposed proactive approach is its independence from knowledge of the system dynamics. In the proposed method, a data-driven safe controller is formulated, and the optimal control input is computed from data via a one-shot optimization problem. The initiative maximization problem has been formulated based on input/state data, eliminating the need for state derivatives in the formulation. The recursive feasibility and asymptotic stability of the control law have been proven. It is shown that the coefficient allocated to the safe and optimal control inputs is proportional to the distance from the safety boundary, reflecting a trade-off between safety and optimality. Simulation results demonstrate the effectiveness of the proposed method.