Introduction
Designing data-based or learning-enabled controllers that satisfy system specifications (e.g., safety, stability, and performance) has recently emerged as a critical concern in control engineering [1]. As a prominent design paradigm, data-driven control involves learning controllers for an unknown system based solely on measurements obtained from the system and some prior information about its characteristics [2]. A common approach is to first identify a system model from input-output measurements and then use the identified model to design a model-based controller [3]; this is called indirect data-driven control. As an alternative, there has recently been a surge of interest in designing control systems directly from process data, bypassing the intermediary system identification stage [1], [2], [3], [4]; this is called direct data-driven control. In many situations, direct data-driven approaches can be advantageous over indirect ones for several reasons, some of which are listed in [5].
In many control systems, safety is a critical concern, and it is of vital importance to design controllers that respect it. Safety requirements are typically imposed on the system as constraints that must be satisfied at all times. Safety, however, is only the bare minimum requirement of safety-critical systems, and it is desirable to design safe controllers with performance guarantees as well. To provide such guarantees, optimal control design aims at finding the best controller from a set of admissible controllers based on a performance criterion, such as maximizing profit, minimizing cost, or maximizing efficiency. Both safety and optimality are important in the control of autonomous systems, and balancing them is essential to achieving acceptable outcomes.
To ensure safety, the concept of forward set invariance has been widely utilized, which requires the system's state to remain within a safe set once it starts from that set [6]. In most existing approaches, control barrier functions (CBFs) are employed to guarantee the forward invariance of the safe set. A CBF is a continuously differentiable function, known as a barrier function, that maps the system's state to a scalar value and establishes a linear inequality condition on the system's input that ensures forward invariance of the safe set. In many CBF-based methods, safety and performance requirements are combined through a quadratic programming (QP) formulation: the intervention with a nominal or optimal controller is minimized while point-wise control Lyapunov function (CLF) and CBF conditions are imposed as soft and hard inequality constraints, respectively [7]. This approach is reactive in the sense that the safety of the system is certified myopically at every time instant. Other reactive safety certificates are presented in [8], [9]. This myopic intervention can also lead to convergence to undesired stable equilibrium points on the boundary of the safe set, as shown in [10]. Besides being myopic, these existing safety certificates require complete and accurate knowledge of the system model, and the nominal or optimal controller is assumed to be given a priori. A fundamental challenge is therefore to learn optimal and safe controllers using only a single trajectory of input-state data and then combine them to manage conflicts proactively.
To reduce the complete reliance on the model, a data-driven approach to safe control design is presented in [11] using the concept of contraction sets [12]. In addition, a data-driven safe controller is designed in [4] using control barrier certificates. One drawback of these approaches, however, is that they require the state derivatives to be measured or approximated, which can be costly or introduce noise into the measurements, significantly deteriorating performance and potentially jeopardizing the system's safety.
Besides data-driven safe control design without state-derivative requirements, sample-efficient algorithms for learning optimal controllers for continuous-time (CT) systems are also surprisingly lacking. Reinforcement learning (RL) algorithms such as policy iteration and policy gradient methods [13] have been widely utilized to learn optimal control policies. However, the iterative nature of these algorithms makes them data-hungry, and the need for online data to evaluate control policies or cost gradients can be costly and risky. It is therefore highly desirable to develop one-shot learning algorithms that learn an optimal control policy using only a single data trajectory. Finally, it is desirable to combine the two control policies (i.e., safe control and optimal control) to avoid convergence to an undesired equilibrium. This stands in sharp contrast to existing myopic approaches that assume an optimal controller is given and merely certify the safety of its actions using CBFs.
This paper presents a sample-efficient, data-based safe and optimal controller for CT linear quadratic regulator (LQR) problems with safety constraints. The presented approach 1) learns both safe and optimal control policies in one shot via convex optimization formulations using only input-state data, and 2) combines the two policies using a computationally efficient interpolation technique. A convex optimization formulation of the LQR presented in [14] is leveraged to develop the one-shot optimization for optimality. Then, to design both safe and optimal controllers, the closed-loop system is represented in terms of data and incorporated into the optimization frameworks, turning them into data-based convex programs. The safety of the resulting controller is guaranteed, as is convergence to the equilibrium. Importantly, our approach neither relies on knowledge of the system dynamics nor requires measuring or approximating the state derivative from sampled data. It is also shown that the safe controller predominantly contributes to the overall controller near safety boundaries, ensuring that safety takes precedence, whereas the optimal controller takes over as the system trajectories move away from the safety boundaries. The feasibility and stability of the controller are demonstrated, and simulation results illustrate the effectiveness of the method.
A. Notations
In this paper,
Problem Formulation and Preliminaries
A. Problem Formulation: Safe Optimal Control
This subsection formalizes the safe optimal control problem and provides a background on its challenges.
Consider a continuous-time linear system described by
\begin{align*}
\dot{x}(t) = Ax(t) + Bu(t), \tag{1}
\end{align*}
Our main objective is to design a control input
To take into account optimality, the objective function for system (1) is considered as
\begin{align*}
{J (x(t),u(t))} = \int_{t}^{+\infty } \left( {x^{\prime }}(\tau)Qx(\tau) + {u^{\prime }}(\tau)Ru(\tau) \right) d\tau, \tag{2}
\end{align*}
Definition 1
(Control Barrier Functions): Suppose
\begin{align*}
\mathop {\sup }\limits _{u \in {\mathbb {U}}} [\dot{h}(x) + \gamma (h(x))]\geq 0.
\end{align*}
Based on Definition 1, upon existence of a CBF
\begin{align*}
{K_{cbf}}(x) = \lbrace u \in {\mathbb {U}}|\,\,\dot{h}(x) + \gamma (h(x)) \geq 0\rbrace. \tag{3}
\end{align*}
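As a simple illustration (ours, not taken from the paper), if the state must remain outside a ball of radius $r$ centered at $x_{o}$, a natural candidate barrier function is
\begin{align*}
h(x) = (x - x_{o})^{\prime }(x - x_{o}) - r^{2},
\end{align*}
which is nonnegative exactly on the safe set and whose gradient $2(x - x_{o})^{\prime }$ enters the linear input condition defining (3).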
We now formalize the safe optimal control problem based on the performance (2) and the safety requirement (3) by the following optimization problem.
Problem 1 (Safe Optimal Control):
Find a control policy
\begin{align*}
&{\min \limits _{u\in \mathbb {U} }J(x,u) }\tag{4a}\\
\text{s.t.}\quad \dot{h}&(x) + \gamma (h(x)) \geq 0,\tag{4b}\\
&{\dot{x}=Ax+Bu}, \tag{4c}
\end{align*}
In this optimization, the cost
While reinforcement learning (RL) algorithms have been developed to solve the optimal LQR problem without requiring knowledge of the system dynamics, the following challenges remain: 1) RL algorithms such as policy iteration [13] solve the LQR problem iteratively and may need many iterations to learn an optimal control policy; they can also be data-intensive, especially if new data must be generated at every iteration to evaluate a new policy. 2) Safe control design approaches are typically model-based, which undermines the model-free nature of the RL algorithms, as the overall safe optimal control design still requires the system model. To observe the model dependence of CBF-based safe control design, since the safety constraints depend on a general class
\begin{align*}
\frac{{\partial h(x)}}{{\partial x}}(Ax + Bu) + \gamma h(x) \geq 0 \tag{5}
\end{align*}
B. Background on State-Input Representation of the Closed-Loop System
The input and state measurements collected from the system are organized as follows
\begin{align*}
{U_{0,T}} &= [\begin{array}{cccc} u(t_{0}) & u(t_{0} + \tau) & \cdots & u(t_{0} + (T - 1)\tau) \end{array}],\tag{6a}\\
{X_{0,T}} &= [\begin{array}{cccc} x(t_{0}) & x(t_{0} + \tau) & \cdots & x(t_{0} + (T - 1)\tau) \end{array}], \tag{6b}
\end{align*}
\begin{align*}
{X_{1,T}} = [\begin{array}{cccc} \dot{x}(t_{0}) & \dot{x}(t_{0} + \tau) & \cdots & \dot{x}(t_{0} + (T - 1)\tau) \end{array}].
\end{align*}
Assumption 1:
The matrix
For Assumption 1 to hold,
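For concreteness, the following minimal sketch (the array names, shapes, and rank check are illustrative assumptions, not taken from the paper) assembles the matrices in (6) from sampled input/state data and checks the full-row-rank richness condition commonly imposed on the stacked data matrix in this literature:

```python
import numpy as np

def build_data_matrices(u_samples, x_samples):
    """Assemble U_{0,T} and X_{0,T} as in (6).

    u_samples: list of T input vectors u(t_0 + k*tau), k = 0, ..., T-1
    x_samples: list of T state vectors x(t_0 + k*tau), k = 0, ..., T-1
    """
    U0 = np.column_stack(u_samples)   # m x T matrix, cf. (6a)
    X0 = np.column_stack(x_samples)   # n x T matrix, cf. (6b)
    return U0, X0

def data_is_rich(U0, X0):
    """Check that [U0; X0] has full row rank n + m (a common form of
    Assumption 1 in the data-driven control literature; an assumption here)."""
    stacked = np.vstack([U0, X0])
    return np.linalg.matrix_rank(stacked) == stacked.shape[0]
```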
The following results from the literature provide a data-based representation of the closed-loop system while leveraging the state-input-state derivative data.
Lemma 1 ([4]):
Consider the system (1) under the nonlinear control input
\begin{align*}
{{\mathbb {I}}_{n}} = {X_{0,T}}W(x).
\end{align*}
If
\begin{align*}
\dot{x} = {X_{1,T}}W(x)x, \tag{7}
\end{align*}
\begin{align*}
A + BF(x) = {X_{1,T}}W(x), \tag{8}
\end{align*}
Remark 1:
Equation (8) provides a data-based representation of the closed-loop dynamics and can replace the constraint (5) to find the safe controller in a data-driven manner. However, the drawback of the data-driven approaches presented in [3], [4] is that the state derivatives of the system are needed; these are typically unavailable as direct measurements and have the following representation [1]
\begin{align*}
{X_{1,T}} = A{X_{0,T}} + B{U_{0,T}}= [\begin{array}{cc}B&A \end{array}]\left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]. \tag{9}
\end{align*}
To address the challenges outlined above, this paper presents a new learning-enabled safe optimal control design approach that solves the optimization problem (4) in a data-driven manner. Addressing safety and optimality simultaneously presents its own set of challenges, particularly when the problem is formulated solely from data, which adds further complexity. The proposed approach tackles these issues through the construction of a data-driven safe control input and the development of a one-shot optimization that finds the optimal control input. This is achieved while bypassing the need for explicit knowledge of the system dynamics and eliminating the requirement to obtain state derivatives from sampled data; instead, the method relies solely on measured input/state data to trade off safety and optimality.
Data-Driven System Representation
To design a safe controller when the system dynamics are unknown and the state derivative is not available, it is desirable to write constraint (5) in terms of only input and state measurements. Moreover, to design an optimal controller in a one-shot and data-efficient manner, a data-based representation of the closed-loop system is required. This section provides a data-based representation of open-loop systems under polynomial controllers and a data-based closed-loop representation under linear controllers.
To ensure that the safe control input covers the entire safe region, the following general format is considered for the safe controller:
\begin{align*}
{u^{s}}=F(x)x. \tag{10}
\end{align*}
\begin{align*}
\dot{x} = \left({A + BF(x)} \right)\ x. \tag{11}
\end{align*}
To obviate the need for the state derivative in the proposed approach, the following theorem is presented.
Theorem 1:
Consider the system (1). Let
\begin{align*}
[\begin{array}{lc}B&A \end{array}] = ({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}}, \tag{12}
\end{align*}
\begin{align*}
{{\mathcal H}_{T}}\left({x(t)} \right) = \left[ {\begin{array}{cccc} x(t) & x\left({t + T} \right) & \cdots & x\left({t + (n - 1)T} \right) \end{array}} \right], \tag{13}
\end{align*}
\begin{align*}
D = \left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{X^{\prime }_{0,T}}{({X_{0,T}}{X^{\prime }_{0,T}})^{ - 1}}\left(\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau \right). \tag{14}
\end{align*}
Proof:
Equation (13) represents a time-varying matrix defined on the interval
\begin{align*}
{{\mathcal H}_{T}}(\dot{x}(t)) = {{\mathcal H}_{T}}((A + BF(x))x(t)). \tag{15}
\end{align*}
Based on Lemma 1, in
\begin{align*}
{{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0)) = [\begin{array}{cc}B&A \end{array}]\left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right] \\
\times {X^{\prime }_{0,T}}{({X_{0,T}}{X^{\prime }_{0,T}})^{ - 1}}\left(\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau \right). \tag{16}
\end{align*}
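To make the construction of Theorem 1 concrete, the following is a minimal numerical sketch of (12)-(14); the function and variable names are ours, the integral $\int_0^T {\mathcal H}_T(x)\,d\tau$ is assumed to be pre-computed (e.g., by a trapezoidal rule over the sampled trajectory), and a pseudo-inverse is used for numerical robustness:

```python
import numpy as np

def estimate_BA(U0, X0, HT_0, HT_T, HT_int):
    """Data-based estimate of [B  A] following (12)-(14).

    U0, X0 : data matrices from (6), of sizes m x T and n x T
    HT_0   : H_T(x(0)), the matrix in (13) evaluated at t = 0
    HT_T   : H_T(x(T)), the same matrix evaluated at t = T
    HT_int : numerical approximation of the integral of H_T(x) over [0, T]
    """
    # D from (14): [U0; X0] X0' (X0 X0')^{-1} (integral of H_T)
    right_inv = X0.T @ np.linalg.inv(X0 @ X0.T)
    D = np.vstack([U0, X0]) @ right_inv @ HT_int
    # [B  A] from (12); pinv(D) equals D'(DD')^{-1} when DD' is invertible
    BA = (HT_T - HT_0) @ np.linalg.pinv(D)
    return BA   # first m columns estimate B, remaining n columns estimate A
```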
By considering (12) in constraint (5), we can find the safe control policy using only input and state measurements. Optimal LQR control, however, requires a linear controller, which can further reduce the data requirements. The following corollary shows that for linear controllers, as a special case of polynomial controllers, the entire closed-loop system can be represented by data. Learning the closed-loop system directly to satisfy safety or optimality requires less data than first learning the open-loop dynamics and then designing a safe controller.
Corollary 1:
Consider the system (1) under the linear controller
\begin{align*}
{{\mathcal H}_{T}}(\dot{x}) = (A + BK){{\mathcal H}_{T}}(x), \tag{17}
\end{align*}
\begin{align*}
\left({A + BK} \right) = ({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) ^{ - 1}}. \tag{18}
\end{align*}
Moreover, this closed-loop representation can be leveraged to design safe or optimal controllers if the rank of
Proof:
Using
\begin{align*}
\dot{x} = (A + BK)x. \tag{19}
\end{align*}
Applying (13) to (19) results in (17). Integrating this equation yields
\begin{align*}
{{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0)) = \left({A + BK} \right)\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau.
\end{align*}
Since only the closed-loop system with the size
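A corresponding sketch of the closed-loop estimate (18), which uses state data only (names as in the previous sketch):

```python
import numpy as np

def estimate_closed_loop(HT_0, HT_T, HT_int):
    """Estimate A + BK from (18); HT_int is assumed invertible."""
    # (A + BK) = (H_T(x(T)) - H_T(x(0))) (integral of H_T over [0, T])^{-1}
    return (HT_T - HT_0) @ np.linalg.inv(HT_int)
```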
Data-Driven Optimal Control Design
Finding the optimal control input that minimizes (2) for the linear system (1) leads to solving the algebraic Riccati equation [15], which requires complete knowledge of the system dynamics. To obviate this requirement, iterative RL-based approaches [13], [16], [17] have been presented to learn the optimal control policy. Despite their advantages, iterative RL algorithms have drawbacks compared to one-shot optimization methods; a significant one is their demanding computational requirements. In contrast, one-shot strategies solve an optimization problem directly to find the best solution at once, offering potentially greater computational and data efficiency. Although one-shot learning of the LQR has been considered for discrete-time systems [1], it is lacking for continuous-time systems.
To this end, in this paper, we present a one-shot learning approach for solving the LQR problem for continuous-time systems (i.e., learning the optimal controller
Lemma 2 ([14]):
Consider the linear system (1) with the quadratic cost function (2). Then, the control gain that optimizes the cost is obtained by
\begin{align*}
\mathop {\min }\limits _{P,Y} (Trace(QP) &+ Trace({R^{\frac{1}{2}}}Y{P^{ - 1}}{Y^{\prime }}{R^{\frac{1}{2}}}))\tag{20a}\\
\text{s.t.}\; AP + P{A^{\prime }} &- BY - {Y^{\prime }}{B^{\prime }} + I \prec 0,\tag{20b}\\
&P = {P^{\prime }} \succ 0, \tag{20c}
\end{align*}
According to [14], [18], the objective function in (20) comprises the sum of two components, and the second term can be expressed in the following epigraph form
\begin{align*}
\phi (P,Y) =\min (Trace(X))\\
\text{s.t.}\;\left[ {\begin{array}{lc}X&{{R^{\frac{1}{2}}}Y}\\
{{Y^{\prime }}{R^{\frac{1}{2}}}}&P \end{array}} \right]\succ 0 .
\end{align*}
Consequently, the optimization (20) can be recast as
\begin{align*}
&\min \beta \tag{21a}\\
\text{s.t.}\; C(\beta,&P,Y,X) \succ 0, \tag{21b}
\end{align*}
\begin{align*}
C(\beta, P,Y,X) &= diag({C_{1}},{C_{2}},{C_{3}}), \tag{22a}\\
{C_{1}}(\beta,P,Y,X) &= - Trace({QP}) - Trace(X) + \beta, \tag{22b}\\
{C_{2}}(\beta, P,Y,X) &= - AP - P{A^{\prime }} + BY + {Y^{\prime }}{B^{\prime }} - I,\tag{22c}\\
{C_{3}}(\beta, P,Y,X) &= \left[ {\begin{array}{lc}X&{{R^{\frac{1}{2}}}Y}\\
{{Y^{\prime }}{R^{\frac{1}{2}}}}&P \end{array}} \right]. \tag{22d}
\end{align*}
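As a reference point, the following is a minimal CVXPY sketch of the model-based program (21)-(22). The strictness margin eps, the solver choice, and the sign convention for recovering the gain (we read (20b) as implying the control $u = -YP^{-1}x$) are our assumptions, not statements from the paper:

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import sqrtm

def lqr_one_shot(A, B, Q, R, eps=1e-6):
    """Sketch of the convex LQR program (21)-(22), model-based form."""
    n, m = B.shape
    Rh = np.real(sqrtm(R))                        # R^{1/2}
    P = cp.Variable((n, n), symmetric=True)
    Y = cp.Variable((m, n))
    X = cp.Variable((m, m), symmetric=True)
    beta = cp.Variable()
    C2 = -(A @ P + P @ A.T) + B @ Y + Y.T @ B.T - np.eye(n)   # cf. (22c)
    constraints = [
        beta - cp.trace(Q @ P) - cp.trace(X) >= eps,          # C1 > 0, cf. (22b)
        C2 >> eps * np.eye(n),
        cp.bmat([[X, Rh @ Y], [Y.T @ Rh, P]]) >> eps * np.eye(m + n),  # C3, cf. (22d)
        P >> eps * np.eye(n),
    ]
    cp.Problem(cp.Minimize(beta), constraints).solve()  # needs an SDP solver, e.g., SCS
    return -Y.value @ np.linalg.inv(P.value)      # gain K with u = Kx (assumed sign)
```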
Theorem 2:
Consider the system (1) and the optimization problem in (21). Let
\begin{align*}
&\min \beta \tag{23a}\\
\text{s.t.}\; C(\beta, P,Y,X)=&diag({C_{1}},{C_{2}},{C_{3}}) \succ 0, \tag{23b}
\end{align*}
where
\begin{align*}
{C_{1}} &= - Trace(QP) - Trace(X) + \beta,\tag{24a}\\
{C_{2}} &= \left({({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{{(D{D^{\prime }})}^{ - 1}}} \right)\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right] \\
&\quad + {\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right]^{\prime }}{\left({({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{{(D{D^{\prime }})}^{ - 1}}} \right)^{\prime }} - I,\tag{24b}\\
{C_{3}} &= \left[ {\begin{array}{cc}X&{{R^{\frac{1}{2}}}Y}\\
{{Y^{\prime }}{R^{\frac{1}{2}}}}&P \end{array}} \right]. \tag{24c}
\end{align*}
Proof:
For system (1), if
\begin{align*}
\dot{x} = (A + BK)x.
\end{align*}
\begin{align*}
{{\mathcal H}_{T}}(\dot{x}) = (A + BK){{\mathcal H}_{T}}(x). \tag{25}
\end{align*}
\begin{align*}
{{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0)) = \left({A + BK} \right)\int_{0}^{T} {{\mathcal H}_{T}}(x)\,d\tau.
\end{align*}
\begin{align*}
\left[ {\begin{array}{cc}B&A \end{array}} \right]\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right] + {\left[ {\begin{array}{c}Y\\
{ - P} \end{array}} \right]^{\prime }}{\left[ {\begin{array}{cc}B&A \end{array}} \right]^{\prime }} - I \succ 0. \tag{26}
\end{align*}
In (24),
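In code, the only change relative to the model-based sketch after (22) is that constraint $C_2$ is built from the data-based estimate of $[B\;A]$ in Theorem 1; a fragment under the same naming assumptions:

```python
# BA_hat = estimate_BA(U0, X0, HT_0, HT_T, HT_int), cf. Theorem 1
M = BA_hat @ cp.vstack([Y, -P])    # = BY - AP in (24b)
C2 = M + M.T - np.eye(n)           # impose C2 >> eps * I as before
```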
Remark 2:
The approaches presented in Sections III and IV yield, respectively, a data-driven safe controller and a data-driven one-shot optimal controller. However, it is desirable to account for safety and optimality simultaneously in the control design. To this end, the next section presents a new approach that proactively combines the safe and optimal control inputs.
Data-Driven Initiative Controller
To consider both safety and optimality in the control design, an initiative control input
\begin{align*}
u = \alpha (t) {u^{L}} + (1 - \alpha (t)) {u^{s}}, \tag{27}
\end{align*}
In the control policy (27), two special scenarios can arise: 1) the system trajectories are far from the safety boundary. Therefore,
The goal is formulated as the following maximization problem
\begin{align*}
\max \alpha (t)&\tag{28a}\\
\text{s.t.}\;\ \ \frac{{\partial h}}{{\partial x}}(Ax + Bu) &+\gamma h \geq 0. \tag{28b}
\end{align*}
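Whether the constraint is written in the model-based form (28b) or the data-based forms derived below, it is scalar and linear in $\alpha(t)$ once all other quantities are evaluated at the current state, so the maximization admits a closed-form solution. A hedged sketch (the split into $c_0$ and $c_1$ follows the rearrangement in (31); the names are ours):

```python
import numpy as np

def max_alpha(c0, c1):
    """Largest alpha in [0, 1] satisfying c0 + alpha * c1 >= 0.

    c0: value of (dh/dx)(Ax + B u_s) + gamma*h at the current state;
        nonnegative whenever u_s satisfies the CBF condition (3).
    c1: value of (dh/dx) B (u_L - u_s), the coefficient of alpha in (31).
    """
    if c1 >= 0.0:
        return 1.0                          # the optimal input does not fight safety
    return float(np.clip(c0 / -c1, 0.0, 1.0))

# Blended input, cf. (27):  u = alpha * u_L + (1 - alpha) * u_s
```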
Theorem 3:
Consider the system (1) and the maximization problem in (28), and consider the collected data in (6). The data-based representation of the maximization problem in (28) with its constraint can be written as
\begin{gather*}
\max \alpha (t) \tag{29a}\\
\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}} \\
\times\left[ {\begin{array}{l}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x \\
- \alpha (t) {(RK^{L}{P^{ - 1}})^{\prime }}({u^{L}} - {U_{0,T}}{({X_{0,T}})^\dag }x)) + \gamma h \geq 0. \tag{29b}
\end{gather*}
Proof:
For system (1), by substituting the proposed initiative control input (27) in the constraint of (28), one has
\begin{align*}
\frac{{\partial h}}{{\partial x}}(Ax + B(\alpha (t) {u^{L}} + (1 - \alpha (t)){u^{s}})) + \gamma h \geq 0. \tag{30}
\end{align*}
\begin{align*}
\frac{{\partial h}}{{\partial x}}((Ax + B{u^{s}}) + B\alpha (t) ({u^{L}} - {u^{s}})) + \gamma h \geq 0. \tag{31}
\end{align*}
\begin{align*}
\frac{{\partial h}}{{\partial x}}((Ax + BF(x)x) + B\alpha (t) ({u^{L}} - F(x)x)) + \gamma h \geq 0. \tag{32}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}([\begin{array}{lc}B&A \end{array}]\left[ {\begin{array}{l}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x +B\alpha (t) ({u^{L}} - F(x)x)) \\
&\qquad + \gamma h \geq 0. \tag{33}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}} \\
&\quad \times \left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x + B\alpha (t) ({u^{L}} - F(x)x)) + \gamma h \geq 0.
\end{align*}
Since
\begin{align*}
\max &\alpha (t) \tag{34a}\\
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){D^{\prime }}{(D{D^{\prime }})^{ - 1}} \\
\quad \times&\left[ {\begin{array}{c}{U_{0,T}}\\
{{X_{0,T}}} \end{array}} \right]{({X_{0,T}})^\dag }x \\
+& B\alpha (t) ({u^{L}} - {U_{0,T}}(X_{0,T})^\dag x)) + \gamma h \geq 0. \tag{34b}
\end{align*}
As can be seen, only the
\begin{align*}
{u^{L}} = -{R^{ - 1}}{B^{\prime }}Px,
\end{align*}
\begin{align*}
K^{L} &= -{R^{ - 1}}{B^{\prime }}P \Rightarrow RK^{L} = -{B^{\prime }}P,\tag{35a}\\
RK^{L}{P^{ - 1}} &= -{B^{\prime }} \Rightarrow B = -{(RK^{L}{P^{ - 1}})^{\prime }}. \tag{35b}
\end{align*}
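For instance, once $K^{L}$ and $P$ are available from the data-driven LQR solution, (35b) recovers $B$ without any model knowledge; as a one-line sketch (variable names ours):

```python
# B = -(R K_L P^{-1})', cf. (35b); K_L and P come from the learned LQR solution
B_hat = -(R @ K_L @ np.linalg.inv(P)).T
```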
\begin{align*}
V = {x^{\prime }}Px,
\end{align*}
Substituting matrix
Corollary 2:
Consider the system (1). Assume the linear controller
\begin{gather*}
\max \alpha (t) \tag{36a}\\
\frac{{\partial h}}{{\partial x}}({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){\left({\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau } \right)^{\prime }} \\
{\left({(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) {{(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) }^{\prime }}} \right)^{ - 1}}x \\
-\alpha {(RK^{L}{P^{ - 1}})^{\prime }}({u^{L}} - {U_{0,T}}{({X_{0,T}})^\dag }x)) + \gamma h \geq 0. \tag{36b}
\end{gather*}
Proof:
If we consider
\begin{align*}
\frac{{\partial h}}{{\partial x}}(((A + BK^{s})x) + B\alpha (t) ({u^{L}} - K^{s}x)) + \gamma h \geq 0. \tag{37}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){\left({\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau } \right)^{\prime }} \\
&\quad \times{\left({(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) {{(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) }^{\prime }}} \right)^{ - 1}}x \\
&\quad + B\alpha ({u^{L}} - K^{s}x)) + \gamma h \geq 0. \tag{38}
\end{align*}
\begin{align*}
&\frac{{\partial h}}{{\partial x}}(({{\mathcal H}_{T}}(x(T)) - {{\mathcal H}_{T}}(x(0))){\left({\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau } \right)^{\prime }} \\
&\quad \times{\left({(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) {{(\mathop \smallint \nolimits _{0}^{T} {{\mathcal H}_{T}}(x)d\tau) }^{\prime }}} \right)^{ - 1}}x \\
&\quad + B\alpha ({u^{L}} - {U_{0,T}}{({X_{0,T}})^\dag }x)) + \gamma h \geq 0. \tag{39}
\end{align*}
By considering (39) and (35) instead of the constraint in (28), we get (36), and this completes the proof.
Note that by considering a linear safe control structure
Stability and Feasibility Analysis
This section provides the proofs of recursive feasibility and asymptotic stability of the system, which play a prominent role in the proposed approach. Since these proofs are inspired by the interpolating control method, interpolating control is reviewed first.
Interpolating control relies on a vertex representation approach, which smoothly interpolates between a controller located at the vertex of a feasible set and a high-gain feedback controller that prioritizes safety and robustness. The interpolation is designed so that the resulting control input stays within the feasible set while still achieving the desired performance. Here, based on interpolating control theory, an interpolation coefficient is presented, which is
\begin{align*}
x(t)={\alpha (t)}{x}^{L}(t) + \left({1 - {\alpha (t)}} \right){x}^{s}(t), \tag{40}
\end{align*}
In the concept of interpolating control, the following constraint should be satisfied for the controlled invariant set
\begin{align*}
{C_{\bar{t}}} = \left\lbrace {x \in {\mathbb{R}^{n}}:{F_{\bar{t}}}x \leq {g_{\bar{t}}}} \right\rbrace
\end{align*}
Theorem 4:
Recursive feasibility: The control method described by (27) and (40), which interpolates at regular intervals, achieves a feasible solution for system (1) for all initial states
Proof:
In terms of recursive feasibility, it has to be proven that
\begin{align*}
F_{u}u(t) &= {F_{u}}(\alpha (t){u^{L}}(t) + (1 - \alpha (t)){u^{s}}(t))\\
&= \alpha (t){F_{u}}{u^{L}}(t) + (1 - \alpha (t)){F_{u}}{u^{s}}(t)\\
&\leq (\alpha (t){g_{u}} + (1 - \alpha (t)){g_{u}}) = {g_{u}},
\end{align*}
\begin{align*}
\dot{x}(t) &= Ax(t) + Bu(t)\\
&= A(\alpha (t){x^{L}}(t) + (1 - \alpha (t)){x^{s}}(t)\,) + B(\alpha (t){u^{L}}(t) \\
&\quad +(1 - \alpha (t)){u^{s}}(t)\,)\\
&\!=\! \alpha (t) (A{x^{L}}(t) \!+ B{u^{L}}(t)) \!+ (1 - \alpha (t)) (A{x^{s}}(t) \!+ B{u^{s}}(t)).
\end{align*}
Since
Theorem 5:
Asymptotic stability: Considering system (1), the control law (27) guarantees asymptotic stability for all initial states
Proof:
It has to be proven that all solutions starting in
\begin{align*}
x(t)&={\alpha ^*(t)}{x^*}^{L}(t) + \left({1 - {\alpha ^*(t)}} \right){x^*}^{s}(t),\\
u(t)&={\alpha ^*(t)}{u}^{L}(t) + \left({1 - {\alpha ^*(t)}} \right){u}^{s}(t).
\end{align*}
\begin{align*}
\dot{x}(t) &= Ax(t) + Bu(t)\\
&={\alpha ^*(t)}{\dot{x}}^{L}(t) + \left({1 - {\alpha ^*(t)}} \right){\dot{x}}^{s}(t),
\end{align*}
\begin{align*}
{\dot{x}}^{L}(t)&=A{x^*}^{L}(t) + B{u}^{L}(t)\in {\Omega _{\max }},\\
{\dot{x}}^{s}(t)&=A{x^*}^{s}(t) + B{u}^{s}(t) \in {C_{\bar{t}}}.
\end{align*}
\begin{align*}
{{\alpha }^*}(t)={{\alpha }^*}(x(t))&= \max \limits _{\alpha,r^{L}} \alpha \tag{41a}\\
\qquad \text{s.t.}\quad F_{0} r^{L} &\leq \alpha g_{0},\tag{41b}\\
F_{N} (x(t)-r^{L}) &\leq (1-\alpha)g_{N},\tag{41c}\\
0 \leq \alpha &\leq 1, \tag{41d}
\end{align*}
\begin{align*}
{{\dot{\alpha }}^*}(t)={{\alpha }^*}(\dot{x}(t))&= \max \limits _{\alpha,r^{L}} \alpha \tag{42a}\\
\qquad \qquad \text{s.t.}\;\ F_{0} r^{L} &\leq \alpha g_{0},\tag{42b}\\
F_{N} (\dot{x}(t)-r^{L}) &\leq (1-\alpha)g_{N},\tag{42c}\\
0 &\leq \alpha \leq 1. \tag{42d}
\end{align*}
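A minimal sketch of the interpolation program (41) as a linear program in $(\alpha, r^{L})$; the matrix names follow (41), while the solver choice and vectorization are ours:

```python
import numpy as np
from scipy.optimize import linprog

def interpolation_coefficient(x, F0, g0, FN, gN):
    """Solve (41): max alpha s.t. F0 r <= alpha g0,
    FN (x - r) <= (1 - alpha) gN, 0 <= alpha <= 1, over z = [alpha, r]."""
    n = x.size
    c = np.concatenate(([-1.0], np.zeros(n)))      # minimize -alpha
    # (41b):  F0 r - alpha g0 <= 0
    A1 = np.hstack([-g0.reshape(-1, 1), F0])
    b1 = np.zeros(g0.size)
    # (41c):  alpha gN - FN r <= gN - FN x
    A2 = np.hstack([gN.reshape(-1, 1), -FN])
    b2 = gN - FN @ x
    res = linprog(c, A_ub=np.vstack([A1, A2]),
                  b_ub=np.concatenate([b1, b2]),
                  bounds=[(0.0, 1.0)] + [(None, None)] * n)
    return res.x[0], res.x[1:]                     # alpha*(t) and r^L
```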
The vertex control law ensures recursive feasibility, meaning that feasible solutions can be found repeatedly over time. Furthermore, it guarantees asymptotic stability: the system eventually converges to a stable equilibrium, and this stability persists under minor variations in the initial conditions.
Simulation Results
In this section, two numerical examples are presented to verify the effectiveness of the proposed approach.
A. Example 1:
Consider the double integrator, a canonical example of a second-order control system, whose state-space model is
\begin{align*}
{{\dot{x}}_{1}} &= {x_{2}},\\
{{\dot{x}}_{2}} &= u.
\end{align*}
The matrices of the cost function in (2) are considered as follows:
\begin{align*}
Q &= \left[ {\begin{array}{cc}1&0\\
0&1 \end{array}} \right],\\
R &= 1.
\end{align*}
The initial condition has been set as
As can be seen in Fig. 1, the unsafe set is shown as a red ellipse and the system trajectory is shown by the blue line; the trajectory converges to the equilibrium point. Fig. 2 shows the control input of the data-driven initiative controller, together with the safe and optimal control inputs. In Fig. 3, the trajectory of
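To reproduce the data-collection stage of this example, a minimal simulation sketch follows; the sampling step, horizon, excitation, and initial state are illustrative choices, not values reported in the paper:

```python
import numpy as np

# Double integrator of Example 1: x1' = x2, x2' = u
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])

tau, T = 0.01, 200                         # illustrative step size and length
rng = np.random.default_rng(0)
x = np.array([3.0, -2.0])                  # hypothetical initial state

U_list, X_list = [], []
for _ in range(T):
    u = rng.uniform(-1.0, 1.0)             # exciting input for data collection
    U_list.append([u]); X_list.append(x.copy())
    x = x + tau * (A @ x + (B * u).ravel())  # forward-Euler step of (1)

U0, X0 = np.array(U_list).T, np.array(X_list).T   # data matrices in (6)
assert np.linalg.matrix_rank(np.vstack([U0, X0])) == 3  # full row rank n + m
```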
B. Example 2:
To better demonstrate our approach on a realistic example, the lane-keeping problem for an autonomous vehicle is considered as a second numerical example. The state-space model of the lane-keeping problem is given as:
\begin{align*}
\begin{bmatrix}\dot{y} \\
\dot{v} \\
\dot{\phi } \\
\dot{\psi } \end{bmatrix} \!=\! \begin{bmatrix}0 \!&\! 1 \!&\! V_{0} \!&\! 0 \\
0 \!&\! -\frac{C_{f} + C_{r}}{M V_{0}} \!&\! 0 \!&\! \!-\frac{b C_{r} - a C_{f}}{M V_{0}} \!- V_{0} \\
0 \!&\! 0 \!&\! 0 \!&\! 1 \\
0 \!&\! \!-\frac{b C_{r} \!- a C_{f}}{I_{z} V_{0}} \!&\! 0 \!&\! 0 \end{bmatrix} \begin{bmatrix}y \\
v \\
\phi \\
\psi \end{bmatrix} \!+\! \begin{bmatrix}0 \\
\frac{C_{f}}{M} \\
0 \\
\frac{a C_{f}}{I_{z}} \!\end{bmatrix} u, \tag{43}
\end{align*}
The goal is to maintain the vehicle's position at the center of the driving lane. The details of this system model can be found in [22]. The states of the system are denoted as:
\begin{align*}
\mathbf {x} = \begin{bmatrix}x_{1} \\
x_{2} \\
x_{3} \\
x_{4} \end{bmatrix} = \begin{bmatrix}y \\
v \\
\phi \\
\psi \end{bmatrix}
\end{align*}
\begin{align*}
Q &= 10 \times I_{4\times 4},\\
R &= 1.
\end{align*}
Conclusion
This article has developed a novel control law that merges safety and optimality. The key advantage of the proposed proactive approach is its independence from knowledge of the system dynamics. In the proposed method, a data-driven safe controller is formulated, and the optimal control input is computed from data via a one-shot optimization problem. The initiative maximization problem has been formulated based on input/state data, eliminating the need for state derivatives in the formulation. The recursive feasibility and asymptotic stability of the control law have been proven. It is shown that the coefficient allocated to the safe and optimal control inputs is proportional to the distance from the safety boundary, reflecting a trade-off between safety and optimality. Simulation results demonstrate the effectiveness of the proposed method.