In the modern control framework, control performance greatly depends on the communication networks that transmit a large amount of data among controllers, sensors, and actuators. In particular, as a system grows in size and complexity, the deployment cost increases accordingly. Therefore, to ensure both system stability and control performance, there is a strong desire for controlled systems not only to provide an adequate level of control performance but also to reduce resource and energy consumption.
Much effort has been devoted to developing optimal control methods that achieve an adequate control level for nonlinear systems under different scenarios. It is worth pointing out that dynamic programming [1], which has attracted great attention from many researchers, is regarded as a classical and effective tool to obtain optimal control solutions for nonlinear systems. Furthermore, to avoid the “curse of dimensionality” that occurs as the structural complexity of a system increases, adaptive dynamic programming (ADP), which is assisted by neural networks (NNs) or fuzzy logic systems, was developed by Werbos [2] in the 1970s. During the last several decades, ADP-based optimal [3], [4], [5] and suboptimal [6] control approaches have achieved great success in tackling the control problems of continuous-time (CT) or discrete-time (DT) nonlinear systems with uncertainties or external disturbances [7], [8], input or output constraints [9], time delays [10], and failures [11], as well as in solving trajectory tracking [12], [13], zero-sum or nonzero-sum games [14], and so on. Moreover, some attempts have been made to implement ADP-based control strategies in practical systems, such as residential energy scheduling [15], induction motor driving [16], power systems [17], microgrid energy management [18], near space vehicles [19], robot systems [12], and active suspension systems [20].
In order to reduce resource and energy consumption, ADP-based feedback control policies have been developed from periodic to aperiodic scheduling, i.e., from time-triggered to event-triggered control strategies. Distinguished from the time-triggered control framework, the event-triggered control method [12], [21] is responsive and generates sensor sampling and control actions only when the system state deviates by more than a certain threshold determined by an appropriately designed event-triggering condition. In this way, the data transmission among the controllers, the sensors, and the actuators is greatly reduced, which saves computational burden, communication bandwidth, and energy consumption.
Although the event-triggered control offers clear superiority over periodic control, it is worth emphasizing that the triggering condition is continuously monitored based on current measurements from a hardware device. This implies that full-state information is assumed to be available, an assumption that is violated in most practical situations. Some beneficial attempts, which combine the event- and self-triggered mechanisms, have been made for different types of plants. Kishida et al. [22] presented an integrated event- and self-triggered networked control scheme for uncertain linear systems with finite-gain \mathcal {L} _{2} stability, where the event-triggering condition decides whether a new control signal is transmitted to the actuator at each sampling, and the self-triggering condition determines the sampling time instants. Zhou et al. [23] proposed a combined self- and event-triggered control strategy to investigate the output control approach for quantized linear systems. Qi et al. [24] presented an event- and self-triggered control method for switched linear systems with exogenous disturbances, where the current sampled data are employed to adaptively predict the inter-execution intervals and thereby decrease resource consumption. Sahoo et al. [25] proposed a mixed event- and self-triggering-based regulation method for CT linear dynamical systems. Zuo et al. [26] developed both event- and self-triggered transmission strategies for dynamical systems by designing a triggering condition to save network resources.
To avoid the continuous monitoring of the system state by a hardware device, the self-triggering mechanism has been introduced by some researchers to design controllers. In contrast to event-triggered control, which updates the control policy based on a designed triggering condition, the next triggering time in self-triggered control [27] is determined at the former triggering instant; that is to say, continuous monitoring of the system state through hardware devices is no longer required at the sensor side, and the embedded devices can shut down their communication until the next transmission instant. Focusing on linear systems, Zhang et al. [28] designed a self-triggered gain scheduling control to achieve semiglobal stabilization of input-constrained linear systems while avoiding continuous monitoring of the system states. Lu and Maciejowski [29] investigated a self-triggering mechanism-based model predictive control (MPC) strategy for linear systems in the presence of both state and input constraints, where both the updating of the MPC control policy and the next triggering instant are determined according to a relaxed dynamic programming inequality. Brunner et al. [30] investigated a novel self-triggered aperiodic control method for perturbed DT linear systems by evaluating set-membership conditions; in this way, a tradeoff is realized between the communication rate and the worst-case asymptotic bound on the closed-loop system state. For nonlinear systems, Liu et al. [31] presented an optimized self-triggered strategy-based robust MPC scheme for constrained DT uncertain nonlinear systems subject to disturbances. Li and Li [32] proposed a self-triggered distributed MPC scheme to reach consensus of a heterogeneous time-varying multiagent system, where the self-triggering time intervals and the control inputs are optimized alternately and the influence on the system performance is analyzed. Based on the state information of each agent collected from its neighbors, Fan et al. [33] proposed a self-triggered consensus algorithm with Zeno-exclusion analysis for multiagent systems. Gao et al. [34] proposed a state-estimation-based self-triggered control scheme for cyber-physical systems subject to joint attacks on both sensors and actuators with enhanced resource saving. Furthermore, Gao et al. [35] developed a robust self-triggered control scheme for time-varying constrained uncertain systems by analyzing reachability for both linear and nonlinear scenarios. Some attempts have also been made toward practical implementations. Cao et al. [36] proposed a self-triggered MPC strategy to investigate the trajectory tracking control problem for nonholonomic vehicles considering coupled input constraints and bounded perturbations. By introducing the self-triggered communication strategy, Zhou and Tokekar [37] developed a decentralized target tracking method for multirobot teams, in which the time when a particular robot should query the online data from its neighbors and when it is safe to operate with possibly outdated data is determined based on the self-triggering mechanism to reduce the communication bandwidth.
From the aforementioned state of the art, the self-triggered control scheme has been widely explored in MPC, distributed control, and networked control, whereas few works have focused on optimal control. Lou and Ji [38] investigated a new self-triggered adaptive optimal control method for nonlinear CT systems, achieving high quantitative accuracy at a limited channel transmission rate. Kobayashi and Hiraishi [39] investigated the synthesis of self-triggered control for network systems in an optimal manner by computing the control input and the sampling time simultaneously.
In order to avoid the continuous monitoring through hardware devices in existing event-triggered control strategies, as well as to decrease the computational burden, the communication bandwidth, and the energy consumption, we present a new self-triggered approximate optimal neuro-control method for nonlinear systems based on ADP. The main contributions and novelties are summarized in the following three aspects.
Different from existing ADP-based event-triggered control strategies [12], [21], the next triggering instant in the developed ADP-based self-triggered control is predicted in software based on the previous triggering instant; that is to say, a new triggering condition based on the self-triggering mechanism is explored to predict the next triggering instant. Thus, the continuous monitoring of the system state in the event-triggered control strategy is avoided, and the hardware device for continuously monitoring the system state is no longer required.
By selecting a proper design structure for the nested updating policies, the critic NN weight error dynamics is guaranteed to be asymptotically stable, rather than uniformly ultimately bounded (UUB) as in most existing ADP-based control schemes.
The self-triggered control scheme can not only guarantee the system to be stable in an optimal manner but also reduce the computational burden, the precious communication bandwidth, and the energy consumption. In other words, the developed control scheme offers a feasible tradeoff between the overall resource cost and the system control performance, which is significant in real implementations.
The remainder of this article is structured as follows. In Section II, the problem statement is given. In Section III, the self-triggered approximate optimal neuro-control is designed via the ADP framework in detail, and the stability analysis is offered. In Section IV, simulation studies illustrate the effectiveness of the developed approach. In Section V, concluding remarks are briefly described.
SECTION II.
Problem Statement
The considered nonlinear system dynamics is modeled in the general form as \begin{equation*} \dot x = f\left ({x}\right) + g\left ({x}\right)u\left ({x}\right) \tag{1}\end{equation*}
where x(t)\in {\mathbb R}^{n} and u(t)\in {\mathbb R}^{m} are the system state and control input vectors, respectively, and f(x) \in {\mathbb R}^{n} and g(x) \in {\mathbb R}^{n \times m} represent the known drift dynamics and control input matrix, respectively. To ease the notation, x(t) = x is denoted in the sequel.
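For illustration only, the following minimal Python sketch (ours, not part of the original formulation) shows how a plant of the form (1) is simulated under a zero-order-hold input, which is exactly how a self- or event-triggered controller interacts with the plant: the input computed at a triggering instant is held constant until the next one. The drift f, the input matrix g, the step size, and the scalar example are placeholder assumptions.

```python
import numpy as np

def simulate_hold(f, g, x0, u_hold, t_span, dt=1e-3):
    """Forward-Euler simulation of x_dot = f(x) + g(x) u with the input u_hold
    kept constant over [t_span[0], t_span[1]) (zero-order hold)."""
    x = np.asarray(x0, dtype=float)
    t = t_span[0]
    traj = [(t, x.copy())]
    while t < t_span[1]:
        x = x + dt * (f(x) + g(x) @ u_hold)   # Euler step under the held input
        t += dt
        traj.append((t, x.copy()))
    return traj

# Placeholder example: a scalar plant f(x) = -x, g(x) = 1, held input u = 0.5.
f = lambda x: -x
g = lambda x: np.ones((1, 1))
traj = simulate_hold(f, g, x0=[1.0], u_hold=np.array([0.5]), t_span=(0.0, 1.0))
```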
To make the analysis more amenable, the following two assumptions are provided.
Assumption 1:
The vector-valued drift function f(x) is Lipschitz continuous on the compact set \Omega \subset {\mathbb R}^{n} containing the origin, such that the solution x(t) of the nonlinear system (1) is unique for the given initial state x_{0} \in \Omega and control u. Moreover, there is a scalar D_{f}>0 such that \|f(x)\|\le D_{f} \|x\|, where \|\cdot \| indicates the 2-norm of a vector, and the considered system (1) can be stabilized on \Omega.
Assumption 2:
The control input matrix g(x) is norm-bounded as 0 < \|g(x)\|_{F} \le D_{g} for arbitrary x \in \Omega, where D_{g}>0 and \|\cdot \|_{F} denotes the Frobenius norm of a matrix.
Define the infinite-horizon cost function as \begin{equation*} V\left ({x,u}\right) = \int _{t}^{\infty} { U\left ({x\left ({\tau }\right),u\left ({\tau }\right)}\right) \mathrm {d}\tau } \tag{2}\end{equation*}
where U(x,u) = {x^{\mathsf {T}}}Qx + {u^{\mathsf {T}}}Ru \ge 0 for all x \in {\mathbb R}{^{n}} and u \in {\mathbb R}{^{m}}, U(0,0) = 0, V(0,0) = 0, and Q \in {\mathbb R}{^{n \times n}} and R \in {\mathbb R}{^{m \times m}} are symmetric positive definite matrices.
Definition 1:
For the nonlinear system (1), a control policy u (x) is defined to be admissible with respect to (2) if u (x) is continuous on a set \Omega \subset {\mathbb R}^{n}, u (0)=0, u (x) stabilizes the nonlinear system (1), and V (x_{0}) in (2) is finite for every initial state x_{0}=x(0) \in \Omega. The admissible control set \psi (\Omega) consists of all such admissible control policies.
For any given admissible control policy u in the admissible control set \psi (\Omega), if the associated cost function (2) is continuously differentiable, i.e., V \in C^{1}, then taking the time derivative of (2) as in [41] yields the nonlinear equation \begin{equation*} U\left ({x,u}\right) + \nabla {V^{ \mathsf {T}}}\left ({x}\right)\left ({{f\left ({x}\right) + g\left ({x}\right)u} }\right) = 0 \tag{3}\end{equation*}
where \nabla V\left ({x}\right) = {\partial V\left ({x}\right)}/{\partial x} indicates the partial gradient of V(x) with respect to the system state x.
To drive the closed-loop system (1) to convergence, the optimal control policy u^{\ast }(t)\in \psi (\Omega) is obtained by minimizing the cost function, i.e., the optimal cost function is \begin{equation*} {V^ {*} }\left ({x}\right) = \min _{u \in \psi \left ({\Omega }\right)} \int _{t}^{\infty} {U\left ({x\left ({\tau }\right),u\left ({\tau }\right)}\right)\mathrm {d}\tau }. \tag{4}\end{equation*}
By considering the optimal cost function (4), the associate Hamiltonian is defined for the nonlinear system (1) as \begin{equation*} H\left ({{x,u,\nabla V^{\ast} \left ({x}\right)} }\right) \!=\!U\left ({x,u}\right) \!+\! \nabla {V^{\ast \mathsf {T}}}\left ({x}\right)\left ({{f\left ({x}\right) \!+\! g\left ({x}\right)u} }\right). \tag{5}\end{equation*}
According to the Bellman principle of optimality [1], the optimal cost function {V^ {*} }(x) is obtained by solving the following Hamilton–Jacobi–Bellman equation (HJBE): \begin{equation*} 0 = \min _{u \in \psi \left ({\Omega }\right)} H\left ({{x,u^{\ast},\nabla {V^{*} }\left ({x}\right)} }\right). \tag{6}\end{equation*}
Hence, the optimal control policy is expressed in closed form as \begin{equation*} {u^ {*} }\left ({x}\right) = - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x}\right)\nabla {V^ {*} }\left ({x}\right). \tag{7}\end{equation*}
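For completeness, (7) can be obtained from the stationarity condition of the Hamiltonian (5) with respect to u, which is sufficient for the minimum in (6) because (5) is quadratic in u and R is positive definite:\begin{equation*} \frac {\partial H\left ({{x,u,\nabla V^{\ast} \left ({x}\right)} }\right)}{\partial u} = 2Ru + {g^{\mathsf {T}}}\left ({x}\right)\nabla {V^{\ast}}\left ({x}\right) = 0 \;\Longrightarrow\; {u^ {*} }\left ({x}\right) = - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x}\right)\nabla {V^ {*} }\left ({x}\right).\end{equation*}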
By involving the self-triggered mechanism, the optimal neuro-control policy is updated only at the self-triggered time instants, which are predicted from the current state and a given function in software. Thus, it is easier to implement than event-triggered optimal control schemes.
With the help of this prediction, the optimal neuro-control policy is updated under the self-triggered mechanism as u_{k}=u(x(t_{k})) at the predicted time instant t_{k}. Therefore, the control objective of this article is to present a self-triggered approximate optimal neuro-control policy u(x(t_{k})) that renders the closed-loop system (1) stable.
SECTION III.
Self-Triggered Approximate Optimal Neuro-Controller Design and Stability Analysis
A. Asymptotically Converged Critic NN
Starting from the cost function (2), it is difficult to solve the HJBE (6) directly. Fortunately, based on the universal approximation ability of NNs, V(x) can be accurately approximated by a feedforward NN with only one hidden layer [40], [41] such that \begin{equation*} V\left ({x}\right) = W_{c}^{\mathsf {T}}\sigma \left ({x}\right) + {\varepsilon _{c}}\left ({x}\right) \tag{8}\end{equation*}
where {W_{c}} \in {\mathbb R}^{N_{c}}, \sigma (x) \in {\mathbb R}^{N_{c}}, and {\varepsilon _{c}}(x)\in {\mathbb R} represent the unknown optimal weight vector, the activation function, and the approximation error, respectively, and {N_{c}} indicates the number of hidden neurons. Thus, taking the partial derivative of (8) with respect to the system state x, we have \begin{equation*} \nabla V\left ({x}\right) = { {\nabla \sigma ^{\mathsf {T}} \left ({x}\right)} }{W_{c}} + \nabla {\varepsilon _{c}^{\mathsf {T}}}\left ({x}\right) \tag{9}\end{equation*}
where \nabla \sigma ^{\mathsf {T}} (x) = {\partial \sigma (x)}/{\partial x} \in {\mathbb R}^{ n\times N_{c}} and \nabla {\varepsilon _{c}^{\mathsf {T}}}(x) = {\partial {\varepsilon _{c}}(x)}/{\partial x} \in {\mathbb R}^{n} are the corresponding partial gradients with respect to the system state x.
In this case, the Hamiltonian for the nonlinear system (1) is defined as \begin{equation*} H\left ({{x,u,{W_{c}}} }\right) = U\left ({{x,u} }\right)+ W_{c}^{\mathsf {T}}\nabla \sigma \left ({x}\right)\dot x + \nabla {\varepsilon _{c}}\left ({x}\right)\dot x. \tag{10}\end{equation*}
In order to derive the optimal control policy (7) for a given controlled plant, the unknown optimal weight vector {W_{c}} is estimated, and (8) is approximated as \begin{equation*} \hat V\left ({x}\right) = \hat W_{c}^{\mathsf {T}}\sigma \left ({x}\right) \tag{11}\end{equation*}
where {\hat W_{c}} is the estimate of {W_{c}}. The partial gradient of (11) with respect to the system state x is described as \begin{equation*} \nabla \hat V\left ({x}\right) = { {\nabla \sigma ^{\mathsf {T}}\left ({x}\right)}}{\hat W_{c}}. \tag{12}\end{equation*}
Under the approximation (11) of the cost function V(x), the Hamiltonian (10) is approximated by \begin{align*} H\left ({{x,u,{\hat W_{c}}} }\right)& =U\left ({{x,u} }\right) + \hat W_{c}^{\mathsf {T}}\nabla \sigma \left ({x}\right)\dot x \\ &={e_{c}}. \tag{13}\end{align*}
By comparing (13) with (5), the Hamiltonian approximation error is obtained as \begin{equation*} {e_{c}} = \varepsilon - \tilde W_{c}^{\mathsf {T}}{\theta } \tag{14}\end{equation*}
where {\tilde W_{c}} = {W_{c}} - {\hat W_{c}} denotes the weight approximation error vector, {\theta } = \nabla \sigma (x)\dot x, and \varepsilon = - \nabla {\varepsilon _{c}}(x)\dot x is norm-bounded by a positive constant \varepsilon _{M} as \|\varepsilon \| \le \varepsilon _{M}.
To estimate the critic NN weight vector, the target function {E_{c}} = (1/2)e_{c}^{\mathsf {T}}{e_{c}} is minimized through the commonly used steepest descent algorithm. Thus, the weight vector is updated by \begin{equation*} {\dot {\hat W}_{c}} = - {l_{c}}\left [{ \frac {\partial E_{c}} {\partial \hat W_{c}}}\right] \tag{15}\end{equation*}
where {l_{c}} > 0 is a learning rate. On this basis, the nested updating policies are described as \begin{equation*} {\dot {\hat W}_{c}} = - {l_{c}}\left ({{e_{c} - {\hat \varepsilon }_{M}-\Gamma \mathop {\mathrm{ sgn}}\left ({{\tilde W_{c}^{\mathsf {T}}}{\theta }}\right) } }\right){\theta } \tag{16}\end{equation*}
where \Gamma >0 is a design parameter, and {\hat \varepsilon }_{M} is the estimate of {\varepsilon }_{M}, tuned by \begin{equation*} \dot {\hat \varepsilon }_{M} = - {l_{\varepsilon} }\hat W_{c}^{\mathsf {T}}{\theta }, \tag{17}\end{equation*}
and l_{\varepsilon} >0 is a learning rate. The weight approximation error vector {\tilde W_{c}} = {W_{c}} - {\hat W_{c}} is then adjusted by \begin{align*} {\dot {\tilde W}_{c}}& = - {\dot {\hat W}_{c}} \\ & = {l_{c}}\left ({\varepsilon -\varepsilon _{M}+ {\tilde \varepsilon }_{M}- \tilde W_{c}^{\mathsf {T}}{\theta }-\Gamma \mathop {\mathrm{ sgn}}\left ({{\tilde W_{c}^{\mathsf {T}}}{\theta }}\right) }\right){\theta } \tag{18}\end{align*}
where {\tilde \varepsilon }_{M} = \varepsilon _{M} - {\hat \varepsilon }_{M}, which is adjusted by \begin{equation*} {\dot {\tilde \varepsilon }}_{M} = -{l_{\varepsilon} }\tilde W_{c}^{\mathsf {T}}{\theta }. \tag{19}\end{equation*}
From (18) and (19), we know that the approximation error term \tilde \varepsilon _{M}, adjusted by (19), is embedded in the updating policy (18) of {\tilde W}_{c}. Thus, the updating policies (18) and (19) are referred to as “the nested updating policies.”
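To make the critic training concrete, the following minimal Python sketch (ours, with placeholder signatures and step size) implements one Euler step of the baseline steepest-descent update (15) driven by the Hamiltonian residual (13); the nested policies (16) and (17) additionally inject the estimate \hat \varepsilon _{M} and the sign term, which is what yields the asymptotic result of Theorem 1 below. The activation \sigma follows Example 1 of Section IV.

```python
import numpy as np

def critic_update_step(W_hat, x, x_dot, u, Q, R, grad_sigma, lc=0.4, dt=1e-3):
    """One Euler step of the baseline steepest-descent critic update (15).

    grad_sigma(x) returns d(sigma)/dx with shape (number of hidden neurons, n),
    so that theta = grad_sigma(x) @ x_dot approximates the time derivative of sigma."""
    theta = grad_sigma(x) @ x_dot            # theta = grad(sigma) * x_dot
    U = x @ Q @ x + u @ R @ u                # utility U(x, u)
    e_c = U + W_hat @ theta                  # Hamiltonian residual (13)
    return W_hat - lc * dt * e_c * theta     # gradient step on E_c = e_c^2 / 2

# Placeholder activation sigma(x) = [x1^2, x1*x2, x2^2] as used in Section IV.
grad_sigma = lambda x: np.array([[2 * x[0], 0.0],
                                 [x[1], x[0]],
                                 [0.0, 2 * x[1]]])
```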
Next, we show that the nested updating policies (18) and (19) ensure the asymptotic stability of the critic NN weight error dynamics, rather than UUB stability as in most existing ADP-based optimal control methods.
Theorem 1:
For the nonlinear system (1), the developed nested updating policies (18) and (19) ensure the critic NN weight error dynamics to be asymptotically stable.
Proof:
Considering both approximation errors in the nested updating policies, select a Lyapunov function candidate as \begin{equation*} {L_{1}} = \mathrm {tr}\left ({{\frac {1}{{2{l_{c}}}}\tilde W_{c}^{\mathsf {T}}{\tilde W_{c}}} }\right) + \frac {1}{{2{l_{\varepsilon} }}}{\tilde \varepsilon _{M} ^{2}}. \tag{20}\end{equation*}
For (20), taking its time derivative and plugging the nested updating policy (18) into it, we derive \begin{align*} {\dot L_{1}} &=\mathrm {tr}\left ({{\frac {1}{{{l_{c}}}}\tilde W_{c}^{\mathsf {T}}{{\dot {\tilde W}}_{c}}} }\right) + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &=\mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}\left ({\varepsilon -\varepsilon _{M}}\right) {\theta }} }\right)+\mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\tilde \varepsilon _{M}} {\theta }} }\right) \\ &\quad -\Gamma \big \| {\tilde W_{c}^{\mathsf {T}}}{\theta } \big \|- {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}}{\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}}. \tag{21}\end{align*}
For matrices X and Y such that YX \in {\mathbb R}, we notice that \mathrm {tr}(XY) = \mathrm {tr}(YX)=YX
. Introducing the updating policy (19) into (21), we derive \begin{align*} {\dot L_{1}} &=\mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }\left ({\varepsilon -\varepsilon _{M}}\right)} }\right)+ \mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }{\tilde \varepsilon _{M}}} }\right)-\Gamma \big \| {\tilde W_{c}^{\mathsf {T}}}{\theta } \big \| \\ &\quad - {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &\le \big \|{\tilde W_{c}^{\mathsf {T}}}{\theta }\big \|\big \|\varepsilon -\varepsilon _{M}\big \| + \mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }{\tilde \varepsilon _{M}}} }\right)-\Gamma \big \|{\tilde W_{c}^{\mathsf {T}}}{\theta } \big \| \\ &\quad - {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &\le 2\varepsilon _{M}\big \|{\tilde W_{c}^{\mathsf {T}}}{\theta }\big \| + \mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }{\tilde \varepsilon _{M}}} }\right) -\Gamma \big \| {\tilde W_{c}^{\mathsf {T}}}{\theta } \big \| \\ &\quad - {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &= -\left ({\Gamma -2\varepsilon _{M}}\right)\big \|{\tilde W_{c}^{\mathsf {T}}}{\theta }\big \|- {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}}. \tag{22}\end{align*}
From the above analysis, we know that if the design parameter satisfies \Gamma \ge 2\varepsilon _{M}, then {\dot L}_{1} \le 0. Hence, the nested updating policies (18) and (19) guarantee the asymptotic stability of the critic NN weight error vector {\tilde W_{c}}. This ends the proof.
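Theorem 1 can also be checked numerically by integrating the error dynamics (18) and (19) directly; in the following sketch (ours), the regressor \theta (t), the bounded error \varepsilon (t), the bound \varepsilon _{M}, and all gains are synthetic placeholders chosen so that \|\varepsilon \| \le \varepsilon _{M} and \Gamma \ge 2\varepsilon _{M} hold.

```python
import numpy as np

# Synthetic sanity check of the nested error dynamics (18)-(19).
rng = np.random.default_rng(0)
lc, le, Gamma, eps_M, dt = 0.4, 0.4, 0.5, 0.2, 1e-3
W_tilde = rng.standard_normal(3)          # initial critic weight error
eps_tilde_M = 0.1                         # initial bound-estimation error

for k in range(20000):
    t = k * dt
    theta = np.array([np.sin(t), np.cos(2 * t), np.sin(3 * t)])  # placeholder regressor
    eps = eps_M * np.sin(5 * t)                                  # |eps| <= eps_M
    s = W_tilde @ theta
    # Euler steps of (18) and (19).
    W_tilde = W_tilde + dt * lc * (eps - eps_M + eps_tilde_M - s - Gamma * np.sign(s)) * theta
    eps_tilde_M = eps_tilde_M - dt * le * s

# Per Theorem 1, L1 should be (approximately) nonincreasing along the trajectory.
L1 = W_tilde @ W_tilde / (2 * lc) + eps_tilde_M ** 2 / (2 * le)
print(L1)
```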
B. Self-Triggered Approximate Optimal Neuro-Control
Based on the partial derivative of the approximate critic NN (12), the desired optimal neuro-control policy (7) is approximated by \begin{equation*} {\hat u}\left ({x}\right) = - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x}\right){\left ({{\nabla \sigma \left ({x}\right)} }\right)^{\mathsf {T}}}{\hat W_{c}}. \tag{23}\end{equation*}
According to the self-triggered mechanism, the optimal neuro-control policy is updated at the predicted triggering instants as \begin{equation*} {\hat u}\left ({x_{k}}\right)= {\hat u}_{k}= - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x_{k}}\right){\left ({{\nabla \sigma \left ({x_{k}}\right)} }\right)^{\mathsf {T}}}{\hat W_{c}}. \tag{24}\end{equation*}
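A minimal sketch of how (24) is evaluated at a triggering instant is given below; g, grad_sigma, R, and W_hat are placeholders consistent with the notation above, and the returned input is held by the actuator until the next predicted instant.

```python
import numpy as np

def self_triggered_policy(x_k, W_hat, g, grad_sigma, R):
    """Evaluate the approximate optimal control (24) at the sampled state x_k;
    grad_sigma(x) has shape (number of hidden neurons, n), g(x) has shape (n, m)."""
    return -0.5 * np.linalg.solve(R, g(x_k).T @ grad_sigma(x_k).T @ W_hat)
```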
C. Stability Analysis
Before showing the stability of the closed-loop system (1) with the developed self-triggered approximate optimal neuro-control policy (24), the following assumptions are provided.
Assumption 3:
On the compact set \Omega \subset {\mathbb R}^{n}, the control input function u(x) is Lipschitz continuous, i.e., there is a scalar D_{u}>0 such that \|u(x(t))-u(x(t_{k}))\|=\|u^{\ast }-u_{k}\|\le D_{u}\|x_{e}\|, where x_{k}=x(t_{k}) and x_{e}=x-x_{k}.
Assumption 4:
\tilde W_{c}, \nabla \sigma (x), and \nabla \varepsilon _{c} are norm-bounded as \|\tilde W_{c}\| \le W_{cM}, \|\nabla \sigma (x)\| \le \sigma _{cM}, and \|\nabla \varepsilon _{c}\| \le \varepsilon _{cM}, where W_{cM}, \sigma _{cM}, and \varepsilon _{cM} are unknown positive scalars.
Theorem 2:
For the nonlinear system (1), consider the cost function (2), Assumptions 1–4, and the developed self-triggered approximate optimal neuro-control policy (24). If there exist positive constants \delta \in (0,1] and \epsilon \in (0,1) such that \begin{equation*} \delta \left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}\ge x_{e}^{\mathsf {T}}\left ({\left ({1-\frac {1}{\epsilon ^{2}}}\right)N+M}\right)x_{e} \tag{25}\end{equation*}
where N and M are positive definite diagonal matrices with proper dimensions defined later, then UUB stability is assured for the closed-loop nonlinear system (1).
Proof:
Select a Lyapunov function candidate as \begin{equation*} L_{2}=\frac {1}{2}x^{\mathsf {T}}x+{V^{*}}\left ({x}\right). \tag{26}\end{equation*}
Substituting the developed self-triggered approximate optimal neuro-control policy (24), the time derivative of (26) becomes \begin{align*} \dot L_{2}&=x^{\mathsf {T}}\dot x+\nabla {V^{\ast } }\left ({x}\right)\dot x \\ &\le x^{\mathsf {T}}\left ({f\left ({x}\right)+g\left ({x}\right)\hat u_{k} }\right) \\ &\quad +\nabla {V^ {\ast }}\left ({x}\right)\left ({f\left ({x}\right)+g\left ({x}\right)\left ({u^{\ast }\left ({x}\right)+\hat u_{k}-u^{\ast }\left ({x}\right)}\right)}\right) \\ &\le x^{\mathsf {T}}\left ({f\left ({x}\right)+g\left ({x}\right)\hat u_{k} }\right)+\nabla {V^ {\ast }}\left ({x}\right)\left ({f\left ({x}\right)+g\left ({x}\right)u^{\ast }\left ({x}\right) }\right) \\ &\quad +2 u^{\ast \mathsf {T}}R\left ({u^{\ast }\left ({x}\right)-\hat u_{k}}\right). \tag{27}\end{align*}
Based on Assumptions 1 and 2, considering (3) and utilizing Young’s inequality, we obtain \begin{align*} \dot L_{2}&\le x^{\mathsf {T}}D_{f}x+\frac {1}{2}x^{\mathsf {T}}D_{g}x+\frac {1}{2}\hat u_{k}^{\mathsf {T}}D_{g}\hat u_{k} \\ &\quad +2u^{\ast \mathsf {T}}R\left ({u^{\ast }-\hat u_{k}}\right)-x^{\mathsf {T}}Qx-u^{\ast \mathsf {T}}Ru^{*} \\ &\le x^{\mathsf {T}}\left ({D_{f}+\frac {1}{2}D_{g}}\right)x+2u^{\ast \mathsf {T}}R\left ({u^{\ast }-\hat u_{k}}\right) \\ &\quad +\frac {1}{2}{\left ({u^{\ast }-u^{\ast }+\hat u_{k}}\right)}^{\mathsf {T}}D_{g}{\left ({u^{\ast }-u^{\ast }+\hat u_{k}}\right)} \\ &\quad -x^{\mathsf {T}}Qx-u^{\ast \mathsf {T}}Ru^{*} \\ &= x^{\mathsf {T}}\left ({D_{f}+\frac {1}{2}D_{g}}\right)x+\frac {1}{2}{\left ({u^{\ast }-\hat u_{k}}\right)}^{\mathsf {T}}D_{g}{\left ({u^{\ast }-\hat u_{k}}\right)} \\ &\quad -u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}D_{g}}\right) u^{\ast }-x^{\mathsf {T}}Qx \\ &\quad +u^{\ast \mathsf {T}}\left ({2R-D_{g}I}\right){\left ({u^{\ast }-\hat u_{k}}\right)}. \tag{28}\end{align*}
According to Assumptions 3 and 4, we notice that \begin{align*} \|u^{\ast }-\hat u_{k}\| &= \|u^{\ast }-\hat u+ \hat u- \hat u_{k}\| \\ &\le \|u^{\ast }-\hat u\| + \|\hat u- \hat u_{k}\| \\ &\le \frac {1}{2}\|R^{-1}\|_{F}\|g\left ({x}\right)\|_{F}\| \nabla \sigma ^{\mathsf {T}}\left ({x}\right)\tilde W_{c} +\nabla \varepsilon _{c}^{\mathsf {T}} \left ({x}\right)\| \\ &\quad +D_{u} \|x_{e}\| \\ &\le \frac {1}{2}\|R^{-1}\|_{F}D_{g} \mu +D_{u} \|x_{e}\| \tag{29}\end{align*}
where \mu =W_{cM} \sigma _{cM} + \varepsilon _{cM}
. Thus, we have \begin{align*} \dot L_{2}\le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I}\right)x+\frac {1}{4}R^{-2}D_{g}^{3} \mu ^{2} \\ &\quad +x_{e}^{\mathsf {T}}D_{g} D_{u}^{2} x_{e}-u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}D_{g}}\right) u^{\ast } \\ &\quad +\|u^{\ast }\|\|2R-D_{g}I\|_{F}\left ({\frac {1}{2}R^{-1}D_{g} \mu + D_{u}\|x_{e}\|}\right) \\ \le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I}\right)x+\frac {1}{4}R^{-2}D_{g}^{3} \mu ^{2} \\ &\quad +x_{e}^{\mathsf {T}}D_{g} D_{u}^{2} x_{e}-u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}D_{g}}\right) u^{\ast } \\ &\quad +\frac {1}{2}u^{\ast \mathsf {T}}D_{u}^{2} u^{\ast }+\frac {1}{2}\|2R-D_{g}I\|^{2}_{F} x_{e}^{\mathsf {T}} x_{e} \\ \le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I}\right)x \\ &\quad -u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}\left ({D_{g}-D_{u}^{2}-D_{g}^{2}}\right)I}\right) u^{\ast } \\ &\quad +x_{e}^{\mathsf {T}}\left ({D_{g} D_{u}^{2}+\frac {1}{2}\|2R-D_{g}I\|^{2}_{F}}\right) x_{e} +\Delta \\ \le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I-N}\right)x-x^{\mathsf {T}}Nx \\ &\quad -u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}\left ({D_{g}-D_{u}^{2}-D_{g}^{2}}\right)I}\right) u^{\ast } \\ &\quad +x_{e}^{\mathsf {T}}M x_{e} +\Delta \tag{30}\end{align*}
where \Delta = (1/4)R^{-2}D_{g}^{3} \mu ^{2}+ (1/8)R^{-2}\mu ^{2}\|2R-D_{g}I\|^{2}_{F}, N \in {\mathbb R}^{n \times n} is a positive definite diagonal matrix, and M=(D_{g} D_{u}^{2}+ (1/2)\|2R-D_{g}I\|^{2}_{F})I.
Noticing that x=x_{k}+x_{e}, we have \begin{align*} x^{\mathsf {T}}Nx &= \left ({x_{k}+x_{e}}\right)^{\mathsf {T}}N\left ({x_{k}+x_{e}}\right) \\ &= x_{k}^{\mathsf {T}}Nx_{k}+x_{e}^{\mathsf {T}}Nx_{e}+2x_{k}^{\mathsf {T}}Nx_{e} \\ &= \left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}+\left ({1-\frac {1}{\epsilon ^{2}}}\right)x_{e}^{\mathsf {T}}Nx_{e} \\ &\quad +\left ({\epsilon x_{k}+\frac {1}{\epsilon }x_{e}}\right)^{\mathsf {T}}N\left ({\epsilon x_{k}+\frac {1}{\epsilon }x_{e}}\right) \\ &\ge \left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}+\left ({1-\frac {1}{\epsilon ^{2}}}\right)x_{e}^{\mathsf {T}}Nx_{e}. \tag{31}\end{align*}
Thus, (30) becomes \begin{align*} \dot L_{2}&\le -x^{\mathsf {T}}Q_{0}x-x^{\mathsf {T}}Nx-u^{\ast \mathsf {T}}R_{0} u^{\ast }+\Delta \\ &\quad -\left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}+x_{e}^{\mathsf {T}}\left ({\left ({1-\frac {1}{\epsilon ^{2}}}\right)N+M}\right)x_{e} \tag{32}\end{align*}
where Q_{0}=Q-(D_{f}+ (1/2)D_{g})I-N and R_{0}=R- (1/2)(D_{g}-D_{u}^{2}-D_{g}^{2})I
. Considering the condition (25), we have \begin{align*} \dot L_{2}&\le -x^{\mathsf {T}}Q_{0}x-x^{\mathsf {T}}Nx-u^{\ast \mathsf {T}}R_{0} u^{\ast }+\Delta \\ &\le -\lambda _{\min } \left ({Q_{0}}\right)\| {x}\|^{2}-\lambda _{\min } \left ({R_{0}}\right)\| {u^{\ast }}\|^{2}+\Delta \\ &\le -\lambda _{\min } \left ({Q_{0}}\right)\| {x}\|^{2}+\Delta. \tag{33}\end{align*}
We can conclude that \dot L_{2}\le 0 as long as Q_{0} and R_{0} are selected as positive definite matrices, i.e., Q>(D_{f}+ (1/2)D_{g})I+N and R> (1/2)(D_{g}-D_{u}^{2}-D_{g}^{2})I, and x lies outside the following compact set:\begin{equation*} \Omega _{x}=\left \{{x \colon \|x\| \le \sqrt {\frac {\Delta }{\lambda _{\min }\left ({Q_{0}}\right)}} }\right \}.\end{equation*}
Thus, the closed-loop nonlinear system (1) is assured to be UUB under the present self-triggered approximate optimal neuro-control scheme. This ends the proof.
For Assumptions 1 and 2, we notice that practical systems are always modeled as continuous nonlinear differential equations. Thus, it is common that such a system has a unique equilibrium point. Furthermore, f(x) and g(x) denote the drift function and the control input matrix in practical application systems, respectively. For example, f(x) in robot systems consists of the inertia, Coriolis, and centripetal force terms, and g(x) relates to the inertia, so it is feasible to assume them to be norm-bounded, and g(x) cannot be zero. For Assumption 3, the ADP-based control strategy is developed based on policy iteration, which starts from a proper initial admissible control. This means that the initial admissible control can be regarded as a priori knowledge. Meanwhile, since the control scheme is designed over an infinite horizon, the Lipschitz continuity of the control input u(x) is satisfied. For Assumption 4, the value of the cost function is certainly finite in practical systems, so its approximation cannot be infinite. Since \tilde W_{c}, \nabla \sigma (x), and \nabla \varepsilon _{c} denote the weight approximation error and the partial gradients of the activation function and the NN approximation error, respectively, they can be ensured to be finite. Thus, Assumptions 3 and 4 can be fulfilled in practice. Moreover, these assumptions are commonly used in developing ADP-based control schemes [3], [7], [41].
D. Self-Triggered Mechanism
In this section, the inter-execution time \Delta _{k}=t_{k+1}-t_{k}, k=1,2,\ldots,\infty, is determined as \Delta _{k} =\Phi (x_{k}), and it is shown that the inter-execution time is lower bounded by a positive scalar, that is to say, the self-triggered mechanism is admissible. Before the proof, we provide the definition of Zeno behavior as follows.
Definition 2 (Zeno Behavior):
The nonlinear system is Zeno if \begin{equation*} \lim _{k\to \infty } t_{k}=\sum _{k=0}^{\infty }\Delta _{k}=t_{\infty } < \infty \tag{34}\end{equation*}
where \Delta _{k} is defined as the k th intersampling time and t_{\infty } is called the Zeno time [42].
Theorem 3:
The self-triggered time sequence \{t_{k}\} is determined by \begin{equation*} t_{k+1}=t_{k}+\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right) \tag{35}\end{equation*}
where \lambda = 2+ 2D_{f} and \rho (x_{k})=(2D_{f}^{2}+ D_{g}^{2}D_{u}^{2})\lambda _{\max }(M)x_{k}^{\mathsf {T}}x_{k}. Then, the self-triggering condition (35) is admissible for the nonlinear system (1). Furthermore, the inter-execution time is provided by \begin{equation*} \Delta _{k}=\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right)>0. \tag{36}\end{equation*}
Proof:
Choose a Lyapunov function candidate as L_{3}(t)=x_{e}^{\mathsf {T}}Mx_{e}, t \in [t_{k},t_{k+1})
. We have its time derivative as \begin{align*} {\dot L_{3}} &=2x_{e}^{\mathsf {T}}M\dot x_{e} \\ &=2x_{e}^{\mathsf {T}}M\left ({f\left ({x}\right)+g\left ({x}\right)\hat u_{k}}\right) \\ &\le 2x_{e}^{\mathsf {T}}Mx_{e}+ f^{\mathsf {T}}\left ({x}\right)Mf\left ({x}\right)+ \left ({g\left ({x}\right)\hat u_{k} }\right)^{\mathsf {T}}Mg\left ({x}\right)\hat u_{k} \\ &\le 2x_{e}^{\mathsf {T}}Mx_{e} + x^{\mathsf {T}}D_{f}^{2}M x + D_{g}^{2}D_{u}^{2}x_{k}^{\mathsf {T}}M x_{k} \\ &\le 2x_{e}^{\mathsf {T}}Mx_{e} + 2\left ({x_{k}^{\mathsf {T}}D_{f}^{2}M x_{k}+x_{e}^{\mathsf {T}}D_{f} M x_{e}}\right) \\ &\quad + D_{g}^{2}D_{u}^{2}x_{k}^{\mathsf {T}}M x_{k} \\ &=x_{e}^{\mathsf {T}}\left ({2+D_{f} }\right)M x_{e} +x_{k}^{\mathsf {T}}\left ({2D_{f}^{2}+ D_{g}^{2}D_{u}^{2}}\right)M x_{k} \\ &\le \lambda x_{e}^{\mathsf {T}}Mx_{e} + \rho \left ({x_{k}}\right). \tag{37}\end{align*}
Thus, we have \begin{equation*} {\dot L_{3}} \le \lambda L_{3} +\rho \left ({x_{k}}\right). \tag{38}\end{equation*}
By using the comparison principle, we obtain \begin{equation*} {L_{3}} \le \frac {\rho \left ({x_{k}}\right)}{\lambda } \left ({e^{\lambda \left ({t-t_{k}}\right)}-1 }\right), t \in \left [{t_{k}, t_{k+1}}\right). \tag{39}\end{equation*}
Based on the self-triggered mechanism shown in (35), one has \begin{equation*} \frac {\rho \left ({x_{k}}\right)}{\lambda } \left ({e^{\lambda \left ({t-t_{k}}\right)}-1 }\right) \le \delta x^{\mathsf {T}}_{k}Nx_{k}, t \in \left [{t_{k}, t_{k+1}}\right). \tag{40}\end{equation*}
According to (39) and (40), we can see that the nonlinear system (1) with the self-triggering condition (35) is ensured to be exponentially stable. Furthermore, it is obvious that \begin{equation*} \Delta _{k}=\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right)>0.\end{equation*}
This indicates that the inter-execution time between the current and the next predicted triggering instants is larger than zero, which means that the Zeno behavior is avoided. This ends the proof.
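Since the prediction (35) only uses quantities available at t_{k}, it can be evaluated directly in software. The following sketch (ours) assumes the design constants D_{f}, D_{g}, D_{u}, \delta and the matrices N and M from Theorems 2 and 3 are given.

```python
import numpy as np

def next_trigger_interval(x_k, N, M, D_f, D_g, D_u, delta):
    """Inter-execution time (36): Delta_k = (1/lambda) * ln(lambda*delta*x_k'N x_k / rho(x_k) + 1)."""
    lam = 2.0 + 2.0 * D_f
    rho = (2.0 * D_f ** 2 + D_g ** 2 * D_u ** 2) * np.max(np.linalg.eigvalsh(M)) * (x_k @ x_k)
    return (1.0 / lam) * np.log(lam * delta * (x_k @ N @ x_k) / rho + 1.0)

# At each triggering instant: t_next = t_k + next_trigger_interval(x_k, ...).
```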
In summary, the ADP-based self-triggered approximate optimal neuro-control scheme is described in Algorithm 1.
Algorithm 1 ADP-Based Self-Triggered Approximate Optimal Neuro-Control
1: Initialization: Initialize the parameters Q, R, \lambda, \delta, l_{c}, l_{\varepsilon}, N, \Gamma, the terminal time of system operation T, and the computation accuracy \xi >0 of the cost function. Let p=0, k=0, t_{0}=0, V^{(0)}(x_{0})=0, and begin with an admissible control policy u_{0} ^{(0)} (x_{0}).
2: Policy evaluation: Let k\geq 0 and p=p+1; based on the control policy u ^{(p)}_{k} (x_{k}), solve the following nonlinear equation for V^{(p)} (x_{k}):\begin{equation*} U\left ({x_{k},u ^{\left ({p}\right)}_{k} }\right) + \nabla {V^{\left ({p}\right) \mathsf {T}}}\left ({x_{k}}\right)\left ({{f\left ({x_{k}}\right) + g\left ({x_{k}}\right)u ^{\left ({p}\right)}_{k}} }\right) = 0. \tag{41}\end{equation*}
3: Policy improvement: Update the control policy u^{(p)} _{k} (x_{k}) by \begin{equation*} u ^{\left ({p+1}\right)}_{k} \left ({x_{k} }\right)=-\frac {1}{2}R^{-1} g^{\mathsf {T}} \left ({x_{k}}\right)\nabla V^{\left ({p}\right)} \left ({x_{k} }\right). \tag{42}\end{equation*}
4: If \| {V^{(p+1)} (x_{k})-V^{(p)} (x_{k})} \|\le \xi, go to Step 5 and obtain the self-triggered approximate optimal neuro-control u_{k} at the time instant t_{k}; else, return to Step 2.
5: Self-triggered mechanism: Let k=k+1 and predict the next triggering time instant by \begin{equation*} t_{k+1}=t_{k}+\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right). \tag{43}\end{equation*}
6: If t_{k} \le T, go to Step 2; else, stop.
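The overall flow of Algorithm 1 can be summarized by the following Python skeleton (ours); policy_evaluation, policy_improvement, next_trigger_interval, and plant_step are hypothetical helper routines standing for (41), (42), (35), and the plant simulation under the held input, respectively.

```python
def self_triggered_adp(x0, u0, T, xi,
                       policy_evaluation, policy_improvement,
                       next_trigger_interval, plant_step):
    """Skeleton of Algorithm 1.  All callables are hypothetical placeholders:
    policy_evaluation(x, u) solves (41) for the value at x, policy_improvement(x, V)
    applies (42), next_trigger_interval(x) implements (35), and plant_step runs the
    plant with the held input over the predicted interval and returns the new state."""
    t_k, x_k, u_k = 0.0, x0, u0                      # Step 1: start from an admissible policy
    while t_k <= T:                                  # Step 6: repeat until the terminal time
        V_old, V_new = None, 0.0
        while V_old is None or abs(V_new - V_old) > xi:
            V_old = V_new
            V_new = policy_evaluation(x_k, u_k)      # Step 2: policy evaluation, solve (41)
            u_k = policy_improvement(x_k, V_new)     # Step 3: policy improvement via (42)
        delta_k = next_trigger_interval(x_k)         # Step 5: predict the next instant by (35)
        x_k = plant_step(x_k, u_k, t_k, t_k + delta_k)
        t_k += delta_k
    return u_k
```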
E. Computational Requirements
Inspired by [43] and [44], the minimal computational requirements will be analyzed for the developed self-triggered approximate optimal neuro-control (24).
For the Veronese map \vartheta from x=[x_{1},x_{2},\ldots,x_{n}]^{\mathsf {T}}\in {\mathbb R}^{n} to the quadratic form xx^{\mathsf {T}} \in {\mathbb R}^{n \times n}, we have \begin{equation*} \vartheta \colon =\left [{x_{1}^{2},x_{1}x_{2},\ldots,x_{1}x_{n},x_{2}^{2},x_{2}x_{3},\ldots,x_{n}^{2}}\right]^{\mathsf {T}}.\end{equation*}
Therefore, its computation has space complexity N(t_{s}) (n(n+1)/2), where t_{s} is the discretization step, N(t_{s})= \lfloor (\Delta _{k\max }/t_{s}) \rfloor - \lfloor (\Delta _{k\min }/t_{s}) \rfloor, \Delta _{k\max } is the maximum time the system is allowed to run in open loop, and \Delta _{k\min } is the minimum time guaranteeing the time cost of one computation step. For the time complexity, define t_{c}>0 as the time it takes to execute an instruction on a given digital platform; the implementation then requires a time of N(t_{s}) (n(n+1)/2)t_{c} for multiplications and the same amount for additions in the preprocessing step, as well as N(t_{s})t_{c} for testing \dot L_{2} \le -\lambda _{\min } (Q_{0})\| x\|^{2} +\Delta in the running step.
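For instance, the quadratic-monomial vector \vartheta (x) can be generated as follows; this short sketch (ours) follows the ordering in the definition above, and the length of the result is n(n+1)/2.

```python
import numpy as np

def veronese(x):
    """Quadratic monomials [x1^2, x1*x2, ..., x1*xn, x2^2, ..., xn^2] of length n*(n+1)/2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

assert veronese([1.0, 2.0, 3.0]).size == 3 * 4 // 2   # n = 3 -> 6 monomials
```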
Now, we turn to our concern. Notice that the Lyapunov function candidate (26) consists of a quadratic function of x and the optimal cost function V^{\ast } (i.e., an integral of quadratic functions of x and u from the initial time t to infinity). Thus, the implementation of the developed self-triggered control policy (24) has a space complexity of \begin{equation*} M_{s}=N\left ({t_{s}}\right)\left ({\frac {n\left ({n+1}\right)}{2}+\frac {m\left ({m+1}\right)}{2}+1}\right)\end{equation*}
since it requires the space of N(t_{s}) (n(n+1)/2) for computing x^{\mathsf {T}}x, N(t_{s}) (m(m+1)/2) for computing u^{\mathsf {T}}Ru, and N(t_{s}) for storing the integral at t_{k}. Meanwhile, it has a time complexity of \begin{equation*} M_{t}=\left ({2n\left ({n+1}\right)+m\left ({m+1}\right)+N\left ({t_{s}}\right)}\right)t_{c}\end{equation*}
since it requires preprocessing time of N(t_{s})n(n+1)t_{c} for x^{\mathsf {T}}x, N(t_{s})n(n+1)t_{c} for x^{\mathsf {T}}Qx, and N(t_{s})m(m+1)t_{c} for u^{\mathsf {T}}Ru in the integral, as well as N(t_{s})t_{c} for testing the inequality \dot L_{2} \le -\lambda _{\min } (Q_{0})\| x\|^{2} +\Delta in the running step. It is worth pointing out that the minimum inter-execution time \Delta _{k\min } is determined by M_{t}, and the maximum inter-execution time \Delta _{k\max } is determined by guaranteeing \dot L_{2} = -\lambda _{\min } (Q_{0})\| x\|^{2} + \Delta.
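As a rough numerical illustration, the formulas for N(t_{s}), M_{s}, and M_{t} can be evaluated directly; the values of n, m, t_{s}, t_{c}, \Delta _{k\max }, and \Delta _{k\min } below are hypothetical and serve only to exercise the formulas.

```python
import math

n, m = 2, 1                    # state and input dimensions (Example 1 sizes)
t_s, t_c = 1e-3, 1e-7          # hypothetical discretization step and instruction time [s]
dk_max, dk_min = 0.5, 0.05     # hypothetical maximum and minimum inter-execution times [s]

N_ts = math.floor(dk_max / t_s) - math.floor(dk_min / t_s)
M_s = N_ts * (n * (n + 1) // 2 + m * (m + 1) // 2 + 1)       # space complexity
M_t = (2 * n * (n + 1) + m * (m + 1) + N_ts) * t_c           # time complexity [s]
print(N_ts, M_s, M_t)
```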
By introducing the self-triggering mechanism, this article presents a novel ADP-based optimal neuro-control scheme for nonlinear systems. The superiority of this approach is emphasized as follows: 1) it reduces the computational and communication resources, as well as the energy cost, since the control policy (24) is updated aperiodically and 2) the updating time instants of the control policy are predicted by the designed self-triggered mechanism (35). This implies that the hardware monitoring the full system state in the event-triggered control structure is no longer required.
SECTION IV.
Simulation Studies
In this section, we provide two simulation examples, including practical and numerical systems, to verify the effectiveness of the present self-triggered approximate optimal neuro-control scheme (24) developed via ADP.
A. Example 1
A torsional pendulum bar system is employed, with the dynamics expressed as \begin{align*} \frac {\mathrm {d}\theta }{\mathrm {d}t}&=\omega \\ \mathcal {J}\frac {\mathrm {d}\omega }{\mathrm {d}t}&=u-Mgl\sin \theta -f_{d}\frac {\mathrm {d}\theta }{\mathrm {d}t} \tag{44}\end{align*}
where M=1/3\,\,\mathrm {kg} and l=2/3\,\,\mathrm {m} indicate the mass and length of the pendulum bar, respectively. Let the rotary inertia be \mathcal {J}=\frac {4}{3}Ml^{2}, the frictional factor be f_{d}=0.2, and the gravitational acceleration be g=9.8\,\,\mathrm {m/s}^{2}. Replacing the system states \theta and \omega by x_{1} and x_{2}, the torsional pendulum bar system is expressed in the state-space form as \begin{align*} \dot {x} = \left [{{\begin{array}{c} x_{2} \\ -\frac {Mgl}{\mathcal {J}}\sin x_{1}-\frac {f_{d}}{\mathcal {J}}x_{2} \end{array}}}\right]+\left [{{\begin{array}{c} 0 \\ \frac {1}{\mathcal {J}} \end{array}}}\right]u.\end{align*}
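For reference, a minimal Python sketch (ours) of the drift and input terms of this state-space model is given below, interpreting the rotary inertia as \mathcal {J}=(4/3)Ml^{2} and using the parameter values listed above; the gravitational constant is renamed in the code to avoid a clash with the input matrix g(x).

```python
import numpy as np

M_bar, l, f_d, g_acc = 1/3, 2/3, 0.2, 9.8    # pendulum mass, length, friction, gravity
J = 4/3 * M_bar * l**2                       # rotary inertia, assumed J = (4/3)*M*l^2

def f(x):
    """Drift dynamics of the torsional pendulum in state-space form."""
    return np.array([x[1], -(M_bar * g_acc * l / J) * np.sin(x[0]) - (f_d / J) * x[1]])

def g(x):
    """Control input matrix."""
    return np.array([[0.0], [1.0 / J]])
```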
In the cost function, Q and R are selected as identity matrices with appropriate dimensions. The initial system state vector is x_{0}=[{1,-1}]^{\mathsf {T}}, the activation function of the critic NN is chosen as \sigma (x)=[x_{1}^{2}, x_{1}x_{2}, x_{2}^{2}]^{\mathsf {T}}, and the weight vector is defined as \hat {W}_{c}=[\hat {W}_{c1}, \hat {W}_{c2}, \hat {W}_{c3}]^{\mathsf {T}} with initial value \hat {W}_{c}^{0}=[{0.502, -0.489, 0.012}]^{\mathsf {T}}. It is worth pointing out that their selection depends on experience according to [40] and [41]. The learning rates in the nested updating policies for the critic NN are chosen as l_{c}=l_{\varepsilon} =0.4, and the design parameter is \Gamma =0.5. In the self-triggered mechanism, based on trial and error, we choose the parameters \lambda =0.1 and \delta =0.1,0.3,0.5,0.7,0.9,1 to test the sensitivity of the sampling frequency and to show comparison results.
Taking \delta =0.7 as the representative case, simulation results are provided in Figs. 1–5 and Table I. Fig. 1 illustrates the evolution of the critic NN weights; we can see that they gradually converge to [{0.986, -0.371, 0.407}]^{\mathsf {T}}. As illustrated in Fig. 2, by applying the developed self-triggered approximate optimal neuro-control policy computed by (24), the system states converge to zero after 29.2 s. Fig. 3 shows the self- and time-triggered control inputs. We can observe that the self-triggered approximate optimal neuro-control input is a piecewise continuous signal: it remains unchanged during the time interval [t_{k}, t_{k+1}) and is updated only at t_{k}. Fig. 4 illustrates the comparison of the numbers of acquired samples: the time-triggered and the self-triggered control methods require 1000 and only 131 samples, respectively, which shows that the sampling frequency is greatly reduced. From Fig. 5, we find that the minimum inter-sampling time is \Delta _{k}= 0.2 s, which implies that the Zeno behavior does not occur. As a representative comparison, these figures also illustrate the control performance under the parameter \delta =0.1. Together with Table I, where \Delta _{k\min } and \Delta _{k\max } denote the actual minimum and maximum inter-execution times, respectively, we can conclude that as \delta increases, the number of required samples decreases, and the system states take less settling time to approach the neighborhood of the equilibrium. This implies that we can choose a proper value according to the requirements on the transient response and the resource cost.
B. Example 2
The overhead crane system, which transports loads from one place to another, plays an important role in industry. In contrast to Example 1, its dynamic model is more complicated and of higher order. The dynamic model of this system is formulated as in [45], and the same parameters of the overhead crane plant are selected.
In the simulation, Q and R are also selected as identity matrices with appropriate dimensions. The initial system state vector is x_{0}=[{0.5,-0.5,0.8,-0.9}]^{\mathsf {T}}. The activation function of the critic NN is chosen as \sigma (x)=[x_{1}^{2}, x_{1}x_{2}, x_{1}x_{3}, x_{1}x_{4}, x_{2}^{2}, x_{2}x_{3}, x_{2}x_{4}, x_{3}x_{4}, x_{4}^{2}]^{\mathsf {T}}, and the weight vector is defined as \hat {W}_{c}=[\hat {W}_{c1}, \hat {W}_{c2}, \ldots, \hat {W}_{c9}]^{\mathsf {T}} with initial value \hat {W}_{c}^{0}=[0.656,0.759,-0.892,-0.497,0.707,0.905, -0.736,-0.896,0.092]^{\mathsf {T}}. The parameters in the nested updating policies are chosen as l_{c}=1.2, l_{\varepsilon} =0.01, and \Gamma =0.001. The parameters in the self-triggered mechanism are chosen as \lambda =0.4 and \delta =0.7.
Simulation results of Example 2 are presented in Figs. 6–10. Fig. 6 describes the convergence process of the critic NN weights; we can observe that \hat {W}_{c} converges to [0.862,0.715,-0.889,-0.436,0.734,0.976,-0.798,-1.214, 0.013]^{\mathsf {T}}. As displayed in Fig. 7, the system states gradually converge to the equilibrium point after 13 s. Fig. 8 describes the piecewise signal of the self-triggered control, which is updated only at t_{k}, while the time-triggered control is a continuous signal. Fig. 9 shows that, in contrast to the time-triggered control input, which requires 800 samples, the self-triggered one needs only 320 samples, which means that the sampling has been reduced by 60%. Fig. 10 shows that the minimum inter-sampling time is \Delta _{k\min }= 0.05 s. Thus, we can conclude that the developed self-triggered approximate optimal neuro-control scheme effectively assures the closed-loop overhead crane system to be stable in the UUB sense.
SECTION V.
Conclusion
A self-triggered approximate optimal neuro-control scheme is presented for nonlinear systems through ADP. By guaranteeing the asymptotic stability of the weight error dynamics, the critic NN is established to approximate the solution of the HJBE, and the optimal neuro-control is then derived indirectly in the ADP framework. By introducing the self-triggered mechanism, the time instants at which the control policy is updated are predicted in advance. It is worth pointing out that a proper self-triggering condition, which predicts the next updating instant of the control policy, is designed to avoid the continuous monitoring of the system state by hardware devices required in event-triggered control approaches, so that the computation, the communication, and the energy consumption are decreased in an alternative way. Furthermore, the nested updating policies guarantee the asymptotic stability of the critic weight error dynamics, rather than UUB stability as in most existing ADP-based optimal control methods.