In the modern control framework, control performance greatly depends on the communication networks that transmit a large amount of data among controllers, sensors, and actuators. In particular, as a system grows in size and complexity, the deployment cost increases accordingly. Therefore, to ensure both system stability and control performance, there is a strong desire for controlled systems not only to provide an adequate level of control performance but also to reduce resource and energy consumption.
Much effort has been devoted to developing optimal control methods that achieve an adequate control level for nonlinear systems under different scenarios. It is worth pointing out that dynamic programming [1], which has attracted great attention from many researchers, is regarded as a classical and effective tool to obtain optimal control solutions for nonlinear systems. Furthermore, to avoid the “curse of dimensionality” that occurs as the structural complexity of a system increases, adaptive dynamic programming (ADP), which is assisted by neural networks (NNs) or fuzzy logic systems, was developed by Werbos [2] in the 1970s. During the last several decades, ADP-based optimal [3], [4], [5] and suboptimal [6] control approaches have achieved great success in tackling the control problems of continuous-time (CT) or discrete-time (DT) nonlinear systems with uncertainties or external disturbances [7], [8], input or output constraints [9], time delays [10], and failures [11], as well as in solving trajectory tracking [12], [13], zero-sum or nonzero-sum games [14], and so on. Moreover, some attempts have been made to implement ADP-based control strategies in practical systems, such as residential energy scheduling [15], induction motor driving [16], power systems [17], microgrid energy management [18], near space vehicles [19], robot systems [12], and active suspension systems [20].
In order to reduce resource and energy consumption, ADP-based feedback control policies have been developed from periodic to aperiodic scheduling, i.e., from time-triggered to event-triggered control strategies. Distinguished from the time-triggered control framework, the event-triggered control method [12], [21] is responsive and generates sensor sampling and control actions only when the system state deviates by more than a certain threshold determined by an appropriately designed event-triggering condition. In this way, the data transmission among the controllers, the sensors, and the actuators is greatly reduced, which saves computational burden, communication bandwidth, and energy consumption.
Although the event-triggered control offers clear superiority over periodic control, it is worth emphasizing that the triggering condition is continuously monitored based on current measurements from a hardware device. This implies that full-state information is assumed to be available, an assumption that is violated in most practical situations. Some beneficial attempts, which combine the event- and self-triggered mechanisms, have been made for different types of plants. Kishida et al. [22] presented an integrated event- and self-triggered networked control scheme for uncertain linear systems with finite-gain \mathcal {L} _{2} stability, where the event-triggering condition decides whether a new control signal is transmitted to the actuator at each sampling, and the self-triggering condition determines the sampling time instants. Zhou et al. [23] proposed a combined self- and event-triggered control strategy to investigate the output control approach for quantized linear systems. Qi et al. [24] presented an event- and self-triggered control method for switched linear systems with exogenous disturbances, where the current sampled data are employed to adaptively predict the inter-execution intervals and thereby decrease resource consumption. Sahoo et al. [25] proposed a mixed event- and self-triggering-based regulation method for CT linear dynamical systems. Zuo et al. [26] developed both event- and self-triggered transmission strategies for dynamical systems by designing a triggering condition to save network resources.
To avoid the continuous monitoring of the system state by a hardware device, the self-triggering mechanism has been introduced by some researchers to design controllers. In contrast to event-triggered control, which updates the control policy based on a designed triggering condition, the next triggering time in self-triggered control [27] is determined at the former triggering instant; that is to say, continuous monitoring of the system state through hardware devices is no longer required at the sensor side, and the embedded devices can shut down their communication until the next transmission instant. Focusing on linear systems, Zhang et al. [28] designed a self-triggered gain scheduling control to achieve semiglobal stabilization of input-constrained linear systems while avoiding continuous monitoring of the system states. Lu and Maciejowski [29] investigated a self-triggering mechanism-based model predictive control (MPC) strategy for linear systems in the presence of both state and input constraints, where both the updating of the MPC control policy and the next triggering instant are determined according to a relaxed dynamic programming inequality. Brunner et al. [30] investigated a novel self-triggered aperiodic control method for perturbed DT linear systems by evaluating set-membership conditions; in this way, a tradeoff is realized between the communication rate and the worst-case asymptotic bound on the closed-loop system state. For nonlinear systems, Liu et al. [31] presented an optimized self-triggered strategy-based robust MPC scheme for constrained DT uncertain nonlinear systems subject to disturbances. Li and Li [32] proposed a self-triggered distributed MPC scheme to reach consensus of a heterogeneous time-varying multiagent system, where the self-triggering time intervals and the control inputs are optimized alternately and the influence on the system performance is analyzed. Based on the state information of each agent collected from its neighbors, Fan et al. [33] proposed a self-triggered consensus algorithm with Zeno-exclusion analysis for multiagent systems. Gao et al. [34] proposed a state-estimation-based self-triggered control scheme for cyber-physical systems subject to joint attacks on both sensors and actuators with enhanced resource saving. Furthermore, Gao et al. [35] developed a robust self-triggered control scheme for time-varying constrained uncertain systems by analyzing reachability for both linear and nonlinear scenarios. Some attempts have also been made toward practical implementations. Cao et al. [36] proposed a self-triggered MPC strategy to investigate the trajectory tracking control problem for nonholonomic vehicles considering coupled input constraints and bounded perturbations. By introducing the self-triggered communication strategy, Zhou and Tokekar [37] developed a decentralized target tracking method for multirobot teams, in which the time when a particular robot should query the online data from its neighbors and when it is safe to operate with possibly outdated data is determined based on the self-triggering mechanism to reduce the communication bandwidth.
From the aforementioned state of the art, the self-triggered control scheme has been widely explored in MPC, distributed control, and networked control, whereas few works have focused on optimal control. Lou and Ji [38] investigated a new self-triggered adaptive optimal control method for nonlinear CT systems, achieving high quantitative accuracy at a limited channel transmission rate. Kobayashi and Hiraishi [39] investigated the synthesis of self-triggered control for network systems in an optimal manner by computing the control input and the sampling time simultaneously.
In order to avoid the continuous monitoring through hardware devices in existing event-triggered control strategies, as well as to decrease the computational burden, the communication bandwidth, and the energy consumption, we present a new self-triggered approximate optimal neuro-control method for nonlinear systems based on ADP. The main contributions and novelties are summarized in the following three aspects.
Different from existing ADP-based event-triggered control strategies [12], [21], the next triggering instant in the developed ADP-based self-triggered control is predicted in software based on the previous triggering instant; that is to say, a new triggering condition based on the self-triggering mechanism is explored to predict the next triggering instant. Thus, the continuous monitoring of the system state in the event-triggered control strategy is avoided, and the hardware device for continuously monitoring the system state is no longer required.
By selecting a proper design structure for the nested updating policies, the critic NN weight error dynamics is guaranteed to be asymptotically stable, rather than uniformly ultimately bounded (UUB) as in most existing ADP-based control schemes.
The self-triggered control scheme can not only guarantee the system to be stable in an optimal manner but also reduce the computational burden, the precious communication bandwidth, and the energy consumption. In other words, the developed control scheme offers a feasible tradeoff between the overall resource cost and the system control performance, which is significant in real implementations.
The remainder of this article is structured as follows. In Section II, the problem statement is given. In Section III, the self-triggered approximate optimal neuro-control is designed via the ADP framework in detail, and the stability analysis is offered. In Section IV, simulation studies illustrate the effectiveness of the developed approach. In Section V, concluding remarks are briefly described.
SECTION II.
Problem Statement
The considered nonlinear system dynamics is modeled in the general form as \begin{equation*} \dot x = f\left ({x}\right) + g\left ({x}\right)u\left ({x}\right) \tag{1}\end{equation*}
where x(t)\in {\mathbb R}^{n} and u(t)\in {\mathbb R}^{m} are the system state and control input vectors, respectively, and f(x) \in {\mathbb R}^{n} and g(x) \in {\mathbb R}^{n \times m} represent the known drift dynamics and control input matrix, respectively. To ease the notation, x(t) = x is denoted in the sequel.
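For illustration only, the following minimal Python sketch (ours, not part of the original formulation) shows how a plant of the form (1) is simulated under a zero-order-hold input, which is exactly how a self- or event-triggered controller interacts with the plant: the input computed at a triggering instant is held constant until the next one. The drift f, the input matrix g, the step size, and the scalar example are placeholder assumptions.

```python
import numpy as np

def simulate_hold(f, g, x0, u_hold, t_span, dt=1e-3):
    """Forward-Euler simulation of x_dot = f(x) + g(x) u with the input u_hold
    kept constant over [t_span[0], t_span[1]) (zero-order hold)."""
    x = np.asarray(x0, dtype=float)
    t = t_span[0]
    traj = [(t, x.copy())]
    while t < t_span[1]:
        x = x + dt * (f(x) + g(x) @ u_hold)   # Euler step under the held input
        t += dt
        traj.append((t, x.copy()))
    return traj

# Placeholder example: a scalar plant f(x) = -x, g(x) = 1, held input u = 0.5.
f = lambda x: -x
g = lambda x: np.ones((1, 1))
traj = simulate_hold(f, g, x0=[1.0], u_hold=np.array([0.5]), t_span=(0.0, 1.0))
```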
To make the analysis more amenable, the following two assumptions are provided.
Assumption 1:
The vector-valued drift function f(x) is Lipschitz continuous on the compact set \Omega \subset {\mathbb R}^{n} containing the origin, such that the solution x(t) of the nonlinear system (1) is unique for the given initial state x_{0} \in \Omega and control u. Moreover, there is a scalar D_{f}>0 such that \|f(x)\|\le D_{f} \|x\|, where \|\cdot \| indicates the 2-norm of a vector, and the considered system (1) can be stabilized on \Omega.
Assumption 2:
The control input matrix g(x) is norm-bounded as 0 < \|g(x)\|_{F} \le D_{g} for arbitrary x \in \Omega, where D_{g}>0 and \|\cdot \|_{F} denotes the Frobenius norm of a matrix.
Define the infinite-horizon cost function as \begin{equation*} V\left ({x,u}\right) = \int _{t}^{\infty} { U\left ({x\left ({\tau }\right),u\left ({\tau }\right)}\right) \mathrm {d}\tau } \tag{2}\end{equation*}
where U(x,u) = {x^{\mathsf {T}}}Qx + {u^{\mathsf {T}}}Ru \ge 0 for all x \in {\mathbb R}{^{n}} and u \in {\mathbb R}{^{m}}, U(0,0) = 0, V(0,0) = 0, and Q \in {\mathbb R}{^{n \times n}} and R \in {\mathbb R}{^{m \times m}} are symmetric positive definite matrices.
Definition 1:
For the nonlinear system (1), a control policy u (x) is defined to be admissible with respect to (2) if u (x) is continuous on a set \Omega \subset {\mathbb R}^{n}, u (0)=0, u (x) stabilizes the nonlinear system (1), and V (x_{0}) in (2) is finite for every initial state x_{0}=x(0) \in \Omega. The admissible control set \psi (\Omega) consists of all such admissible control policies.
For any given admissible control policy u in the admissible control set \psi (\Omega), if the associated cost function (2) is continuously differentiable, i.e., V \in C^{1}, then taking the time derivative of (2) as in [41] yields the nonlinear equation \begin{equation*} U\left ({x,u}\right) + \nabla {V^{ \mathsf {T}}}\left ({x}\right)\left ({{f\left ({x}\right) + g\left ({x}\right)u} }\right) = 0 \tag{3}\end{equation*}
where \nabla V\left ({x}\right) = {\partial V\left ({x}\right)}/{\partial x} indicates the partial gradient of V(x) with respect to the system state x.
To drive the closed-loop system (1) to convergence, the optimal control policy u^{\ast }(t)\in \psi (\Omega) is obtained by minimizing the cost function, i.e., the optimal cost function is \begin{equation*} {V^ {*} }\left ({x}\right) = \min _{u \in \psi \left ({\Omega }\right)} \int _{t}^{\infty} {U\left ({x\left ({\tau }\right),u\left ({\tau }\right)}\right)\mathrm {d}\tau }. \tag{4}\end{equation*}
By considering the optimal cost function (4), the associate Hamiltonian is defined for the nonlinear system (1) as \begin{equation*} H\left ({{x,u,\nabla V^{\ast} \left ({x}\right)} }\right) \!=\!U\left ({x,u}\right) \!+\! \nabla {V^{\ast \mathsf {T}}}\left ({x}\right)\left ({{f\left ({x}\right) \!+\! g\left ({x}\right)u} }\right). \tag{5}\end{equation*}
According to the Bellman principle of optimality [1], the optimal cost function {V^ {*} }(x) is obtained by solving the following Hamilton–Jacobi–Bellman equation (HJBE): \begin{equation*} 0 = \min _{u \in \psi \left ({\Omega }\right)} H\left ({{x,u^{\ast},\nabla {V^{*} }\left ({x}\right)} }\right). \tag{6}\end{equation*}
Hence, the optimal control policy is expressed in closed form as \begin{equation*} {u^ {*} }\left ({x}\right) = - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x}\right)\nabla {V^ {*} }\left ({x}\right). \tag{7}\end{equation*}
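For completeness, (7) can be obtained from the stationarity condition of the Hamiltonian (5) with respect to u, which is sufficient for the minimum in (6) because (5) is quadratic in u and R is positive definite:\begin{equation*} \frac {\partial H\left ({{x,u,\nabla V^{\ast} \left ({x}\right)} }\right)}{\partial u} = 2Ru + {g^{\mathsf {T}}}\left ({x}\right)\nabla {V^{\ast}}\left ({x}\right) = 0 \;\Longrightarrow\; {u^ {*} }\left ({x}\right) = - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x}\right)\nabla {V^ {*} }\left ({x}\right).\end{equation*}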
By involving the self-triggered mechanism, the optimal neuro-control policy is updated only at the self-triggered time instants, which are predicted from the current state and a given function in software. Thus, it is easier to implement than event-triggered optimal control schemes.
With the help of this prediction, the optimal neuro-control policy is updated under the self-triggered mechanism as u_{k}=u(x(t_{k})) at the predicted time instant t_{k}. Therefore, the control objective of this article is to present a self-triggered approximate optimal neuro-control policy u(x(t_{k})) that renders the closed-loop system (1) stable.
SECTION III.
Self-Triggered Approximate Optimal Neuro-Controller Design and Stability Analysis
A. Asymptotically Converged Critic NN
Starting from the cost function (2), it is difficult to solve the HJBE (6) directly. Fortunately, based on the universal approximation ability of NNs, V(x) can be accurately approximated by a feedforward NN with only one hidden layer [40], [41] such that \begin{equation*} V\left ({x}\right) = W_{c}^{\mathsf {T}}\sigma \left ({x}\right) + {\varepsilon _{c}}\left ({x}\right) \tag{8}\end{equation*}
where {W_{c}} \in {\mathbb R}^{N_{c}}, \sigma (x) \in {\mathbb R}^{N_{c}}, and {\varepsilon _{c}}(x)\in {\mathbb R} represent the unknown optimal weight vector, the activation function, and the approximation error, respectively, and {N_{c}} indicates the number of hidden neurons. Thus, taking the partial derivative of (8) with respect to the system state x, we have \begin{equation*} \nabla V\left ({x}\right) = { {\nabla \sigma ^{\mathsf {T}} \left ({x}\right)} }{W_{c}} + \nabla {\varepsilon _{c}^{\mathsf {T}}}\left ({x}\right) \tag{9}\end{equation*}
where \nabla \sigma ^{\mathsf {T}} (x) = {\partial \sigma (x)}/{\partial x} \in {\mathbb R}^{ n\times N_{c}} and \nabla {\varepsilon _{c}^{\mathsf {T}}}(x) = {\partial {\varepsilon _{c}}(x)}/{\partial x} \in {\mathbb R}^{n} are the corresponding partial gradients with respect to the system state x.
In this case, the Hamiltonian for the nonlinear system (1) is defined as \begin{equation*} H\left ({{x,u,{W_{c}}} }\right) = U\left ({{x,u} }\right)+ W_{c}^{\mathsf {T}}\nabla \sigma \left ({x}\right)\dot x + \nabla {\varepsilon _{c}}\left ({x}\right)\dot x. \tag{10}\end{equation*}
In order to derive the optimal control policy (7) for a given controlled plant, the unknown optimal weight vector {W_{c}} is estimated, and (8) is approximated as \begin{equation*} \hat V\left ({x}\right) = \hat W_{c}^{\mathsf {T}}\sigma \left ({x}\right) \tag{11}\end{equation*}
where {\hat W_{c}} is the estimate of {W_{c}}. The partial gradient of (11) with respect to the system state x is described as \begin{equation*} \nabla \hat V\left ({x}\right) = { {\nabla \sigma ^{\mathsf {T}}\left ({x}\right)}}{\hat W_{c}}. \tag{12}\end{equation*}
Under the approximation (11) of the cost function V(x), the Hamiltonian (10) is approximated by \begin{align*} H\left ({{x,u,{\hat W_{c}}} }\right)& =U\left ({{x,u} }\right) + \hat W_{c}^{\mathsf {T}}\nabla \sigma \left ({x}\right)\dot x \\ &={e_{c}}. \tag{13}\end{align*}
By comparing (13) with (5), the Hamiltonian approximation error is obtained as \begin{equation*} {e_{c}} = \varepsilon - \tilde W_{c}^{\mathsf {T}}{\theta } \tag{14}\end{equation*}
where {\tilde W_{c}} = {W_{c}} - {\hat W_{c}} denotes the weight approximation error vector, {\theta } = \nabla \sigma (x)\dot x, and \varepsilon = - \nabla {\varepsilon _{c}}(x)\dot x is norm-bounded by a positive constant \varepsilon _{M} as \|\varepsilon \| \le \varepsilon _{M}.
To estimate the critic NN weight vector, the target function {E_{c}} = (1/2)e_{c}^{\mathsf {T}}{e_{c}} is minimized through the commonly used steepest descent algorithm. Thus, the weight vector is updated by \begin{equation*} {\dot {\hat W}_{c}} = - {l_{c}}\left [{ \frac {\partial E_{c}} {\partial \hat W_{c}}}\right] \tag{15}\end{equation*}
where {l_{c}} > 0 is a learning rate. On this basis, the nested updating policies are described as \begin{equation*} {\dot {\hat W}_{c}} = - {l_{c}}\left ({{e_{c} - {\hat \varepsilon }_{M}-\Gamma \mathop {\mathrm{ sgn}}\left ({{\tilde W_{c}^{\mathsf {T}}}{\theta }}\right) } }\right){\theta } \tag{16}\end{equation*}
where \Gamma >0 is a design parameter, and {\hat \varepsilon }_{M} is the estimate of {\varepsilon }_{M}, tuned by \begin{equation*} \dot {\hat \varepsilon }_{M} = - {l_{\varepsilon} }\hat W_{c}^{\mathsf {T}}{\theta }, \tag{17}\end{equation*}
and l_{\varepsilon} >0 is a learning rate. The weight approximation error vector {\tilde W_{c}} = {W_{c}} - {\hat W_{c}} is then adjusted by \begin{align*} {\dot {\tilde W}_{c}}& = - {\dot {\hat W}_{c}} \\ & = {l_{c}}\left ({\varepsilon -\varepsilon _{M}+ {\tilde \varepsilon }_{M}- \tilde W_{c}^{\mathsf {T}}{\theta }-\Gamma \mathop {\mathrm{ sgn}}\left ({{\tilde W_{c}^{\mathsf {T}}}{\theta }}\right) }\right){\theta } \tag{18}\end{align*}
where {\tilde \varepsilon }_{M} = \varepsilon _{M} - {\hat \varepsilon }_{M}, which is adjusted by \begin{equation*} {\dot {\tilde \varepsilon }}_{M} = -{l_{\varepsilon} }\tilde W_{c}^{\mathsf {T}}{\theta }. \tag{19}\end{equation*}
From (18) and (19), we know that the approximation error term \tilde \varepsilon _{M}, adjusted by (19), is embedded in the updating policy (18) of {\tilde W}_{c}. Thus, the updating policies (18) and (19) are referred to as “the nested updating policies.”
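To make the critic training concrete, the following minimal Python sketch (ours, with placeholder signatures and step size) implements one Euler step of the baseline steepest-descent update (15) driven by the Hamiltonian residual (13); the nested policies (16) and (17) additionally inject the estimate \hat \varepsilon _{M} and the sign term, which is what yields the asymptotic result of Theorem 1 below. The activation \sigma follows Example 1 of Section IV.

```python
import numpy as np

def critic_update_step(W_hat, x, x_dot, u, Q, R, grad_sigma, lc=0.4, dt=1e-3):
    """One Euler step of the baseline steepest-descent critic update (15).

    grad_sigma(x) returns d(sigma)/dx with shape (number of hidden neurons, n),
    so that theta = grad_sigma(x) @ x_dot approximates the time derivative of sigma."""
    theta = grad_sigma(x) @ x_dot            # theta = grad(sigma) * x_dot
    U = x @ Q @ x + u @ R @ u                # utility U(x, u)
    e_c = U + W_hat @ theta                  # Hamiltonian residual (13)
    return W_hat - lc * dt * e_c * theta     # gradient step on E_c = e_c^2 / 2

# Placeholder activation sigma(x) = [x1^2, x1*x2, x2^2] as used in Section IV.
grad_sigma = lambda x: np.array([[2 * x[0], 0.0],
                                 [x[1], x[0]],
                                 [0.0, 2 * x[1]]])
```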
Next, we show that the nested updating policies (18) and (19) ensure the asymptotic stability of the critic NN weight error dynamics, rather than UUB stability as in most existing ADP-based optimal control methods.
Theorem 1:
For the nonlinear system (1), the developed nested updating policies (18) and (19) ensure the critic NN weight error dynamics to be asymptotically stable.
Proof:
Considering both approximation errors in the nested updating policies, select a Lyapunov function candidate as \begin{equation*} {L_{1}} = \mathrm {tr}\left ({{\frac {1}{{2{l_{c}}}}\tilde W_{c}^{\mathsf {T}}{\tilde W_{c}}} }\right) + \frac {1}{{2{l_{\varepsilon} }}}{\tilde \varepsilon _{M} ^{2}}. \tag{20}\end{equation*}
For (20), taking its time derivative and plugging the nested updating policy (18) into it, we derive \begin{align*} {\dot L_{1}} &=\mathrm {tr}\left ({{\frac {1}{{{l_{c}}}}\tilde W_{c}^{\mathsf {T}}{{\dot {\tilde W}}_{c}}} }\right) + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &=\mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}\left ({\varepsilon -\varepsilon _{M}}\right) {\theta }} }\right)+\mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\tilde \varepsilon _{M}} {\theta }} }\right) \\ &\quad -\Gamma \big \| {\tilde W_{c}^{\mathsf {T}}}{\theta } \big \|- {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}}{\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}}. \tag{21}\end{align*}
For matrices X and Y such that YX \in {\mathbb R}, we notice that \mathrm {tr}(XY) = \mathrm {tr}(YX)=YX
. Introducing the updating policy (19) into (21), we derive \begin{align*} {\dot L_{1}} &=\mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }\left ({\varepsilon -\varepsilon _{M}}\right)} }\right)+ \mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }{\tilde \varepsilon _{M}}} }\right)-\Gamma \big \| {\tilde W_{c}^{\mathsf {T}}}{\theta } \big \| \\ &\quad - {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &\le \big \|{\tilde W_{c}^{\mathsf {T}}}{\theta }\big \|\big \|\varepsilon -\varepsilon _{M}\big \| + \mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }{\tilde \varepsilon _{M}}} }\right)-\Gamma \big \|{\tilde W_{c}^{\mathsf {T}}}{\theta } \big \| \\ &\quad - {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &\le 2\varepsilon _{M}\big \|{\tilde W_{c}^{\mathsf {T}}}{\theta }\big \| + \mathrm {tr}\left ({{\tilde W_{c}^{\mathsf {T}}{\theta }{\tilde \varepsilon _{M}}} }\right) -\Gamma \big \| {\tilde W_{c}^{\mathsf {T}}}{\theta } \big \| \\ &\quad - {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}} + \frac {1}{{{l_{\varepsilon} }}} {\dot {\tilde \varepsilon }}_{M}{\tilde \varepsilon _{M}} \\ &= -\left ({\Gamma -2\varepsilon _{M}}\right)\big \|{\tilde W_{c}^{\mathsf {T}}}{\theta }\big \|- {\big \| {\tilde W_{c}^{\mathsf {T}}{\theta }} \big \|^{2}}. \tag{22}\end{align*}
From the above analysis, we know that if the design parameter satisfies \Gamma \ge 2\varepsilon _{M}, then {\dot L}_{1} \le 0. Hence, the nested updating policies (18) and (19) guarantee the asymptotic stability of the critic NN weight error vector {\tilde W_{c}}. This ends the proof.
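Theorem 1 can also be checked numerically by integrating the error dynamics (18) and (19) directly; in the following sketch (ours), the regressor \theta (t), the bounded error \varepsilon (t), the bound \varepsilon _{M}, and all gains are synthetic placeholders chosen so that \|\varepsilon \| \le \varepsilon _{M} and \Gamma \ge 2\varepsilon _{M} hold.

```python
import numpy as np

# Synthetic sanity check of the nested error dynamics (18)-(19).
rng = np.random.default_rng(0)
lc, le, Gamma, eps_M, dt = 0.4, 0.4, 0.5, 0.2, 1e-3
W_tilde = rng.standard_normal(3)          # initial critic weight error
eps_tilde_M = 0.1                         # initial bound-estimation error

for k in range(20000):
    t = k * dt
    theta = np.array([np.sin(t), np.cos(2 * t), np.sin(3 * t)])  # placeholder regressor
    eps = eps_M * np.sin(5 * t)                                  # |eps| <= eps_M
    s = W_tilde @ theta
    # Euler steps of (18) and (19).
    W_tilde = W_tilde + dt * lc * (eps - eps_M + eps_tilde_M - s - Gamma * np.sign(s)) * theta
    eps_tilde_M = eps_tilde_M - dt * le * s

# Per Theorem 1, L1 should be (approximately) nonincreasing along the trajectory.
L1 = W_tilde @ W_tilde / (2 * lc) + eps_tilde_M ** 2 / (2 * le)
print(L1)
```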
B. Self-Triggered Approximate Optimal Neuro-Control
Based on the partial derivative of the approximate critic NN (12), the desired optimal neuro-control policy (7) is approximated by \begin{equation*} {\hat u}\left ({x}\right) = - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x}\right){\left ({{\nabla \sigma \left ({x}\right)} }\right)^{\mathsf {T}}}{\hat W_{c}}. \tag{23}\end{equation*}
According to the self-triggered mechanism, the optimal neuro-control policy is updated at the predicted triggering instants as \begin{equation*} {\hat u}\left ({x_{k}}\right)= {\hat u}_{k}= - \frac {1}{2}{R^{ - 1}}{g^{\mathsf {T}}}\left ({x_{k}}\right){\left ({{\nabla \sigma \left ({x_{k}}\right)} }\right)^{\mathsf {T}}}{\hat W_{c}}. \tag{24}\end{equation*}
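A minimal sketch of how (24) is evaluated at a triggering instant is given below; g, grad_sigma, R, and W_hat are placeholders consistent with the notation above, and the returned input is held by the actuator until the next predicted instant.

```python
import numpy as np

def self_triggered_policy(x_k, W_hat, g, grad_sigma, R):
    """Evaluate the approximate optimal control (24) at the sampled state x_k;
    grad_sigma(x) has shape (number of hidden neurons, n), g(x) has shape (n, m)."""
    return -0.5 * np.linalg.solve(R, g(x_k).T @ grad_sigma(x_k).T @ W_hat)
```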
C. Stability Analysis
Before showing the stability of the closed-loop system (1) with the developed self-triggered approximate optimal neuro-control policy (24), the following assumptions are provided.
Assumption 3:
On the compact set \Omega \subset {\mathbb R}^{n}, the control input function u(x) is Lipschitz continuous, i.e., there is a scalar D_{u}>0 such that \|u(x(t))-u(x(t_{k}))\|=\|u^{\ast }-u_{k}\|\le D_{u}\|x_{e}\|, where x_{k}=x(t_{k}) and x_{e}=x-x_{k}.
Assumption 4:
\tilde W_{c}, \nabla \sigma (x), and \nabla \varepsilon _{c} are norm-bounded as \|\tilde W_{c}\| \le W_{cM}, \|\nabla \sigma (x)\| \le \sigma _{cM}, and \|\nabla \varepsilon _{c}\| \le \varepsilon _{cM}, where W_{cM}, \sigma _{cM}, and \varepsilon _{cM} are unknown positive scalars.
Theorem 2:
For the nonlinear system (1), consider the cost function (2), Assumptions 1–4, and the developed self-triggered approximate optimal neuro-control policy (24). If there exist positive constants \delta \in (0,1] and \epsilon \in (0,1) such that \begin{equation*} \delta \left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}\ge x_{e}^{\mathsf {T}}\left ({\left ({1-\frac {1}{\epsilon ^{2}}}\right)N+M}\right)x_{e} \tag{25}\end{equation*}
where N and M are positive definite diagonal matrices with proper dimensions defined later, then UUB stability is assured for the closed-loop nonlinear system (1).
Proof:
Select a Lyapunov function candidate as \begin{equation*} L_{2}=\frac {1}{2}x^{\mathsf {T}}x+{V^{*}}\left ({x}\right). \tag{26}\end{equation*}
Substituting the developed self-triggered approximate optimal neuro-control policy (24), the time derivative of (26) becomes \begin{align*} \dot L_{2}&=x^{\mathsf {T}}\dot x+\nabla {V^{\ast } }\left ({x}\right)\dot x \\ &\le x^{\mathsf {T}}\left ({f\left ({x}\right)+g\left ({x}\right)\hat u_{k} }\right) \\ &\quad +\nabla {V^ {\ast }}\left ({x}\right)\left ({f\left ({x}\right)+g\left ({x}\right)\left ({u^{\ast }\left ({x}\right)+\hat u_{k}-u^{\ast }\left ({x}\right)}\right)}\right) \\ &\le x^{\mathsf {T}}\left ({f\left ({x}\right)+g\left ({x}\right)\hat u_{k} }\right)+\nabla {V^ {\ast }}\left ({x}\right)\left ({f\left ({x}\right)+g\left ({x}\right)u^{\ast }\left ({x}\right) }\right) \\ &\quad +2 u^{\ast \mathsf {T}}R\left ({u^{\ast }\left ({x}\right)-\hat u_{k}}\right). \tag{27}\end{align*}
Based on Assumptions 1 and 2, considering (3) and utilizing Young’s inequality, we obtain \begin{align*} \dot L_{2}&\le x^{\mathsf {T}}D_{f}x+\frac {1}{2}x^{\mathsf {T}}D_{g}x+\frac {1}{2}\hat u_{k}^{\mathsf {T}}D_{g}\hat u_{k} \\ &\quad +2u^{\ast \mathsf {T}}R\left ({u^{\ast }-\hat u_{k}}\right)-x^{\mathsf {T}}Qx-u^{\ast \mathsf {T}}Ru^{*} \\ &\le x^{\mathsf {T}}\left ({D_{f}+\frac {1}{2}D_{g}}\right)x+2u^{\ast \mathsf {T}}R\left ({u^{\ast }-\hat u_{k}}\right) \\ &\quad +\frac {1}{2}{\left ({u^{\ast }-u^{\ast }+\hat u_{k}}\right)}^{\mathsf {T}}D_{g}{\left ({u^{\ast }-u^{\ast }+\hat u_{k}}\right)} \\ &\quad -x^{\mathsf {T}}Qx-u^{\ast \mathsf {T}}Ru^{*} \\ &= x^{\mathsf {T}}\left ({D_{f}+\frac {1}{2}D_{g}}\right)x+\frac {1}{2}{\left ({u^{\ast }-\hat u_{k}}\right)}^{\mathsf {T}}D_{g}{\left ({u^{\ast }-\hat u_{k}}\right)} \\ &\quad -u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}D_{g}}\right) u^{\ast }-x^{\mathsf {T}}Qx \\ &\quad +u^{\ast \mathsf {T}}\left ({2R-D_{g}I}\right){\left ({u^{\ast }-\hat u_{k}}\right)}. \tag{28}\end{align*}
According to Assumptions 3 and 4, we notice that \begin{align*} \|u^{\ast }-\hat u_{k}\| &= \|u^{\ast }-\hat u+ \hat u- \hat u_{k}\| \\ &\le \|u^{\ast }-\hat u\| + \|\hat u- \hat u_{k}\| \\ &\le \frac {1}{2}\|R^{-1}\|_{F}\|g\left ({x}\right)\|_{F}\| \nabla \sigma ^{\mathsf {T}}\left ({x}\right)\tilde W_{c} +\nabla \varepsilon _{c}^{\mathsf {T}} \left ({x}\right)\| \\ &\quad +D_{u} \|x_{e}\| \\ &\le \frac {1}{2}\|R^{-1}\|_{F}D_{g} \mu +D_{u} \|x_{e}\| \tag{29}\end{align*}
where \mu =W_{cM} \sigma _{cM} + \varepsilon _{cM}
. Thus, we have \begin{align*} \dot L_{2}\le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I}\right)x+\frac {1}{4}R^{-2}D_{g}^{3} \mu ^{2} \\ &\quad +x_{e}^{\mathsf {T}}D_{g} D_{u}^{2} x_{e}-u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}D_{g}}\right) u^{\ast } \\ &\quad +\|u^{\ast }\|\|2R-D_{g}I\|_{F}\left ({\frac {1}{2}R^{-1}D_{g} \mu + D_{u}\|x_{e}\|}\right) \\ \le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I}\right)x+\frac {1}{4}R^{-2}D_{g}^{3} \mu ^{2} \\ &\quad +x_{e}^{\mathsf {T}}D_{g} D_{u}^{2} x_{e}-u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}D_{g}}\right) u^{\ast } \\ &\quad +\frac {1}{2}u^{\ast \mathsf {T}}D_{u}^{2} u^{\ast }+\frac {1}{2}\|2R-D_{g}I\|^{2}_{F} x_{e}^{\mathsf {T}} x_{e} \\ \le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I}\right)x \\ &\quad -u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}\left ({D_{g}-D_{u}^{2}-D_{g}^{2}}\right)I}\right) u^{\ast } \\ &\quad +x_{e}^{\mathsf {T}}\left ({D_{g} D_{u}^{2}+\frac {1}{2}\|2R-D_{g}I\|^{2}_{F}}\right) x_{e} +\Delta \\ \le & -x^{\mathsf {T}}\left ({Q-\left ({D_{f}+\frac {1}{2}D_{g}}\right)I-N}\right)x-x^{\mathsf {T}}Nx \\ &\quad -u^{\ast \mathsf {T}}\left ({R-\frac {1}{2}\left ({D_{g}-D_{u}^{2}-D_{g}^{2}}\right)I}\right) u^{\ast } \\ &\quad +x_{e}^{\mathsf {T}}M x_{e} +\Delta \tag{30}\end{align*}
where \Delta = (1/4)R^{-2}D_{g}^{3} \mu ^{2}+ (1/8)R^{-2}\mu ^{2}\|2R-D_{g}I\|^{2}_{F}, N \in {\mathbb R}^{n \times n} is a positive definite diagonal matrix, and M=(D_{g} D_{u}^{2}+ (1/2)\|2R-D_{g}I\|^{2}_{F})I.
Noticing that x=x_{k}+x_{e}, we have \begin{align*} x^{\mathsf {T}}Nx &= \left ({x_{k}+x_{e}}\right)^{\mathsf {T}}N\left ({x_{k}+x_{e}}\right) \\ &= x_{k}^{\mathsf {T}}Nx_{k}+x_{e}^{\mathsf {T}}Nx_{e}+2x_{k}^{\mathsf {T}}Nx_{e} \\ &= \left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}+\left ({1-\frac {1}{\epsilon ^{2}}}\right)x_{e}^{\mathsf {T}}Nx_{e} \\ &\quad +\left ({\epsilon x_{k}+\frac {1}{\epsilon }x_{e}}\right)^{\mathsf {T}}N\left ({\epsilon x_{k}+\frac {1}{\epsilon }x_{e}}\right) \\ &\ge \left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}+\left ({1-\frac {1}{\epsilon ^{2}}}\right)x_{e}^{\mathsf {T}}Nx_{e}. \tag{31}\end{align*}
Thus, (30) becomes \begin{align*} \dot L_{2}&\le -x^{\mathsf {T}}Q_{0}x-x^{\mathsf {T}}Nx-u^{\ast \mathsf {T}}R_{0} u^{\ast }+\Delta \\ &\quad -\left ({1-\epsilon ^{2}}\right)x_{k}^{\mathsf {T}}Nx_{k}+x_{e}^{\mathsf {T}}\left ({\left ({1-\frac {1}{\epsilon ^{2}}}\right)N+M}\right)x_{e} \tag{32}\end{align*}
where Q_{0}=Q-(D_{f}+ (1/2)D_{g})I-N and R_{0}=R- (1/2)(D_{g}-D_{u}^{2}-D_{g}^{2})I
. Considering the condition (25), we have \begin{align*} \dot L_{2}&\le -x^{\mathsf {T}}Q_{0}x-x^{\mathsf {T}}Nx-u^{\ast \mathsf {T}}R_{0} u^{\ast }+\Delta \\ &\le -\lambda _{\min } \left ({Q_{0}}\right)\| {x}\|^{2}-\lambda _{\min } \left ({R_{0}}\right)\| {u^{\ast }}\|^{2}+\Delta \\ &\le -\lambda _{\min } \left ({Q_{0}}\right)\| {x}\|^{2}+\Delta. \tag{33}\end{align*}
We can conclude that \dot L_{2}\le 0 as long as Q_{0} and R_{0} are selected as positive definite matrices, i.e., Q>(D_{f}+ (1/2)D_{g})I+N and R> (1/2)(D_{g}-D_{u}^{2}-D_{g}^{2})I, and x lies outside the following compact set:\begin{equation*} \Omega _{x}=\left \{{x \colon \|x\| \le \sqrt {\frac {\Delta }{\lambda _{\min }\left ({Q_{0}}\right)}} }\right \}.\end{equation*}
Thus, the closed-loop nonlinear system (1) is assured to be UUB under the present self-triggered approximate optimal neuro-control scheme. This ends the proof.
For Assumptions 1 and 2, we notice that practical systems are always modeled as continuous nonlinear differential equations. Thus, it is common that such a system has a unique equilibrium point. Furthermore, f(x) and g(x) denote the drift function and the control input matrix in practical application systems, respectively. For example, f(x) in robot systems consists of the inertia, Coriolis, and centripetal force terms, and g(x) relates to the inertia, so it is feasible to assume them to be norm-bounded, and g(x) cannot be zero. For Assumption 3, the ADP-based control strategy is developed based on policy iteration, which starts from a proper initial admissible control. This means that the initial admissible control can be regarded as a priori knowledge. Meanwhile, since the control scheme is designed over an infinite horizon, the Lipschitz continuity of the control input u(x) is satisfied. For Assumption 4, the value of the cost function is certainly finite in practical systems, so its approximation cannot be infinite. Since \tilde W_{c}, \nabla \sigma (x), and \nabla \varepsilon _{c} denote the weight approximation error and the partial gradients of the activation function and the NN approximation error, respectively, they can be ensured to be finite. Thus, Assumptions 3 and 4 can be fulfilled in practice. Moreover, these assumptions are commonly used in developing ADP-based control schemes [3], [7], [41].
D. Self-Triggered Mechanism
In this section, the inter-execution time \Delta _{k}=t_{k+1}-t_{k}, k=1,2,\ldots,\infty, is determined as \Delta _{k} =\Phi (x_{k}), and it is shown that the inter-execution time is lower bounded by a positive scalar, that is to say, the self-triggered mechanism is admissible. Before the proof, we provide the definition of Zeno behavior as follows.
Definition 2 (Zeno Behavior):
The nonlinear system is Zeno if \begin{equation*} \lim _{k\to \infty } t_{k}=\sum _{k=0}^{\infty }\Delta _{k}=t_{\infty } < \infty \tag{34}\end{equation*}
where \Delta _{k} is defined as the k th intersampling time and t_{\infty } is called the Zeno time [42].
Theorem 3:
The self-triggered time sequence \{t_{k}\} is determined by \begin{equation*} t_{k+1}=t_{k}+\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right) \tag{35}\end{equation*}
where \lambda = 2+ 2D_{f} and \rho (x_{k})=(2D_{f}^{2}+ D_{g}^{2}D_{u}^{2})\lambda _{\max }(M)x_{k}^{\mathsf {T}}x_{k}. Then, the self-triggering condition (35) is admissible for the nonlinear system (1). Furthermore, the inter-execution time is provided by \begin{equation*} \Delta _{k}=\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right)>0. \tag{36}\end{equation*}
Proof:
Choose a Lyapunov function candidate as L_{3}(t)=x_{e}^{\mathsf {T}}Mx_{e}, t \in [t_{k},t_{k+1})
. We have its time derivative as \begin{align*} {\dot L_{3}} &=2x_{e}^{\mathsf {T}}M\dot x_{e} \\ &=2x_{e}^{\mathsf {T}}M\left ({f\left ({x}\right)+g\left ({x}\right)\hat u_{k}}\right) \\ &\le 2x_{e}^{\mathsf {T}}Mx_{e}+ f^{\mathsf {T}}\left ({x}\right)Mf\left ({x}\right)+ \left ({g\left ({x}\right)\hat u_{k} }\right)^{\mathsf {T}}Mg\left ({x}\right)\hat u_{k} \\ &\le 2x_{e}^{\mathsf {T}}Mx_{e} + x^{\mathsf {T}}D_{f}^{2}M x + D_{g}^{2}D_{u}^{2}x_{k}^{\mathsf {T}}M x_{k} \\ &\le 2x_{e}^{\mathsf {T}}Mx_{e} + 2\left ({x_{k}^{\mathsf {T}}D_{f}^{2}M x_{k}+x_{e}^{\mathsf {T}}D_{f} M x_{e}}\right) \\ &\quad + D_{g}^{2}D_{u}^{2}x_{k}^{\mathsf {T}}M x_{k} \\ &=x_{e}^{\mathsf {T}}\left ({2+D_{f} }\right)M x_{e} +x_{k}^{\mathsf {T}}\left ({2D_{f}^{2}+ D_{g}^{2}D_{u}^{2}}\right)M x_{k} \\ &\le \lambda x_{e}^{\mathsf {T}}Mx_{e} + \rho \left ({x_{k}}\right). \tag{37}\end{align*}
Thus, we have \begin{equation*} {\dot L_{3}} \le \lambda L_{3} +\rho \left ({x_{k}}\right). \tag{38}\end{equation*}
By using the comparison principle, we obtain \begin{equation*} {L_{3}} \le \frac {\rho \left ({x_{k}}\right)}{\lambda } \left ({e^{\lambda \left ({t-t_{k}}\right)}-1 }\right), t \in \left [{t_{k}, t_{k+1}}\right). \tag{39}\end{equation*}
Based on the self-triggered mechanism shown in (35), one has \begin{equation*} \frac {\rho \left ({x_{k}}\right)}{\lambda } \left ({e^{\lambda \left ({t-t_{k}}\right)}-1 }\right) \le \delta x^{\mathsf {T}}_{k}Nx_{k}, t \in \left [{t_{k}, t_{k+1}}\right). \tag{40}\end{equation*}
According to (39) and (40), we can see that the nonlinear system (1) with the self-triggering condition (35) is ensured to be exponentially stable. Furthermore, it is obvious that \begin{equation*} \Delta _{k}=\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right)>0.\end{equation*}
This indicates that the inter-execution time between the current and the next predicted triggering instants is larger than zero, which means that the Zeno behavior is avoided. This ends the proof.
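Since the prediction (35) only uses quantities available at t_{k}, it can be evaluated directly in software. The following sketch (ours) assumes the design constants D_{f}, D_{g}, D_{u}, \delta and the matrices N and M from Theorems 2 and 3 are given.

```python
import numpy as np

def next_trigger_interval(x_k, N, M, D_f, D_g, D_u, delta):
    """Inter-execution time (36): Delta_k = (1/lambda) * ln(lambda*delta*x_k'N x_k / rho(x_k) + 1)."""
    lam = 2.0 + 2.0 * D_f
    rho = (2.0 * D_f ** 2 + D_g ** 2 * D_u ** 2) * np.max(np.linalg.eigvalsh(M)) * (x_k @ x_k)
    return (1.0 / lam) * np.log(lam * delta * (x_k @ N @ x_k) / rho + 1.0)

# At each triggering instant: t_next = t_k + next_trigger_interval(x_k, ...).
```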
In summary, the ADP-based self-triggered approximate optimal neuro-control scheme is described in Algorithm 1.
Algorithm 1 ADP-Based Self-Triggered Approximate Optimal Neuro-Control
1: Initialization: Initialize the parameters Q, R, \lambda, \delta, l_{c}, l_{\varepsilon}, N, \Gamma, the terminal time of system operation T, and the computation accuracy \xi >0 of the cost function. Let p=0, k=0, t_{0}=0, V^{(0)}(x_{0})=0, and begin with an admissible control policy u_{0} ^{(0)} (x_{0}).
2: Policy evaluation: Let k\geq 0 and p=p+1; based on the control policy u ^{(p)}_{k} (x_{k}), solve the following nonlinear equation for V^{(p)} (x_{k}):\begin{equation*} U\left ({x_{k},u ^{\left ({p}\right)}_{k} }\right) + \nabla {V^{\left ({p}\right) \mathsf {T}}}\left ({x_{k}}\right)\left ({{f\left ({x_{k}}\right) + g\left ({x_{k}}\right)u ^{\left ({p}\right)}_{k}} }\right) = 0. \tag{41}\end{equation*}
3: Policy improvement: Update the control policy u^{(p)} _{k} (x_{k}) by \begin{equation*} u ^{\left ({p+1}\right)}_{k} \left ({x_{k} }\right)=-\frac {1}{2}R^{-1} g^{\mathsf {T}} \left ({x_{k}}\right)\nabla V^{\left ({p}\right)} \left ({x_{k} }\right). \tag{42}\end{equation*}
4: If \| {V^{(p+1)} (x_{k})-V^{(p)} (x_{k})} \|\le \xi, go to Step 5 and obtain the self-triggered approximate optimal neuro-control u_{k} at the time instant t_{k}; else, return to Step 2.
5: Self-triggered mechanism: Let k=k+1 and predict the next triggering time instant by \begin{equation*} t_{k+1}=t_{k}+\frac {1}{\lambda } \ln \left ({\lambda \frac { \delta x^{\mathsf {T}}_{k}Nx_{k}}{\rho \left ({x_{k}}\right)} +1 }\right). \tag{43}\end{equation*}
6: If t_{k} \le T, go to Step 2; else, stop.
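The overall flow of Algorithm 1 can be summarized by the following Python skeleton (ours); policy_evaluation, policy_improvement, next_trigger_interval, and plant_step are hypothetical helper routines standing for (41), (42), (35), and the plant simulation under the held input, respectively.

```python
def self_triggered_adp(x0, u0, T, xi,
                       policy_evaluation, policy_improvement,
                       next_trigger_interval, plant_step):
    """Skeleton of Algorithm 1.  All callables are hypothetical placeholders:
    policy_evaluation(x, u) solves (41) for the value at x, policy_improvement(x, V)
    applies (42), next_trigger_interval(x) implements (35), and plant_step runs the
    plant with the held input over the predicted interval and returns the new state."""
    t_k, x_k, u_k = 0.0, x0, u0                      # Step 1: start from an admissible policy
    while t_k <= T:                                  # Step 6: repeat until the terminal time
        V_old, V_new = None, 0.0
        while V_old is None or abs(V_new - V_old) > xi:
            V_old = V_new
            V_new = policy_evaluation(x_k, u_k)      # Step 2: policy evaluation, solve (41)
            u_k = policy_improvement(x_k, V_new)     # Step 3: policy improvement via (42)
        delta_k = next_trigger_interval(x_k)         # Step 5: predict the next instant by (35)
        x_k = plant_step(x_k, u_k, t_k, t_k + delta_k)
        t_k += delta_k
    return u_k
```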
E. Computational Requirements
Inspired by [43] and [44], the minimal computational requirements will be analyzed for the developed self-triggered approximate optimal neuro-control (24).
For the Veronese map \vartheta from x=[x_{1},x_{2},\ldots,x_{n}]^{\mathsf {T}}\in {\mathbb R}^{n} to the quadratic form xx^{\mathsf {T}} \in {\mathbb R}^{n \times n}, we have \begin{equation*} \vartheta \colon =\left [{x_{1}^{2},x_{1}x_{2},\ldots,x_{1}x_{n},x_{2}^{2},x_{2}x_{3},\ldots,x_{n}^{2}}\right]^{\mathsf {T}}.\end{equation*}
Therefore, its computation has space complexity N(t_{s}) (n(n+1)/2), where t_{s} is the discretization step, N(t_{s})= \lfloor (\Delta _{k\max }/t_{s}) \rfloor - \lfloor (\Delta _{k\min }/t_{s}) \rfloor, \Delta _{k\max } is the maximum time the system is allowed to run in open loop, and \Delta _{k\min } is the minimum time guaranteeing the time cost of one computation step. For the time complexity, define t_{c}>0 as the time it takes to execute an instruction on a given digital platform; the implementation then requires a time of N(t_{s}) (n(n+1)/2)t_{c} for multiplications and the same amount for additions in the preprocessing step, as well as N(t_{s})t_{c} for testing \dot L_{2} \le -\lambda _{\min } (Q_{0})\| x\|^{2} +\Delta in the running step.
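For instance, the quadratic-monomial vector \vartheta (x) can be generated as follows; this short sketch (ours) follows the ordering in the definition above, and the length of the result is n(n+1)/2.

```python
import numpy as np

def veronese(x):
    """Quadratic monomials [x1^2, x1*x2, ..., x1*xn, x2^2, ..., xn^2] of length n*(n+1)/2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

assert veronese([1.0, 2.0, 3.0]).size == 3 * 4 // 2   # n = 3 -> 6 monomials
```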
Now, we turn to our concern. Notice that the Lyapunov function candidate (26) consists of a quadratic function of x and the optimal cost function V^{\ast } (i.e., an integral of quadratic functions of x and u from the initial time t to infinity). Thus, the implementation of the developed self-triggered control policy (24) has a space complexity of \begin{equation*} M_{s}=N\left ({t_{s}}\right)\left ({\frac {n\left ({n+1}\right)}{2}+\frac {m\left ({m+1}\right)}{2}+1}\right)\end{equation*}
since it requires the space of N(t_{s}) (n(n+1)/2) for computing x^{\mathsf {T}}x, N(t_{s}) (m(m+1)/2) for computing u^{\mathsf {T}}Ru, and N(t_{s}) for storing the integral at t_{k}. Meanwhile, it has a time complexity of \begin{equation*} M_{t}=\left ({2n\left ({n+1}\right)+m\left ({m+1}\right)+N\left ({t_{s}}\right)}\right)t_{c}\end{equation*}
since it requires preprocessing time of N(t_{s})n(n+1)t_{c} for x^{\mathsf {T}}x, N(t_{s})n(n+1)t_{c} for x^{\mathsf {T}}Qx, and N(t_{s})m(m+1)t_{c} for u^{\mathsf {T}}Ru in the integral, as well as N(t_{s})t_{c} for testing the inequality \dot L_{2} \le -\lambda _{\min } (Q_{0})\| x\|^{2} +\Delta in the running step. It is worth pointing out that the minimum inter-execution time \Delta _{k\min } is determined by M_{t}, and the maximum inter-execution time \Delta _{k\max } is determined by guaranteeing \dot L_{2} = -\lambda _{\min } (Q_{0})\| x\|^{2} + \Delta.
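As a rough numerical illustration, the formulas for N(t_{s}), M_{s}, and M_{t} can be evaluated directly; the values of n, m, t_{s}, t_{c}, \Delta _{k\max }, and \Delta _{k\min } below are hypothetical and serve only to exercise the formulas.

```python
import math

n, m = 2, 1                    # state and input dimensions (Example 1 sizes)
t_s, t_c = 1e-3, 1e-7          # hypothetical discretization step and instruction time [s]
dk_max, dk_min = 0.5, 0.05     # hypothetical maximum and minimum inter-execution times [s]

N_ts = math.floor(dk_max / t_s) - math.floor(dk_min / t_s)
M_s = N_ts * (n * (n + 1) // 2 + m * (m + 1) // 2 + 1)       # space complexity
M_t = (2 * n * (n + 1) + m * (m + 1) + N_ts) * t_c           # time complexity [s]
print(N_ts, M_s, M_t)
```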
By introducing the self-triggering mechanism, this article presents a novel ADP-based optimal neuro-control scheme for nonlinear systems. The superiority of this approach is emphasized as follows: 1) it reduces the computational and communication resources, as well as the energy cost, since the control policy (24) is updated aperiodically and 2) the updating time instants of the control policy are predicted by the designed self-triggered mechanism (35). This implies that the hardware monitoring the full system state in the event-triggered control structure is no longer required.
SECTION IV.
Simulation Studies
In this section, we provide two simulation examples, including practical and numerical systems, to verify the effectiveness of the present self-triggered approximate optimal neuro-control scheme (24) developed via ADP.
A. Example 1
A torsional pendulum bar system is employed, with the dynamics expressed as \begin{align*} \frac {\mathrm {d}\theta }{\mathrm {d}t}&=\omega \\ \mathcal {J}\frac {\mathrm {d}\omega }{\mathrm {d}t}&=u-Mgl\sin \theta -f_{d}\frac {\mathrm {d}\theta }{\mathrm {d}t} \tag{44}\end{align*}
where M=1/3\,\,\mathrm {kg} and l=2/3\,\,\mathrm {m} indicate the mass and length of the pendulum bar, respectively. Let the rotary inertia be \mathcal {J}=\frac {4}{3}Ml^{2}, the frictional factor be f_{d}=0.2, and the gravitational acceleration be g=9.8\,\,\mathrm {m/s}^{2}. Replacing the system states \theta and \omega by x_{1} and x_{2}, the torsional pendulum bar system is expressed in the state-space form as \begin{align*} \dot {x} = \left [{{\begin{array}{c} x_{2} \\ -\frac {Mgl}{\mathcal {J}}\sin x_{1}-\frac {f_{d}}{\mathcal {J}}x_{2} \end{array}}}\right]+\left [{{\begin{array}{c} 0 \\ \frac {1}{\mathcal {J}} \end{array}}}\right]u.\end{align*}
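For reference, a minimal Python sketch (ours) of the drift and input terms of this state-space model is given below, interpreting the rotary inertia as \mathcal {J}=(4/3)Ml^{2} and using the parameter values listed above; the gravitational constant is renamed in the code to avoid a clash with the input matrix g(x).

```python
import numpy as np

M_bar, l, f_d, g_acc = 1/3, 2/3, 0.2, 9.8    # pendulum mass, length, friction, gravity
J = 4/3 * M_bar * l**2                       # rotary inertia, assumed J = (4/3)*M*l^2

def f(x):
    """Drift dynamics of the torsional pendulum in state-space form."""
    return np.array([x[1], -(M_bar * g_acc * l / J) * np.sin(x[0]) - (f_d / J) * x[1]])

def g(x):
    """Control input matrix."""
    return np.array([[0.0], [1.0 / J]])
```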
In the cost function, Q and R are selected as identity matrices with appropriate dimensions. The initial system state vector is x_{0}=[{1,-1}]^{\mathsf {T}}, the activation function of the critic NN is chosen as \sigma (x)=[x_{1}^{2}, x_{1}x_{2}, x_{2}^{2}]^{\mathsf {T}}, and the weight vector is defined as \hat {W}_{c}=[\hat {W}_{c1}, \hat {W}_{c2}, \hat {W}_{c3}]^{\mathsf {T}} with initial value \hat {W}_{c}^{0}=[{0.502, -0.489, 0.012}]^{\mathsf {T}}. It is worth pointing out that their selection depends on experience according to [40] and [41]. The learning rates in the nested updating policies for the critic NN are chosen as l_{c}=l_{\varepsilon} =0.4, and the design parameter is \Gamma =0.5. In the self-triggered mechanism, based on trial and error, we choose the parameters \lambda =0.1 and \delta =0.1,0.3,0.5,0.7,0.9,1 to test the sensitivity of the sampling frequency and to show comparison results.
Taking \delta =0.7 as the representative case, simulation results are provided in Figs. 1–5 and Table I. Fig. 1 illustrates the evolution of the critic NN weights; we can see that they gradually converge to [{0.986, -0.371, 0.407}]^{\mathsf {T}}. As illustrated in Fig. 2, by applying the developed self-triggered approximate optimal neuro-control policy computed by (24), the system states converge to zero after 29.2 s. Fig. 3 shows the self- and time-triggered control inputs. We can observe that the self-triggered approximate optimal neuro-control input is a piecewise continuous signal: it remains unchanged during the time interval [t_{k}, t_{k+1}) and is updated only at t_{k}. Fig. 4 illustrates the comparison of the numbers of acquired samples: the time-triggered and the self-triggered control methods require 1000 and only 131 samples, respectively, which shows that the sampling frequency is greatly reduced. From Fig. 5, we find that the minimum inter-sampling time is \Delta _{k}= 0.2 s, which implies that the Zeno behavior does not occur. As a representative comparison, these figures also illustrate the control performance under the parameter \delta =0.1. Together with Table I, where \Delta _{k\min } and \Delta _{k\max } denote the actual minimum and maximum inter-execution times, respectively, we can conclude that as \delta increases, the number of required samples decreases, and the system states take less settling time to approach the neighborhood of the equilibrium. This implies that we can choose a proper value according to the requirements on the transient response and the resource cost.
B. Example 2
The overhead crane system, which transports loads from one place to another, plays an important role in industry. In contrast to Example 1, its dynamic model is more complicated and of higher order. The dynamic model of this system is formulated as in [45], and the same parameters of the overhead crane plant are selected.
In the simulation, Q and R are also selected as identity matrices with appropriate dimensions. The initial system state vector is x_{0}=[{0.5,-0.5,0.8,-0.9}]^{\mathsf {T}}. The activation function of the critic NN is chosen as \sigma (x)=[x_{1}^{2}, x_{1}x_{2}, x_{1}x_{3}, x_{1}x_{4}, x_{2}^{2}, x_{2}x_{3}, x_{2}x_{4}, x_{3}x_{4}, x_{4}^{2}]^{\mathsf {T}}, and the weight vector is defined as \hat {W}_{c}=[\hat {W}_{c1}, \hat {W}_{c2}, \ldots, \hat {W}_{c9}]^{\mathsf {T}} with initial value \hat {W}_{c}^{0}=[0.656,0.759,-0.892,-0.497,0.707,0.905, -0.736,-0.896,0.092]^{\mathsf {T}}. The parameters in the nested updating policies are chosen as l_{c}=1.2, l_{\varepsilon} =0.01, and \Gamma =0.001. The parameters in the self-triggered mechanism are chosen as \lambda =0.4 and \delta =0.7.
Simulation results of Example 2 are presented in Figs. 6–10. Fig. 6 describes the convergence process of the critic NN weights; we can observe that \hat {W}_{c} converges to [0.862,0.715,-0.889,-0.436,0.734,0.976,-0.798,-1.214, 0.013]^{\mathsf {T}}. As displayed in Fig. 7, the system states gradually converge to the equilibrium point after 13 s. Fig. 8 describes the piecewise signal of the self-triggered control, which is updated only at t_{k}, while the time-triggered control is a continuous signal. Fig. 9 shows that, in contrast to the time-triggered control input, which requires 800 samples, the self-triggered one needs only 320 samples, which means that the sampling has been reduced by 60%. Fig. 10 shows that the minimum inter-sampling time is \Delta _{k\min }= 0.05 s. Thus, we can conclude that the developed self-triggered approximate optimal neuro-control scheme effectively assures the closed-loop overhead crane system to be stable in the UUB sense.
SECTION V.
Conclusion
A self-triggered approximate optimal neuro-control scheme is presented for nonlinear systems through ADP. By guaranteeing the asymptotic stability of the weight error dynamics, the critic NN is established to approximate the solution of the HJBE, and the optimal neuro-control is then derived indirectly in the ADP framework. By introducing the self-triggered mechanism, the time instants at which the control policy is updated are predicted in advance. It is worth pointing out that a proper self-triggering condition, which predicts the next updating instant of the control policy, is designed to avoid the continuous monitoring of the system state by hardware devices required in event-triggered control approaches, so that the computation, the communication, and the energy consumption are decreased in an alternative way. Furthermore, the nested updating policies guarantee the asymptotic stability of the critic weight error dynamics, rather than UUB stability as in most existing ADP-based optimal control methods.