Robotic and autonomous systems have found many applications in hazardous environments, especially when the area is inaccessible to humans or working on site would impose tremendous danger on the operator. For example, significant investment has been made to develop the next generation of dexterous manipulators for nuclear decommissioning applications [1], [2]. Using a quadcopter alongside the manipulator improves the situational awareness and autonomy of the manipulator when it interacts with the surrounding environment without human intervention [3], [4]. Another useful class of robots for nuclear applications is autonomous underwater robots. They are predominantly used for inspection and monitoring of nuclear waste stored underwater inside nuclear ponds. Currently, most applications employ a specific category of underwater robots, namely submersible remotely operated vehicles (ROVs). The global offshore ROV industry is predicted to be worth up to \$3.5Bn, with modern applications including civil and inshore inspections (CIIs) and the decommissioning of oil and gas pipelines, nuclear storage facilities and accident sites, liquid storage tanks, and tunnels [5]. Autonomous operation of such underwater vehicles introduces many challenges due to the nonlinear and uncertain behavior of the vehicle. These challenges include, but are not limited to, finite escape time, multiple equilibrium points, limit cycles, the appearance of new frequencies, bifurcation, chaos, unmodeled dynamics, and uncertainties. These problems have attracted considerable interest in many real-world robotic and mechanical systems such as AUVs, wheeled mobile robots, robot manipulators, and aerial vehicles. Classical control approaches have not guaranteed tolerable results in demanding operations of uncertain nonlinear dynamic systems. To this end, it is indispensable to design model-free intelligent controllers with strong robustness against uncertain nonlinearities, which attain high-precision tracking performance in the face of complex system features and are sustainable substitutes under these conditions. To reach appropriate model-free control aims, many interesting methods have been developed for uncertain nonlinear systems via artificial neural networks (ANNs) and fuzzy logic systems (FLSs). For example, Elhaki & Shojaei have developed an output-feedback multi-layer neural network (MNN)-based controller for AUVs in the presence of complex nonlinearities and unmeasured states [6]. In [7], an adaptive IT2FNN was employed to estimate the system's nonlinear functions. Fuzzy set theory was first presented by Zadeh [8]. Since then, such systems have been classified as type-1 fuzzy logic systems (T1FLSs), and they are widely used in many control applications. However, since T1FLSs are built on crisp (exact) sets, they cannot take into account the uncertainty in the membership functions (MFs). 
Therefore, Zadeh established the idea of type-2 fuzzy logic systems (T2FLSs) in 1975 [9]. As a result, uncertainties have been handled more adequately by T2FLSs in control methods, since the membership grades are themselves fuzzy. On the other hand, since general T2FLSs are computationally expensive, interval type-2 fuzzy logic systems (IT2FLSs) were introduced by Liang and Mendel [10]. In [11], an IT2FLS was employed to compensate uncertainties in wireless sensor networks. However, the knowledge utilized to build the rules of a FLS may be uncertain and inaccurate, which may result in rules with uncertain consequents/antecedents reflected in the MFs of those consequents/antecedents. To overcome this defect, many studies have fused IT2FLSs and ANNs to create IT2FNNs. IT2FNNs combine the advantages of interval type-2 fuzzy reasoning, which handles severe nonlinearities, with the ability of ANNs to learn and discover the best action from the process [12]–[14]. The structure of IT2FNNs can enhance the accuracy of control systems, and they are able to overcome the limitations of classical controllers, namely parametric and structural uncertainties along with time-varying unmodeled dynamics that can have undesirable effects on the control system.
Recently, fuzzy optimal control schemes have been a topic of research interest. In [15], the fuzzy optimal tracking control of hypersonic flight vehicles was addressed. The accelerated adaptive fuzzy optimal control of three coupled fractional-order chaotic electromechanical transducers was addressed in [16]. Reference [17] proposed an adaptive fuzzy inverse optimal output-feedback controller for vehicular active suspension systems. In [18], an optimal fuzzy adaptive robust controller for a lower-limb exoskeleton robot system was suggested. The adaptive fuzzy finite-time control problem for a class of switched nonlinear systems was discussed in [19], and in [20], a fuzzy swing-up controller and optimal state-feedback stabilization for an inverted pendulum were discussed. Lately, multi-layer neural network reinforcement learning (MNNRL) has attracted research interest for IT2FNNs and ANNs to improve network performance [21]. MNNRL combines multi-layer neural networks (MNNs) and reinforcement learning (RL) to solve the problem of dimensional explosion that arises as the state space grows and restricts the performance of RL [22]. RL is an adaptive, intelligent branch of machine learning. The philosophy behind RL is that the closed-loop system interacts with an approximator through three essential signals: the system's state signals, which inform the controller about the system; the action signal, which is the output of the actor agent and influences the system; and the reinforcement signal, which is the output of the critic agent evaluating the actor's performance as an estimate of a cost function [23]. In [24], an RL-based controller was developed for a flexible two-link manipulator, and in [25], an integral RL controller with input saturation for nonlinear systems was studied. However, these methods could not deal with the dimensionality problem. 
In contrast, references [26] and [27] developed deep RL-based controllers for wheeled mobile robots, but these algorithms cannot guarantee the stability of the system and suffer from input saturation. Reference [28] proposed an adaptive RL controller for hypersonic vehicles, but saturation functions along with the compensation of the actuator saturation nonlinearity are not taken into account, and the suggested controller requires measurements of all the system's state derivatives. In our opinion, some fundamental questions arise here that motivate this research work: How can a saturated NFRLC-based method be designed whose stability is guaranteed via a rigorous Lyapunov stability analysis? Can we design a hybrid NFRLC-estimation-based framework including an MNN, which acts as the actor agent and can deal with the dimensionality problem, and an IT2FNN, which acts as the critic agent and can strongly handle uncertain nonlinearities, so as to exploit the advantages of MNNs and IT2FNNs simultaneously? Answering these questions is one of the main contributions of this paper.
Since unmeasured states may exist in some control structures due to the lack of sensors or implementation cost, state-feedback controllers may not be applicable and a state observer should be employed in the control system. In this respect, some influential observer-based controllers have been developed, such as the dynamic event-triggered observer-based PID controller [29], the cascade predictive observer [30], the extended state observer [31], and the high-gain observer (HGO) [32]. Yet most of the cited control methods were designed to ensure the stability of the closed-loop control system by the Lyapunov stability principle, which may fail to realize the desired tracking performance. In this context, prescribed performance control (PPC) [33]–[37] is a good option to deal with this problem. This technique provides a nonlinear transformation from a prescribed constrained tracking problem to an unconstrained one, which makes it suitable for prescribed performance tracking problems, and the tracking errors are guaranteed to remain within some prescribed performance bounds (PBs). In the last few years, many control methodologies have employed the PPC method to improve the performance of control systems. In [38], a PPC-based controller is designed for uncertain nonlinear systems with unknown control directions. A new data-driven, model-free adaptive terminal sliding mode control for a class of discrete nonlinear systems with a prescribed performance was designed in [39]. Reference [40] developed a fuzzy wavelet neural controller with an improved prescribed performance for a micro-electromechanical system gyroscope. The adaptive neural output-feedback tracking control of a class of switched uncertain nonlinear systems with a prescribed performance is addressed in [41], and in [42], finite-time prescribed performance adaptive fuzzy control for a class of unknown nonlinear systems was investigated.
The control strategy in this paper is to design a multi-objective, model-free intelligent controller combining AC-NFRLC, MNNs, IT2FNNs, saturation functions, PPC, and a state observer, with a rigorous Lyapunov-based stability proof. The main contributions of this study are as follows: (1) A novel hybrid framework for the AC-NFRLC mechanism based on MNNs and IT2FNNs is designed. (2) The substructure of the proposed AC-NFRLC system inherits the advantages of MNNs, namely dealing with the dimensionality problem, and of IT2FNNs, namely fuzzy reasoning and learning from the process. This leads to a strong AC-NFRLC structure for compensating uncertain nonlinearities. (3) The adaptive laws for the learning process are derived via the Lyapunov theory. (4) To deal with unmeasured states, a HGO is efficiently employed. (5) An adaptive robust controller is effectively fused with the proposed AC-NFRLC framework to compensate the effects of external disturbances and function approximation errors. (6) An efficient saturation function is utilized in the design procedure to bound the error signals and to reduce the risk of actuator saturation. (7) The risk of actuator saturation is further diminished by learning and compensating the actuator saturation nonlinearity via the proposed AC-NFRLC method. (8) To achieve a desired tracking performance with predefined transient and steady-state response characteristics, the PPC technique is employed. (9) The stability of the entire closed-loop control system is rigorously proved by the Lyapunov theory. The integration of items (1)–(9) brings many challenges in designing the proposed controller and deriving the mathematical equations. In this paper, we successfully cope with the control objectives in (1)–(9) along with a rigorous Lyapunov-based stability proof. 
To the best of the authors' knowledge, this is the first time that items (1)–(9) are considered in the context of submersible vehicles, leading to a novel saturated intelligent controller capable of handling external disturbances, dimensionality, and unknown nonlinearities in real time with a prescribed performance and without velocity measurements.
The remainder of this paper is organized as follows. Section II gives some preliminaries. The framework and design ideology of IT2FNNs along with MNNs, and the design process of the proposed method, are given in Section III. In Section IV, simulation results are provided to show the controller's efficacy. Finally, concluding remarks are given in Section V.
A. Notations
In this paper, the following notations are used unless otherwise indicated. $\mathrm {diag}[\bullet]$ is a diagonal matrix, $0_{n\times n}$ indicates an $n\times n$ matrix of zeros, $I_{n}$ denotes an $n\times n$ identity matrix, $\|\bullet \|_{F}$ stands for the Frobenius norm, $\|\bullet \|$ shows the Euclidean norm, $\mathrm {blkdiag}[\bullet]$ represents a block diagonal matrix, and $\mathrm {col}(A,B)$ is a column of a partitioned matrix with arbitrary sub-matrices $A$ and $B$, where $A$ and $B$ are placed in the top and bottom blocks, respectively.
B. System Description
Consider the following kinematic and dynamic motion equations of AUVs [43]:\begin{align*} \dot {q}_{1}=&J_{1}(q_{2})\nu _{1}, \\ M_{1}\dot {\nu }_{1}=&-C_{1}(\nu _{1})\nu _{2}-D_{1}\nu _{1}-u_{d1}(\nu _{1})+\tau _{s1}+\delta _{1}, \\ \dot {q}_{2}=&J_{2}(q_{2})\nu _{2}, \\ M_{2}\dot {\nu }_{2}=&-C_{1}(\nu _{1})\nu _{1}-C_{2}(\nu _{2})\nu _{2}-D_{2}\nu _{2}-u_{d2}(\nu _{2}) \\&-\,g_{1}(q_{2})+\tau _{s2}+\delta _{2},\tag{1}\end{align*}
where $q_{1}=[x,y,z]^{T}$
denotes the vector of AUV’s position, $q_{2}=[\phi,\theta,\psi]^{T}$
represents the orientation of the AUV, $\nu _{1}=[u,v,w]^{T}$
and $\nu _{2}=[p,q,r]^{T}$
are the velocity vectors, $\delta _{1}=[\delta _{x},\delta _{y},\delta _{z}]^{T}$
and $\delta _{2}=[\delta _{\phi },\delta _{\theta },\delta _{\psi }]^{T}$
denote the bounded external disturbance forces and torques. Eq. (1) is rewritten as \begin{align*} \begin{cases} \dot {q} =J_{t}(q)\nu,\\ M_{t}\dot {\nu } =-C_{t}(\nu)\nu -D_{t}\nu -g(q)-u_{d}(\nu)+\tau _{s}+\delta, \end{cases}\tag{2}\end{align*}
where $q=\big [q_{1}^{T},q_{2}^{T}\big]^{T}$
is the vector of positions and orientations, $J_{t}(q)=\mathrm {blkdiag}[J_{1},J_{2}]\in \Re ^{6\times 6}$
specifies a rotational matrix, $\nu =\big [\nu _{1}^{T},\nu _{2}^{T}\big]^{T}\in \Re ^{6}$
is the velocity vector, $M_{t}=\mathrm {blkdiag}[M_{1},M_{2}]\in \Re ^{6\times 6}$
stands for the inertia matrix that is positive-definite and symmetric, $C_{t}=\big [\mathrm {col}(0_{3\times 3},C_{1}),\mathrm {col}(C_{1},C_{2})\big]\in \Re ^{6\times 6}$
denotes the matrix of Coriolis and centripetal forces which is skew-symmetric, $D_{t}=\mathrm {blkdiag}[D_{1},D_{2}]\in \Re ^{6\times 6}$
shows the damping forces matrix, $g(q)=\big [0,0,0,g_{1}^{T}(q_{2})\big]^{T}\in \Re ^{6}$
is the gravitational forces and buoyancy moments, $u_{d}=\big [u_{d1}^{T},u_{d2}^{T}\big]^{T}\in \Re ^{6}$
is the vector of unmodeled dynamics and nonlinearities/uncertainties, $\tau _{s}=[\tau _{us},\tau _{vs},\tau _{ws},\tau _{ps},\tau _{qs},\tau _{rs}]^{T}\in \Re ^{6}$
denotes the saturated input control forces and torques, and $\delta =\big [\delta _{1}^{T},\delta _{2}^{T}\big]^{T}\in \Re ^{6}$
stands for the vector of bounded environmental disturbances. For more details on AUV dynamics, readers are referred to [43]. The saturated input forces and torques are \begin{align*} \tau _{s}= \begin{cases} \tau _{\max },&r\tau \geq \tau _{\max }\\ r\tau,&-\tau _{\max }< r\tau < \tau _{\max }\\ -\tau _{\max },&r\tau \leq -\tau _{\max } \end{cases}\tag{3}\end{align*}
where $\tau _{s}$
, $\tau $
, and $\tau _{\max }$
denote the vectors of saturated inputs, non-saturated inputs, and the limit of the actuators, respectively, and $r$
is a ratio between $\tau $
and $\tau _{s}$
. The input saturation nonlinearity ${d_{\tau }}({\tau })={\tau _{s}}-{\tau }$
, which is a nonlinear-in-parameters (NLIP) term and will be compensated in the control design procedure, is given by \begin{align*} {d_{\tau }}({\tau })= \begin{cases} \tau _{\max }-\tau,&r\tau \geq \tau _{\max }\\ (r-1)\tau,&-\tau _{\max }< r\tau < \tau _{\max }.\\ -\tau _{\max }-\tau,&r\tau \leq -\tau _{\max } \end{cases}\tag{4}\end{align*}
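The saturation model (3) and the NLIP term (4) can be checked numerically. The sketch below is a minimal illustration, with arbitrary illustrative values for $r$ and $\tau _{\max }$ (not taken from the paper); it verifies that $\tau _{s}=\tau +d_{\tau }(\tau)$ holds element-wise.

```python
import numpy as np

def saturate(tau, r, tau_max):
    """Element-wise saturated input tau_s of Eq. (3)."""
    return np.clip(r * tau, -tau_max, tau_max)

def saturation_nonlinearity(tau, r, tau_max):
    """NLIP term d_tau(tau) = tau_s - tau of Eq. (4)."""
    return saturate(tau, r, tau_max) - tau

# arbitrary illustrative commands and actuator parameters
tau = np.array([0.5, 4.0, -6.0])
r, tau_max = 1.2, 3.0

tau_s = saturate(tau, r, tau_max)
d_tau = saturation_nonlinearity(tau, r, tau_max)

# tau_s = tau + d_tau by the definition of d_tau
assert np.allclose(tau_s, tau + d_tau)
```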
C. Prescribed Performance Transformation
To obtain PPC objectives, errors (i.e. $e=q-q_{d}$
) should evolve inside a funnel-shaped set over time [44], [45]. A smooth, bounded function $\eta _{i}: \Re ^{+}\rightarrow \Re ^{+}, i=1,\ldots,6$
, is a PB, if $\eta _{i}$
is decreasing, and $\lim \limits_{t\rightarrow \infty } \eta _{i}(t)=\eta _{i\infty }$
[33]. The PPC is provided if $\eta _{li}(t)\leq {e_{i}}(t) \leq \eta _{ui}(t)$
holds, where $e_{i}$
is $i$
th element of $e$
, $\eta _{li}$
and $\eta _{ui}$
are the pre-set limits of $e_{i}$
, and $\eta _{i}$
is defined as $\eta _{i}(t):=(\eta _{i0}-\eta _{i\infty })\exp (-a_{i}t)+\eta _{i\infty }$
where $a_{i}\in \Re ^{+}$
is a lower bound on the convergence rate, $\eta _{i0}>\eta _{i\infty }\in \Re ^{+}$
are design factors, and $\eta _{i\infty }$
is an upper bound for the final steady-state error. The PBs can be set as $\eta _{li}=-\alpha _{i}\eta _{i},\;\eta _{ui}=\beta _{i}\eta _{i}$
where $\alpha _{i},\beta _{i}\in \Re ^{+}$
are tunable factors.
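The exponentially decaying bound $\eta _{i}(t)$ and the resulting PBs can be illustrated with a short numerical sketch; the design factors below ($\eta _{i0}$, $\eta _{i\infty }$, $a_{i}$, $\alpha _{i}$, $\beta _{i}$) are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

# Performance bound eta(t) = (eta0 - eta_inf)*exp(-a*t) + eta_inf
eta0, eta_inf, a = 2.0, 0.1, 1.5   # eta0 > eta_inf > 0, a > 0 (illustrative)
alpha, beta = 1.0, 1.0             # tunable factors (illustrative)

t = np.linspace(0.0, 5.0, 501)
eta = (eta0 - eta_inf) * np.exp(-a * t) + eta_inf
eta_l, eta_u = -alpha * eta, beta * eta   # lower/upper PBs on the error

assert np.isclose(eta[0], eta0)           # eta(0) = eta0
assert abs(eta[-1] - eta_inf) < 1e-2      # eta(t) -> eta_inf
assert np.all(np.diff(eta) < 0)           # strictly decreasing funnel
```

Any error trajectory kept between `eta_l` and `eta_u` therefore converges into the band $(-\alpha _{i}\eta _{i\infty },\beta _{i}\eta _{i\infty })$ at a rate of at least $a_{i}$.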
Assumption 1:
The errors meet $\eta _{li}(0)\leq {e_{i}}(0) \leq \eta _{ui}(0)$
.
The following nonlinear transformation is used here:\begin{equation*} \epsilon _{e}=\rho ^{-1}(e)=\big [\rho _{1}^{-1}(e_{1}),\ldots,\rho _{6}^{-1}(e_{6})\big]^{T},\tag{5}\end{equation*}
which is invertible as $e=\rho (\epsilon _{e})$
.
Fact 1: Transformation (5) must satisfy the following limits:\begin{align*} \begin{cases} \lim \limits _{{\epsilon _{e_{i}}}\to +\infty } {{\rho _{i}(\epsilon _{e_{i}})}}={\eta }_{ui},\\ \lim \limits _{{\epsilon _{e_{i}}}\to -\infty } {{\rho _{i}(\epsilon _{e_{i}})}}={\eta }_{li}, \end{cases}\tag{6}\end{align*}
where the limits in (6) imply that the tracking errors will settle inside the PBs and converge to a neighbourhood of zero, provided that Assumption 1 holds and $\epsilon _{e}\in \mathcal {L}_\infty $ is ensured by a suitable controller. A candidate for transformation (5) that satisfies Fact 1 is \begin{equation*} e_{i}=\rho _{i}(\epsilon _{e_{i}})=\frac {\eta _{ui}-{\eta }_{li}}{\pi }\arctan (\epsilon _{e_{i}})+ \frac {\eta _{ui}+{\eta }_{li}}{2}.\tag{7}\end{equation*}
From (7), the transformed errors $\epsilon _{e_{i}}(t)$
are \begin{equation*} \epsilon _{e_{i}}=\rho ^{-1}_{i}(e_{i})=\tan \left({\frac {\pi }{2}\times \frac {2{e_{i}}-{\eta }_{ui} - {\eta }_{li}}{\eta _{ui} - {\eta }_{li}}}\right).\tag{8}\end{equation*}
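A quick numerical check (a sketch, with arbitrary fixed bounds $\eta _{ui}$ and $\eta _{li}$ standing in for the time-varying PBs) confirms that (7) maps the whole real line strictly inside the performance bounds and that (8) is its exact inverse.

```python
import numpy as np

def rho(eps, eta_u, eta_l):
    """Eq. (7): maps unconstrained eps onto the open interval (eta_l, eta_u)."""
    return (eta_u - eta_l) / np.pi * np.arctan(eps) + (eta_u + eta_l) / 2.0

def rho_inv(e, eta_u, eta_l):
    """Eq. (8): transformed (unconstrained) error for a constrained error e."""
    return np.tan(np.pi / 2.0 * (2.0 * e - eta_u - eta_l) / (eta_u - eta_l))

eta_u, eta_l = 1.5, -0.8                  # arbitrary instantaneous bounds
eps = np.linspace(-50.0, 50.0, 1001)
e = rho(eps, eta_u, eta_l)

assert np.all((e > eta_l) & (e < eta_u))  # e stays strictly inside the PBs
assert np.allclose(rho_inv(e, eta_u, eta_l), eps, atol=1e-6)  # inverse pair
```

Keeping $\epsilon _{e_{i}}$ bounded by the controller thus automatically keeps $e_{i}$ inside $(\eta _{li},\eta _{ui})$, which is the essence of the PPC transformation.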
Differentiating (8) with respect to time yields \begin{equation*} \dot {\epsilon }_{e_{i}}=\frac {\partial \epsilon _{e_{i}}}{\partial {e_{i}}}\dot {e}_{i}+\Psi _{i},\tag{9}\end{equation*}
where $\Psi _{i}=\frac {\partial \epsilon _{e_{i}}}{\partial {\eta }_{ui}} \dot {\eta }_{ui}+\frac {\partial \epsilon _{e_{i}}}{\partial {\eta }_{li}} \dot {\eta }_{li}$
. Utilizing (7) and (8) gives \begin{align*} \frac {\partial \epsilon _{e_{i}}}{\partial {e_{i}}}=&\frac {\pi }{\eta _{ui}-{\eta }_{li}} \sec ^{2}\left({\frac {\pi }{2}\times \frac {2{e_{i}}-{\eta }_{ui} - {\eta }_{li}}{\eta _{ui} - {\eta }_{li}}}\right)>0, \quad \tag{10}\\ \frac {\partial {e_{i}}}{\partial \epsilon _{e_{i}}}=&\frac {\eta _{ui}-{\eta }_{li}}{\pi \big (1+\epsilon _{e_{i}}^{2}\big)}>0,\tag{11}\end{align*}
which show a strictly increasing relation between $\epsilon _{e_{i}}$
and ${e_{i}}$
. Finally, using $\dot {e}=\dot {q}-\dot {q}_{d}$
, (2), and (9) gives \begin{equation*} \dot {\epsilon }_{e}=\mathfrak {D}\nu +\varkappa, \tag{12}\end{equation*}
where $\mathfrak {D}=TJ_{t}$
, $\varkappa (\epsilon _{e},\eta _{u},\eta _{l},\dot {\eta }_{u},\dot {\eta }_{l},q_{d},\dot {q}_{d})=\Psi -T\dot {q}_{d}$
, $T=\mathrm {diag}[{\partial \epsilon _{e_{1}}}/{\partial {e_{1}}},\ldots,{\partial \epsilon _{e_{6}}}/{\partial {e_{6}}}]$
, $\Psi =[\Psi _{1},\ldots,\Psi _{6}]^{T}$
, $\eta _{u}=[\eta _{u1},\ldots,\eta _{u6}]^{T}$
, $\eta _{l}=[\eta _{l1},\ldots,\eta _{l6}]^{T}$
, $\dot {\eta }_{u}=[\dot {\eta }_{u1},\ldots,\dot {\eta }_{u6}]^{T}$
, and $\dot {\eta }_{l}=[\dot {\eta }_{l1},\ldots,\dot {\eta }_{l6}]^{T}$
.
SECTION III.
NFRLC Methodology Design
In this section, a neuro-fuzzy reinforcement learning-based controller is designed for the accurate tracking control of AUVs, in which the actor agent is a multilayer neural network that can deal with the dimensionality problem, and the critic agent is an interval type-2 fuzzy neural network that can robustly handle uncertainties. By calculating the velocity and acceleration from (12) and its derivative as $\nu =\mathfrak {D}^{-1}\dot {\epsilon }_{e}-\mathfrak {D}^{-1}\varkappa $
, $\dot {\nu }=\mathfrak {D}^{-1}\ddot {\epsilon }_{e}-\mathfrak {D}^{-1}\dot {\mathfrak {D}}\mathfrak {D}^{-1}\dot {\epsilon }_{e}+\mathfrak {D}^{-1}\dot {\mathfrak {D}}\mathfrak {D}^{-1}\varkappa -\mathfrak {D}^{-1}\dot {\varkappa }$
, substituting them into the second equation of (2), and pre-multiplying the resulting equation by $\mathfrak {D}^{-T}$, one obtains the following second-order error model in terms of unconstrained errors:\begin{equation*} M(\epsilon _{e})\ddot {\epsilon }_{e}+C(\epsilon _{e},\dot {\epsilon }_{e})\dot {\epsilon }_{e}+D(\epsilon _{e})\dot {\epsilon }_{e} -\varsigma =\mathfrak {D}^{-T}\tau +\delta _{p},\tag{13}\end{equation*}
where $M(\epsilon _{e})=\mathfrak {D}^{-T}M_{t}\mathfrak {D}^{-1}$
, $C(\epsilon _{e},\dot {\epsilon }_{e})=\mathfrak {D}^{-T}\big (C_{t}-M_{t}\mathfrak {D}^{-1}\dot {\mathfrak {D}}\big)\mathfrak {D}^{-1}$
, $D(\epsilon _{e})=\mathfrak {D}^{-T}D_{t}\mathfrak {D}^{-1}$
, $\delta _{p}=\mathfrak {D}^{-T}\delta $
where $|\delta _{p_{i}}|\leq B_{\delta _{i}},i=1,\ldots,6$
, and the NLIP uncertain term $\varsigma $
is \begin{align*} \varsigma=&M(\epsilon _{e})\dot {\varkappa }+C(\epsilon _{e},\dot {\epsilon }_{e})\varkappa +\mathfrak {D}^{-T}C_{t}(\nu)\mathfrak {D}^{-1}\varkappa \\&+\,D(\epsilon _{e})\varkappa -\mathfrak {D}^{-T}g(q)\!-\!\mathfrak {D}^{-T}u_{d}(\nu)\!+\!\mathfrak {D}^{-T}d_{\tau }(\tau).\tag{14}\end{align*}
Property 1:
Because $\mathfrak {D}$
is a full-rank matrix (recall (10)), the following properties hold for the matrices of the error model (13).
${M}({\epsilon }_{e})={M}^{T}({\epsilon }_{e})>0$
, $\lambda _{m}\|x\|^{2}\leq x^{T}{M}x\leq \lambda _{M}\|x\|^{2},\;\forall x\in \Re ^{6}$
where $0< \lambda _{m}< \lambda _{M}< \infty $
, and $\lambda _{m}:=\min \limits_{\forall {\epsilon }_{e}\in \Re ^{6}}\lambda _{\min }\big ({M}({\epsilon }_{e})\big)$
$\lambda _{M}:=\max \limits_{\forall {\epsilon }_{e}\in \Re ^{6}}\lambda _{\max }\big ({M}({\epsilon }_{e})\big)$
.
${D}({\epsilon }_{e})={D}^{T}({\epsilon }_{e})>0$
where $\lambda _{d}\|x\|^{2}\leq x^{T}{D}x\leq \lambda _{D}\|x\|^{2},\;\forall x\in \Re ^{6}$
where $0< \lambda _{d}< \lambda _{D}< \infty $
, and $\lambda _{d}:=\min \limits_{\forall {\epsilon }_{e}\in \Re ^{6}}\lambda _{\min }\big ({D}({\epsilon }_{e})\big)$
$\lambda _{D}:=\max \limits_{\forall {\epsilon }_{e}\in \Re ^{6}}\lambda _{\max }\big ({D}({\epsilon }_{e})\big)$
.
The matrix ${C}({\epsilon }_{e},\dot {\epsilon }_{e})$
has the following properties:
$x^{T}\big (\dot {M}({\epsilon }_{e})-2{C}({\epsilon }_{e},\dot {\epsilon }_{e})\big)x=0, \forall x\in \Re ^{6}$
,
${C}({\epsilon }_{e},x_{1})x_{2}={C}({\epsilon }_{e},x_{2})x_{1}$
,
${C}({\epsilon }_{e},x_{1}+x_{2})y={C}({\epsilon }_{e},x_{1})y+{C}({\epsilon }_{e},x_{2})y$
,
$\|{C}({\epsilon }_{e},x_{1})x_{2}\|\leq B_{c}\|x_{1}\|\|x_{2}\|$
where $B_{c}\in \Re ^{+}$
.
Next, the following saturated filtered tracking error is introduced:
\begin{equation*} \varepsilon _{f}=\dot {\epsilon }_{e}+\Lambda _{\varepsilon }\frac {\epsilon _{e}}{\sqrt {1+\|\epsilon _{e}\|^{2}}},\tag{15}\end{equation*}
where
$\Lambda _{\varepsilon }$
is a gain matrix. Employing
(13),
(15), and items (ii) and (iii) of P1.3 gives
\begin{align*}&\hspace {-1.7pc}M(\epsilon _{e})\dot {\varepsilon }_{f} \\=&-C(\epsilon _{e},\dot {\epsilon }_{e}){\varepsilon }_{f}-D(\epsilon _{e}){\varepsilon }_{f} +\varsigma +\gamma \\&+\,\mathfrak {D}^{-T}\tau +\delta _{p}, \tag{16}\\ \gamma=&M(\epsilon _{e})\Lambda _{\varepsilon }\dot {\epsilon }_{e} \big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {3}{2}}+D(\epsilon _{e})\Lambda _{\varepsilon }\epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}} \\&+\,C(\epsilon _{e},\dot {\epsilon }_{e})\Lambda _{\varepsilon }\epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}, \tag{17}\end{align*}
where
$\gamma $
is bounded by using item (iv) of P1.3, P1.1, and P1.2 as
$\|\gamma \|\leq \iota _{1}\|x_{\varepsilon }\|+\iota _{2}\|x_{\varepsilon }\|^{2}$
where
$x_{\varepsilon }=\left[{\epsilon _{e}^{T}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}},{\varepsilon }_{f}^{T}}\right]^{T}$
, and
$\iota _{1},\iota _{2}\in \Re ^{+}$
are unknown.
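The reason the normalization in (15) helps is that the regressor $\epsilon _{e}/\sqrt {1+\|\epsilon _{e}\|^{2}}$ always has norm strictly less than one, so the gain term in the filtered error stays bounded no matter how large the transformed error grows. A minimal sketch (random test vectors and the gain matrix are arbitrary illustrative choices):

```python
import numpy as np

def sat_reg(eps_e):
    """Bounded regressor eps_e / sqrt(1 + ||eps_e||^2) used in Eq. (15)."""
    return eps_e / np.sqrt(1.0 + np.dot(eps_e, eps_e))

rng = np.random.default_rng(0)
for _ in range(100):
    eps_e = rng.normal(scale=100.0, size=6)      # arbitrarily large errors
    assert np.linalg.norm(sat_reg(eps_e)) < 1.0  # norm is always below one

# filtered error of Eq. (15) with an arbitrary diagonal gain matrix
Lam = np.diag([2.0] * 6)
eps_dot = rng.normal(size=6)
eps_e = rng.normal(size=6)
eps_f = eps_dot + Lam @ sat_reg(eps_e)
```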
As discussed in [46], MNNs can mitigate the curse of dimensionality (high dimensionality of the input space) and can be used to estimate the unknown NLIP term $\varsigma (x)$
as \begin{equation*} \varsigma _{i}=\sum _{j=1}^{N_{h}}\left[{w_{ij}\bar {\sigma } \left({\sum _{k=1}^{N_{i}}v_{jk}x_{k}+\theta _{vj}}\right)+\theta _{wi}}\right],\tag{18}\end{equation*}
where $i=1,\ldots,N_{o}$
, $N_{h},N_{i}$
and $N_{o}$
are the number of hidden-layer, input-layer, and output-layer cells, respectively, $w_{ij}$
and $v_{jk}$
are the NN weights, $\theta _{vj}$
and $\theta _{wi}$
show threshold offsets, and $\bar {\sigma }(\xi)=1/\big (1+e^{-\xi }\big)$
is the sigmoid activation function. Eq. (18) is rewritten compactly as $\varsigma =W^{T}\sigma (V^{T}x)$
where $W^{T}\in \Re ^{N_{o}\times (N_{h}+1)}$
, $V^{T}\in \Re ^{N_{h}\times (N_{i}+1)}$
are the ideal ANN matrices and their first columns contain thresholds $\theta _{vj}$
and $\theta _{wi}$
, $x=\big [1,\epsilon _{e}^{T},\dot {\epsilon }_{e}^{T},\nu ^{T},\tau ^{T},\eta _{l}^{T},\dot {\eta }_{l}^{T}, \ddot {\eta }_{l}^{T},\eta _{u}^{T},\dot {\eta }_{u}^{T},\ddot {\eta }_{u}^{T},q^{T},\dot {q}_{d}^{T},\ddot {q}_{d}^{T}\big]^{T} \in \Re ^{N_{i}+1}$
, $\varsigma =[\varsigma _{1},\ldots,\varsigma _{N_{o}}]^{T}\in \Re ^{N_{o}}$
, and $\sigma (V^{T}x)=\big [1,\bar {\sigma }(V_{r_{1}}^{T}x),\ldots, \bar {\sigma }(V_{r_{N_{h}}}^{T}x)\big]^{T}\in \Re ^{N_{h}+1}$
where $V_{r_{j}}^{T}, j=1,\ldots,N_{h}$
is ${j}$
th row of $V^{T}$
. For the continuous function $\varsigma (x):\omega _{n}\rightarrow \Re ^{N_{o}}$
where $\omega _{n}\subset \Re ^{N_{i}+1}$
denotes a compact set, there exist ideal ANN weights, thresholds, and some numbers of hidden-layer cells such that $\varsigma (x)=W^{\ast T}\sigma (V^{\ast T}x)+e_{x}(x)$
where $e_{x}(x)\in \Re ^{N_{o}}$
is the ANN estimation error that is bounded on $\omega _{n}$
in a manner that $|e_{x_{i}}|\leq B_{x_{i}}, i=1,\ldots,N_{o}, \forall x\in \omega _{n}$
where $B_{x_{i}}\in \Re ^{+}$
. The ANN matrices $W^{\ast T}\in \Re ^{N_{o}\times (N_{h}+1)}$
, $V^{\ast T}\in \Re ^{N_{h}\times (N_{i}+1)}$
are given by \begin{equation*} (W^\ast,V^\ast):=\underset { (W,V)}{{\mathrm {arg\; \min }}}\bigg \{\sup _{ x\in \omega _{n}}\Big \|W^{ T}\sigma (V^{T} x)-\varsigma (x) \Big \|\bigg \}.\tag{19}\end{equation*}
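A minimal sketch of the two-layer forward pass (18) in its compact form $\varsigma =W^{T}\sigma (V^{T}x)$, with the threshold offsets carried by the augmented leading 1 in $x$ and in $\sigma (\cdot)$, as in the text. The layer sizes and random weights here are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def sigmoid(xi):
    """Activation sigma_bar(xi) = 1 / (1 + exp(-xi))."""
    return 1.0 / (1.0 + np.exp(-xi))

def mnn_forward(W, V, x):
    """Two-layer estimate varsigma = W^T sigma(V^T x) of Eq. (18).

    The leading 1 appended to x and to sigma(.) carries the threshold
    offsets theta_vj, theta_wi stored in the first columns of V^T, W^T.
    """
    z = V.T @ x                                # (N_h,) hidden pre-activations
    s = np.concatenate(([1.0], sigmoid(z)))    # (N_h+1,) augmented hidden output
    return W.T @ s                             # (N_o,) network output

N_i, N_h, N_o = 4, 8, 6                        # arbitrary small layer sizes
rng = np.random.default_rng(1)
V = rng.normal(size=(N_i + 1, N_h))            # V^T in R^{N_h x (N_i+1)}
W = rng.normal(size=(N_h + 1, N_o))            # W^T in R^{N_o x (N_h+1)}
x = np.concatenate(([1.0], rng.normal(size=N_i)))  # augmented input

varsigma = mnn_forward(W, V, x)
assert varsigma.shape == (N_o,)
```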
Assumption 2:
Matrices $W^\ast $
and $V^\ast $
are bounded on $\omega _{n}$
so that $\|W^\ast \|_{F}\leq B_{w}$
, $\|V^\ast \|_{F}\leq B_{v}$
where $B_{w},B_{v}\in \Re ^{+}$
.
However, matrices $W^\ast $
, $V^\ast $
and system derivatives are generally unknown and $\varsigma (x)$
is substituted by its approximation, i.e. $\hat {\varsigma }(\hat {x})=\hat {W}^{T}\sigma (\hat {V}^{T}\hat {x})$
, where $\hat {W}$
and $\hat {V}$
denote the estimated ANN matrices. Since it is assumed that system derivatives are immeasurable, a HGO is used based on the following lemma.
Lemma 1:
Suppose that the transformed error, i.e. $\epsilon _{e}$
, up to its $n-1$
derivatives are bounded, i.e. $\big \|\epsilon _{e}^{(k)}\big \|\leq B_{k}$
where $B_{k}\in \Re ^{+}$
, and the subsequent system is considered: \begin{align*} \hspace {-.3cm}k_{o}\dot {\ell }_{k}=&\ell _{(k+1)},\;k=1,\ldots,n-1, \\ \hspace {-.3cm}k_{o}\dot {\ell }_{n}=&-\lambda _{1}\ell _{n}\!-\!\lambda _{2}\ell _{(n-1)}-\cdots -\lambda _{(n-1)}\ell _{2}\!-\!\ell _{1}\!+\!\epsilon _{e},\tag{20}\end{align*}
where $k_{o}\in \Re ^{+}$
is a design parameter. The constants $\lambda _{1}$
to $\lambda _{(n-1)}$
are set such that the polynomial $\aleph ^{n}+\lambda _{1}\aleph ^{n-1}+\cdots +\lambda _{(n-1)}\aleph +1$
is Hurwitz. Then, the following points are valid: $i$
) ${\ell _{(j+1)}}/{k_{o}^{j}}-\epsilon _{e}^{(j)}=-k_{o}\varpi ^{(j+1)},\;j=0,1,\ldots,n-1$
where $\varpi =\ell _{n}+\lambda _{1}\ell _{(n-1)}+\cdots +\lambda _{(n-1)}\ell _{1}$
, and $\varpi ^{(j)}$
is $j$
th derivative of $\varpi $
According to [47], $\ell _{(j+1)}/k_{o}^{j}$
tends to $\epsilon _{e}^{(j)}$
asymptotically with a small error, provided that $\epsilon _{e}(t)$
and its $j$
th derivatives are bounded; $ii$
) there exist $t_{1},G_{j}\in \Re ^{+}$
such that $\big \|\varpi ^{(j)}\big \|\leq G_{j}$
holds for all $t>t_{1}$
.
Now, by employing Lemma 1, $\varepsilon _{f}$
can be estimated as $\hat {\varepsilon }_{f}=\dot {\hat {\epsilon }}_{e}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}={\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}$
by \begin{align*} k_{o}\dot {\ell }_{1}=&\ell _{2}, \tag{21}\\ k_{o}\dot {\ell }_{2}=&-\lambda _{1}\ell _{2}-\ell _{1}+\epsilon _{e}(t). \tag{22}\end{align*}
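As a sanity check, the second-order observer (21)–(22) can be integrated with forward Euler; $\ell_{2}/k_{o}$ then tracks the derivative of the input signal for small $k_{o}$. The test signal, step size, and gains below are illustrative assumptions:

```python
import numpy as np

def hgo_derivative(eps, dt, k_o=0.01, lam1=2.0):
    """Second-order high-gain observer (21)-(22), scalar case:
    l1 tracks eps and l2/k_o tends to d(eps)/dt as k_o -> 0."""
    l1 = l2 = 0.0
    est = np.empty_like(eps)
    for i, e in enumerate(eps):
        dl1 = l2 / k_o                      # k_o * l1' = l2
        dl2 = (-lam1 * l2 - l1 + e) / k_o   # k_o * l2' = -lam1*l2 - l1 + eps
        l1 += dt * dl1
        l2 += dt * dl2
        est[i] = l2 / k_o                   # derivative estimate
    return est

# example: estimate the derivative of sin(t), which should approach cos(t)
t = np.arange(0.0, 2.0, 1e-4)
d_est = hgo_derivative(np.sin(t), dt=1e-4)
```

With $\lambda_{1}=2$ the fast dynamics have a double pole at $-1/k_{o}$, so the transient dies out after a few hundredths of a second and the residual error is of order $k_{o}$, consistent with item $i$) of Lemma 1.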
Now, by using item $(ii)$
of Lemma 1 and $\tilde {\varepsilon }_{f}=\hat {\varepsilon }_{f}-\varepsilon _{f}=\dot {\hat {\epsilon }}_{e}-\dot {\epsilon }_{e}$
, one has $\|\tilde {\varepsilon }_{f}\|=\|k_{o}\ddot {\varpi }\|\leq k_{o}{G_{2}}:=B_{\varepsilon }$
where $B_\varepsilon \in \Re ^{+}$
. Now, the error of ANN estimation is given by [48] \begin{align*}&\hspace {-0.5pc}\varsigma - \hat {\varsigma }=\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x}) -{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big) \\&\qquad\qquad\qquad\quad+\,\hat {W}^{T}{\sigma }^\prime (\hat {V}^{T}\hat {x})\tilde {V}^{T}\hat {x}+r_{t}+e_{x}(x),\tag{23}\end{align*}
where ${\sigma }^\prime (\hat {V}^{T}{\hat {x}})=\big [0_{N_{h}\times 1},\mathrm {diag}[{\sigma }_{1}^\prime,\ldots,{\sigma }_{ N_{h}}^\prime]\big]^{T}\in \Re ^{(N_{h}+1)\times N_{h}}$
, with ${\sigma }_{i}^\prime ={d\bar {\sigma }(z)}/{dz}|_{ z=\hat {V}_{r_{i}}^{T}{\hat {x}}},\;i=1,\ldots,N_{h}$
, $\hat {x}=\big [1,\epsilon _{e}^{T},\dot {\hat {\epsilon }}_{e}^{T},\hat {\nu }^{T},\tau ^{T},\eta _{l}^{T},\dot {\eta }_{l}^{T}, \ddot {\eta }_{l}^{T},\eta _{u}^{T},\dot {\eta }_{u}^{T},\ddot {\eta }_{u}^{T},q^{T},\dot {q}_{d}^{T},\ddot {q}_{d}^{T}\big]^{T}$
, and $r_{t}$
in (23) is bounded as \begin{align*}&\hspace {-0.5pc}\|r_{t}\|\leq \|W\|_{F}\big (\|{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\|+\|\sigma (\hat {V}^{T}\hat {x})\|\big) \\&\qquad\qquad\qquad\qquad\quad+\,\|V\|_{F}\|\hat {x}\|\|\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\|_{F}.\tag{24}\end{align*}
To generate an efficient reinforcement signal from the collected error metric for tuning the ANN system, an IT2FNN is used here [7], [49], [50]. The core of a fuzzy system is the fuzzy rule base, which is expressed as IF-THEN rules. Suppose that the fuzzy rule base consists of $N$
rules of the following general form:
$R_{j}\;(j=1,\ldots,N)$
: If $x_{1}$
is $\tilde {A}_{j1}$
and $\cdots $
and $x_{n}$
is $\tilde {A}_{jn}$
then $y_{1}$
is $\tilde {f}_{j1}$
and $\cdots $
and $y_{m}$
is $\tilde {f}_{jm}$
, where $x_{i}\;(i=1,\ldots,n)$
are the inputs of the fuzzy system, $y_{k}\;(k=1,\ldots,m)$
are the outputs of the fuzzy system, $\tilde {A}_{ji}=\big [\underline {\mu }_{ji}(x_{i})\; \overline {\mu }_{ji}(x_{i})\big]$
denotes the membership degrees of the $j$
th lower and upper MFs for the $i$
th input, and $\tilde {f}_{jk}=[f_{jkL}\;f_{jkR}]$
is the interval consequent part of the $k$
th output. The MFs of the $i$
th input are chosen as \begin{align*} \begin{cases} \underline {\mu }_{ji}(x_{i})=\exp \left({-0.5\left({\dfrac {x_{i}-m_{ji}}{\underline {\sigma }_{ji}}}\right)^{2}}\right),\\ \overline {\mu }_{ji}(x_{i})=\exp \left({-0.5\left({\dfrac {x_{i}-m_{ji}}{\overline {\sigma }_{ji}}}\right)^{2}}\right), \end{cases}\tag{25}\end{align*}
where $m_{ji}$
is the mean, and $\underline {\sigma }_{ji}$
and $\overline {\sigma }_{ji}$
denote the widths of the $j$
th lower and upper MFs for the $i$
th input. The output of the IT2FNN is \begin{equation*} y=0.5F^{T}\sigma _{f}(x),\tag{26}\end{equation*}
where $F^{T}=\big [F_{L}^{T}\;F_{R}^{T}\big]\in \Re ^{ m\times 2N}$
, $F_{L}=[f_{jkL}]\in \Re ^{N\times m}$
and $F_{R}=[f_{jkR}]\in \Re ^{N\times m}$
are the ideal weight matrices, $\sigma _{f}(x)=\big [\underline {\sigma }_{f}^{T}(x)\;\overline {\sigma }_{f}^{T}(x)\big]^{T}$
, $\underline {\sigma }_{f}(x)=\left[{\underline {\sigma }_{f_{1}}/\sum _{j=1}^{N} \underline {\sigma }_{f_{j}},\ldots,\underline {\sigma }_{f_{N}}/\sum _{j=1}^{N} \underline {\sigma }_{f_{j}}}\right]^{T}$
, $\overline {\sigma }_{f}(x)=\left[{\overline {\sigma }_{f_{1}}/\sum _{j=1}^{N} \overline {\sigma }_{f_{j}},\ldots,\overline {\sigma }_{f_{N}}/\sum _{j=1}^{N} \overline {\sigma }_{f_{j}}}\right]^{T}$
, $\underline {\sigma }_{f_{j}}=\Pi _{i=1}^{n}\underline {\mu }_{ji}(x_{i})$
, and $\overline {\sigma }_{f_{j}}=\Pi _{i=1}^{n}\overline {\mu }_{ji}(x_{i})$
. For a continuous function $g(x)$
, an optimal matrix $F$
exists so that \begin{equation*} F:=\underset { F}{{\mathrm {arg\; \min }}}\bigg \{\sup _{ x\in \omega _{f}}\Big \|0.5F^{T}\sigma _{f}(x)-g(x) \Big \|\bigg \},\tag{27}\end{equation*}
where $\omega _{f}\subset \Re ^{n}$
is a compact set.
Assumption 3:
$F$
is bounded on $\omega _{f}$
so that $\|F\|_{F}\leq B_{f}$
, where $B_{f}\in \Re ^{+}$
.
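To make the firing-strength normalisation behind (25)–(26) concrete, a minimal numerical sketch follows; the dimensions and sample values are illustrative assumptions, not the paper's tuning:

```python
import numpy as np

def it2fnn_output(x, m, sig_lo, sig_hi, F_L, F_R):
    """IT2FNN output y = 0.5 * F^T sigma_f(x) from (25)-(26).
    x: (n,) input; m, sig_lo, sig_hi: (N, n) Gaussian means/widths;
    F_L, F_R: (N, m_out) left/right consequent weight matrices."""
    mu_lo = np.exp(-0.5 * ((x - m) / sig_lo) ** 2)  # lower MFs, eq. (25)
    mu_hi = np.exp(-0.5 * ((x - m) / sig_hi) ** 2)  # upper MFs, eq. (25)
    f_lo = mu_lo.prod(axis=1)                       # lower firing strengths
    f_hi = mu_hi.prod(axis=1)                       # upper firing strengths
    sigma_f = np.concatenate([f_lo / f_lo.sum(), f_hi / f_hi.sum()])  # (2N,)
    F = np.vstack([F_L, F_R])                       # (2N, m_out)
    return 0.5 * F.T @ sigma_f

# with all consequents equal to 1, the output is exactly 1 regardless of x,
# because each normalised firing-strength vector sums to 1
y = it2fnn_output(np.array([0.3]), m=np.array([[-1.0], [1.0]]),
                  sig_lo=np.array([[0.5], [0.5]]), sig_hi=np.array([[1.0], [1.0]]),
                  F_L=np.ones((2, 1)), F_R=np.ones((2, 1)))
```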
Now, the following critic-fuzzy-based cost function is considered [51]:\begin{equation*} S_{x}=\varepsilon _{f}+0.5\|\varepsilon _{f}\|F^{T}\sigma _{f}(x).\tag{28}\end{equation*}
Since $F$
and some states of the system are unknown in general, the following adaptive critic fuzzy component is proposed here:\begin{equation*} \hat {S}_{x}=\hat {\varepsilon }_{f}+0.5\|\hat {\varepsilon }_{f}\|\hat {F}^{T}\sigma _{f}(\hat {x}),\tag{29}\end{equation*}
where $\hat {S}_{x}$
is used later to generate the ANN weight update rules, helping the ANN to perform its online estimation more precisely and to correct the system's output.
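The adaptive critic signal (29) is simply the estimated filtered error plus a norm-weighted fuzzy correction; a one-line numerical sketch (shapes and values assumed) reads:

```python
import numpy as np

def critic_signal(eps_f_hat, F_hat, sigma_f_hat):
    """Adaptive critic reinforcement signal, eq. (29):
    S_hat = eps_f_hat + 0.5*||eps_f_hat|| * F_hat^T sigma_f(x_hat)."""
    return eps_f_hat + 0.5 * np.linalg.norm(eps_f_hat) * (F_hat.T @ sigma_f_hat)

# illustrative values: ||[3, 4]|| = 5, so the fuzzy correction is 2.5 * F^T sigma_f
S_hat = critic_signal(np.array([3.0, 4.0]),
                      F_hat=np.ones((2, 2)), sigma_f_hat=np.array([0.5, 0.5]))
```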
Because the adaptive critic fuzzy component in (29) can be straightly seen as a form of reinforcement signal and $\hat {S}_{x}$
is more informative than the system states, an excellent control action can be resulted and a better control efficiency and performance can be obtained [52], [53].
The design objective of the critic cost function (29) is to reach an optimal control action. In other words, the action ANN ($\hat {W}^{T}\sigma (\hat {V}^{T}\hat {x})$
) not only generates control signals to track the desired trajectory and cancel the unknown, uncertain, nonlinear dynamics but also minimises the cost function. At the same time, the critic agent ($0.5\hat {F}^{T}\sigma _{f}(\hat {x})$
) estimates the cost function (29) and tunes the ANN weights. Clearly, through adaptive learning and minimisation of the adaptive critic IT2FNN-based cost function in (29), an optimal or near-optimal control action results. In fact, the proposed controller, which is designed in the sequel, not only stabilises the closed-loop system but also minimises the cost function simultaneously [52], [54].
Now, take the following control rule into account:\begin{align*} \begin{cases} \tau =\mathfrak {D}^{T}\left({-K_{p}\dfrac {\ell _{2}}{k_{o}}\!-\!K_{p}\Lambda _{\varepsilon }\dfrac {\epsilon _{e}}{\sqrt {1\!+\!\|\epsilon _{e}\|^{2}}}-\hat {W}^{T}\sigma (\hat {V}^{T}\hat {x}) \!+\!\hbar }\right),\\ \hbar =-k_{1}\varrho _{1}\hat {\varepsilon }_{f}-k_{2}\varrho _{2}\hat {\varepsilon }_{f} -k_{3}\varrho _{3}\hat {\varepsilon }_{f}-k_{4}\varrho _{4}\hat {\varepsilon }_{f}-H\hat {p}, \end{cases}\!\!\!\!\!\!\!\!\!\!\! \\{}\tag{30}\end{align*}
where $K_{p}\in \Re ^{6\times 6}$
is a gain matrix, $k_{j},j=1,\ldots,4$
, are control parameters, $H:=\mathrm {diag}[\tanh (\hat {\varepsilon }_{f_{1}}/c_{1r}),\ldots,\tanh (\hat {\varepsilon }_{f_{6}}/c_{6r})]$
, $\hat {p}$
is an upper bound estimation of some unknown parameters that are defined in the sequel, and the auxiliary terms are \begin{align*} \begin{cases} \varrho _{1}=\big (\|{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\|+\|\sigma (\hat {V}^{T}\hat {x})\|\big)^{2}\\ \;\;\;\;\;\;\;\;\times \|\sigma _{f}^{T}(\hat {x})\hat {F}\|^{2},\\ \varrho _{2}=\|\hat {x}\|^{2}\|\sigma _{f}^{T}(\hat {x})\hat {F}\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\|^{2},\\ \varrho _{3}=\|\sigma _{f}(\hat {x})\sigma ^{T}(\hat {V}^{T}{\hat {x}})\hat {W}\|_{F}^{2},\\ \varrho _{4}=\big (\|{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\|+\|\sigma (\hat {V}^{T}\hat {x})\|\big)^{2}\\ \;\;\;\;\;\;\;\;+\|\hat {x}\|^{2}\|\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\|_{F}^{2}. \end{cases}\tag{31}\end{align*}
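The control rule (30) combines an observer-based PD-like term, ANN cancellation, and the robust/critic term $\hbar$. A sketch with assumed shapes follows; the $\varrho_{j}$ scalars from (31) are passed in precomputed, and all numerical values are illustrative:

```python
import numpy as np

def control_law(l2, eps_e, nn_est, eps_f_hat, D, K_p, Lam, k_o, k, rho, p_hat, c_r):
    """Control rule (30): tau = D^T(-K_p*l2/k_o - K_p*Lam*sat(eps_e)
    - W_hat^T sigma(V_hat^T x_hat) + hbar), with hbar from the critic terms."""
    H = np.diag(np.tanh(eps_f_hat / c_r))               # element-wise tanh gains
    hbar = -sum(k[j] * rho[j] for j in range(4)) * eps_f_hat - H @ p_hat
    sat = eps_e / np.sqrt(1.0 + eps_e @ eps_e)          # saturated error term
    return D.T @ (-K_p @ (l2 / k_o) - K_p @ (Lam @ sat) - nn_est + hbar)

# with zero errors the command reduces to cancelling the ANN estimate
tau = control_law(np.zeros(2), np.zeros(2), np.array([1.0, 2.0]), np.zeros(2),
                  np.eye(2), np.eye(2), np.eye(2), 10.0,
                  [1.5] * 4, [0.0] * 4, np.zeros(2), 0.5)
```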
Then, the adaptive laws are designed here:\begin{align*} \dot {\hat {W}}=&{\Gamma _{w}}\Big (\sigma (\hat {V}^{T}\hat {x})-\sigma ^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\Big) \\&\times \Big ({\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}} \\&+\,0.5\|{\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\|\hat {F}^{T}\sigma _{f}(\hat {x})\Big)^{T} \\&-\,\delta _{w}{\Gamma _{w}}\hat {W}, \tag{32}\\ \dot {\hat {V}}=&\Gamma _{v}\hat {x}\Big ({\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}} \\&+\,0.5\|{\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\|\hat {F}^{T}\sigma _{f}(\hat {x})\Big)^{T} \\&\times \,\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})-\delta _{v}\Gamma _{v}\hat {V}, \tag{33}\\ \dot {\hat {F}}=&\Gamma _{f}\Big \|{\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\Big \|\sigma _{f}(\hat {x}) \\&\times \,\big (\hat {W}^{T}\sigma (\hat {V}^{T}\hat {x})\big)^{T}-\delta _{f}\Gamma _{f}\hat {F}, \tag{34}\\ \dot {\hat {p}}=&Q\Big [H(\hat {\varepsilon }_{f})\left({{\ell _{2}}/{k_{o}}+\Lambda _{\varepsilon } \epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}}\right) \\&-\,\Theta (\hat {p}-p^{0})\Big], \tag{35}\end{align*}
where $\Gamma _{w}\in \Re ^{(N_{h}+1)\times (N_{h}+1)}$
, $\Gamma _{v}\in \Re ^{(N_{i}+1)\times (N_{i}+1)}$
, $\Gamma _{f}\in \Re ^{2N\times 2N}$
and $Q,\Theta \in \Re ^{N_{o}\times N_{o}}$
are adaptive gains, $\delta _{w},\delta _{v},\delta _{f}\in \Re ^{+}$
are design parameters, and $p^{0}$
is a design vector. Replacing (30) into (16) and using (23) gives \begin{align*} M(\epsilon _{e})\dot {\varepsilon }_{f}=&-C(\epsilon _{e},\dot {\epsilon }_{e}){\varepsilon }_{f}-D(\epsilon _{e}){\varepsilon }_{f} +\gamma -K_{p}\hat {\varepsilon }_{f}+\chi \\&+\,\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x}) -{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big) -H\hat {p} \\&+\,\hat {W}^{T}{\sigma }^\prime (\hat {V}^{T}\hat {x})\tilde {V}^{T}\hat {x}+r_{t}-k_{1}\varrho _{1}\hat {\varepsilon }_{f} \\&-\,k_{2}\varrho _{2}\hat {\varepsilon }_{f}-k_{3}\varrho _{3}\hat {\varepsilon }_{f}-k_{4}\varrho _{4}\hat {\varepsilon }_{f},\tag{36}\end{align*}
where $\chi =\delta _{p}+e_{x}(x)\in \Re ^{N_{o}}$
is bounded according to $|\chi _{i}|\leq p_{i}, i=1,\ldots,N_{o}$
, where $p_{i}\in \Re ^{+}$
and $p=[p_{1},\ldots,p_{N_{o}}]^{T}$
. Note that in the design procedure (for example, see (15)) and in the proposed control law (30), saturation functions, together with the compensation of the input saturation nonlinearity via the proposed hybrid RL method, are utilized to handle the input constraint. Thus, the proposed controller generates control signals with the lowest possible amplitude. The schematic block diagram of the proposed control system is illustrated in Fig. 2. The closed-loop system is given by (30)–(31) together with the observer (21)–(22) and the adaptive laws (32)–(35).
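For implementation, the adaptive laws (32)–(35) are integrated forward in time. A forward-Euler step for the critic weight law (34), with illustrative shapes and values, can be sketched as:

```python
import numpy as np

def f_hat_step(F_hat, eps_f_hat, sigma_f_hat, nn_out, Gamma_f, delta_f, dt):
    """One Euler step of (34): dF_hat/dt = Gamma_f * ( ||eps_f_hat|| *
    sigma_f * (W_hat^T sigma)^T - delta_f * F_hat ), where the -delta_f term
    is the sigma-modification that keeps the weights bounded."""
    dF = Gamma_f @ (np.linalg.norm(eps_f_hat) * np.outer(sigma_f_hat, nn_out)
                    - delta_f * F_hat)
    return F_hat + dt * dF

# illustrative step: 2N = 2 fuzzy basis terms, one output channel
F_new = f_hat_step(np.ones((2, 1)), np.array([3.0, 4.0]), np.array([1.0, 0.0]),
                   np.array([2.0]), np.eye(2), 0.1, 0.01)
```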
A. Stability Analysis
Theorem 1:
Consider the AUV kinematics and dynamics given by (2) and the transformed model (36). In accordance with Assumptions 1–3, the designed controller (30), adaptive rules (32)–(35) and state observer (21)–(22), with the reinforcement signal (29), ensure that all signals of the closed-loop system are bounded, that the tracking errors are SGUUB and converge toward a small neighbourhood of the origin, and that the prescribed output constraints are never violated.
Proof:
See the Appendix.
SECTION IV.
Simulation Studies
To demonstrate the contributions of the proposed method, comparative simulations are performed against two non-RL-based controllers: the controller of Elhaki and Shojaei [6] and a non-prescribed-performance output-feedback controller (NPPOFC) whose equations are given in [6]. The initial conditions of the AUV are set as $q(0)=[10, 10, 10, 45\pi /180, 45\pi /180, 45\pi /180]^{T}$
. The control parameters are as follows: $\eta _{10}=\eta _{20}=\eta _{30}=2, \eta _{40}=\eta _{50}=\eta _{60}=1, \eta _{1\infty }=\eta _{2\infty }=\eta _{3\infty }=0.01, \eta _{4\infty }=\eta _{5\infty }=\eta _{6\infty }=0.02, a_{1}=a_{2}=a_{3}=0.055, a_{4}=a_{5}=a_{6}=0.1, \alpha _{1}=\alpha _{2}=5, \alpha _{3}=7, \alpha _{4}=\alpha _{5}=\alpha _{6}=1, \beta _{1}=\beta _{2}=5, \beta _{3}=7, \beta _{4}=\beta _{5}=\beta _{6}=1.2, \Lambda _{\varepsilon }=I_{6},k_{o}=\lambda _{1}=10,\bar {\sigma }=1,\underline {\sigma }=0.1, m_{1}=-10,m_{2}=-6,m_{3}=-2,m_{4}=2,m_{5}=6,m_{6}=10, N=N_{h}=6,\Gamma _{w}=0.01I_{N_{h}},\delta _{w}=1,\Gamma _{v}=2I_{N_{i}+1},\delta _{v}=0.1, \Gamma _{f}=I_{2N},\delta _{f}=0.01,Q=2\mathrm {diag}[{0.1,0.1,0.1,1,1,1}], \Theta =0.1\mathrm {diag}[{3,1.5,10,1,1,1}],\,\,p^{0}=[{1.7,1.7,1.7,1.7,1.7,1.7}]^{T}, c_{1r}=\cdots =c_{6r}=0.5,k_{1}=\cdots =k_{4}=1.5, K_{p}=8\mathrm {diag}[{1,1,1,1,1,1}]$
.
A. Scenario 1
In this scenario, Fig. 4 displays the results for the proposed method along with the comparisons. As seen in Fig. 4(a), all control methods forced the AUV to track the desired trajectory, but the AUV equipped with the proposed controller achieved smoother tracking than with the controller in [6] and the NPPOFC. Figs. 4(c)-4(h) show that the tracking errors of the proposed method and of the controller of Elhaki and Shojaei [6] are easily kept inside the predefined funnel-shaped set without any concern about gain tuning. However, the tracking performance of the NPPOFC depends strongly on the exact selection of control parameters, and an arbitrary choice of the control parameters cannot ensure a good tracking performance. Moreover, the control signals shown in Figs. 4(i)-4(j) demonstrate a noticeable superiority of the proposed controller over the other two controllers, owing to the use of saturation functions in the design procedure and the compensation of the actuator saturation nonlinearity by the hybrid RL method; the output patterns of the controller of Elhaki and Shojaei [6] and of the NPPOFC are considerably fluctuating and saturated. This reflects a trade-off between the feasibility of the control signals generated by the proposed controller in Figs. 4(i)-4(j) and the convergence behaviour of the tracking errors in Figs. 4(c)-4(d). However, since the proposed control scheme is designed based on the PPC method, it never breaches the performance bounds. Moreover, in the proposed PPC framework the transient performance can be improved by adjusting the PPC parameters. More simulation results on the improvement of the convergence behaviour of the tracking errors are given in the next scenario. As reflected by the simulations, the improvement in the outcomes is clearly notable, which supports the use of the proposed NFRLC method in practice.
Indeed, the strong estimation and compensation capability of the proposed estimation policy for uncertainties and nonlinearities, which results from the fusion of an adaptive MNN, an adaptive IT2FNN, and the saturation functions, together with a rigorous stability analysis, leads to substantially smooth and suitable output signals for the proposed controller in both transient and steady-state behaviour without the need for velocity measurements. Also, the upper bound of the nonparametric uncertainties is estimated well in Fig. 4(b), and the Frobenius norms of the MNN matrices are depicted in Fig. 4(k). The RL signals are shown in Figs. 4(l)-4(m), and the time evolution of the MNN weights is shown in Fig. 4(n) and Fig. 4(o). Fig. 4(p) presents the HGO estimation error, from which it can be seen that the designed HGO estimates the velocities adequately well. In this simulation, the CPU elapsed times for the proposed controller, the controller of Elhaki and Shojaei [6], and the NPPOFC are 62, 46, and 15 ms, respectively, on a computer with a 2.4 GHz Intel Core i5 processor. This result shows that the proposed hybrid RL controller incurs slightly more computational cost than the controller in [6] and the NPPOFC, in exchange for generating considerably better control action.
B. Scenario 2
In order to improve the trajectory tracking performance of the proposed controller by gain tuning, another simulation is performed here. The adjusted control parameters are $a_{1}=a_{2}=a_{3}=0.1,\Lambda _{\varepsilon }= \mathrm {diag}[{12,12,12,1,1,1}]$
and other control parameters are similar to Scenario 1. Fig. 5(a) shows that the AUV equipped with the proposed controller converged considerably faster to the desired trajectory than the other AUVs. As seen in Figs. 5(c)-(e), the convergence behaviour of the tracking errors improves significantly as a result of this gain tuning. Hence, the advantage of the proposed controller in both the transient and steady-state behaviour of the tracking errors is evident in Figs. 5(c)-(h). However, as shown in Figs. 5(i)-(j), this improved convergence demands more control action than in Scenario 1, although the generated control signals of the proposed controller remain more feasible than those of the other two controllers. The other bounded signals of the proposed closed-loop system are depicted in Figs. 5(b), (k)-(p). In a nutshell, this scenario shows that users may tune the convergence behaviour of the closed-loop control system and strike a good trade-off between the feasibility of the control action and the convergence properties through careful gain adjustment. Further simulation results with re-tuning of the PPC parameters confirm the improved convergence behaviour of the tracking errors and the better performance of the proposed controller; they are omitted here for brevity.
Proof:
The following Lyapunov function is taken into account:\begin{align*} E=&\sqrt {(1+\epsilon _{e}^{T}\epsilon _{e})}-1+0.5 \varepsilon _{f}^{T}M(\epsilon _{e})\varepsilon _{f}+0.5\tilde {p}^{T}Q^{-1}\tilde {p} \\&+\,0.5tr\big \{\tilde {W}^{T}\Gamma _{w}^{-1} \tilde {W}\big \}+0.5tr\big \{\tilde {V}^{T}\Gamma _{v}^{-1}\tilde {V}\big \} \\&+\,0.5tr\big \{\tilde {F}^{T}\Gamma _{f}^{-1}\tilde {F}\big \},\tag{37}\end{align*}
where $\tilde {W}=W^{*}-\hat {W},\tilde {V}=V^{*}-\hat {V}, \tilde {F}=F^{*}-\hat {F}$
, and $\tilde {p}=\hat {p}-{p}$
. Eq. (37) is bounded by \begin{equation*} \lambda _{x_{\varepsilon }}\|x_{\varepsilon }\|^{2}\leq \lambda _{l}\|z_{l}\|^{2}\leq E(t)\leq \lambda _{u}\|z_{u}\|^{2},\tag{38}\end{equation*}
where $x_{\varepsilon }=\left[{\epsilon _{e}^{T}(1+\|\epsilon _{e}\|^{2})^{-\frac {1}{2}},{\varepsilon }_{f}^{T}}\right]^{T}$
and \begin{align*} \lambda _{x_{\varepsilon }}=&0.5\min \{1,\lambda _{m}\}, \\ \lambda _{l}=&0.5\min \big \{1,\lambda _{m},\lambda _{\min }\{\Gamma _{w}^{-1}\}, \lambda _{\min }\{\Gamma _{v}^{-1}\}, \\&\lambda _{\min }\{\Gamma _{F}^{-1}\}, \lambda _{\min }\{Q^{-1}\}\big \}, \\ \lambda _{u}=&0.5\max \big \{1,\lambda _{M},\lambda _{\max }\{\Gamma _{w}^{-1}\},\lambda _{\max }\{\Gamma _{v}^{-1}\}, \\&\lambda _{\max }\{\Gamma _{f}^{-1}\},\lambda _{\max }\{Q^{-1}\}\big \}, \\ {z_{l}}=&\big [x_{\varepsilon }^{T},\tilde {w}_{11},\ldots,\tilde {w}_{(N_{h}+1)N_{o}},\tilde {v}_{11},\ldots,\tilde {v}_{(N_{i}+1)N_{h}}, \\&\tilde {f}_{11},\ldots,\tilde {f}_{2N\times m},\tilde {p}^{T}\big]^{T}, \\ {z_{u}}=&\big [\epsilon _{e}^{T},\varepsilon _{f}^{T},\tilde {w}_{11},\ldots,\tilde {w}_{(N_{h}+1)N_{o}},\tilde {v}_{11},\ldots,\tilde {v}_{(N_{i}+1)N_{h}}, \\&\tilde {f}_{11},\ldots,\tilde {f}_{2N\times m},\tilde {p}^{T}\big]^{T}.\end{align*}
Differentiating (37), using (36), item (i) of P1.3 and $\tilde {\varepsilon }_{f}=\hat {\varepsilon }_{f}-\varepsilon _{f}$
gives \begin{align*} \dot {E}=&\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\epsilon _{e}^{T}\dot {\epsilon }_{e}+\frac {1}{2}\varepsilon _{f}^{T}\dot {M}(\epsilon _{e})\varepsilon _{f}+\varepsilon _{f}^{T}M(\epsilon _{e})\dot {\varepsilon }_{f} \\&-\,tr\big \{\tilde {W}^{T}\Gamma _{w}^{-1}\dot {\hat W}\big \}-tr\big \{\tilde {V}^{T}\Gamma _{v}^{-1}\dot {\hat V}\big \}-tr\big \{\tilde {F}^{T}\Gamma _{f}^{-1}\dot {\hat F}\big \} \\&+\,\tilde {p}^{T}Q^{-1}\dot {\hat p} \\=&-\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\epsilon _{e}^{T}\Lambda _{\varepsilon }\epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}} \\&+\,\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\epsilon _{e}^{T}\varepsilon _{f}-\varepsilon _{f}^{T}D(\epsilon _{e}){\varepsilon }_{f}+\varepsilon _{f}^{T}\gamma \\&+\,\varepsilon _{f}^{T}(\chi -H\hat {p})+\varepsilon _{f}^{T}\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x})-{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big) \\&+\,\varepsilon _{f}^{T}\hat {W}^{T}{\sigma }^\prime (\hat {V}^{T}\hat {x})\tilde {V}^{T}\hat {x}+\varepsilon _{f}^{T}r_{t}-\varepsilon _{f}^{T}\sum _{j=1}^{4}k_{j}\varrho _{j}\hat {\varepsilon }_{f} \\&-\,tr\big \{\tilde {W}^{T}\Gamma _{w}^{-1}\dot {\hat W}\big \}-tr\big \{\tilde {V}^{T}\Gamma _{v}^{-1}\dot {\hat V}\big \}-tr\big \{\tilde {F}^{T}\Gamma _{f}^{-1}\dot {\hat F}\big \} \\&+\,\tilde {p}^{T}Q^{-1}\dot {\hat p}-\varepsilon _{f}^{T}K_{p}\varepsilon _{f}-\varepsilon _{f}^{T}K_{p}\tilde {\varepsilon }_{f}.\tag{39}\end{align*}
Using update rules (32)–(35) yields \begin{align*} \dot {E}=&-\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\epsilon _{e}^{T}\Lambda _{\varepsilon }\epsilon _{e}\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}} \\&+\,\big (1+\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\epsilon _{e}^{T}\varepsilon _{f}-\varepsilon _{f}^{T}D(\epsilon _{e}){\varepsilon }_{f}+\varepsilon _{f}^{T}\gamma \\&-\,tr\big \{\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x})-{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big)\tilde {\varepsilon }_{f}^{T}\big \}+\varepsilon _{f}^{T}r_{t} \\&-\,\frac {1}{2}tr\big \{\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x})-{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big)\|\hat {\varepsilon }_{f}\|\sigma _{f}^{T}(\hat {x})\hat {F}\big \} \\&-\,tr\big \{\tilde {V}^{T}\hat {x}\tilde {\varepsilon }_{f}^{T}\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\big \}-\varepsilon _{f}^{T}K_{p}\varepsilon _{f}-\varepsilon _{f}^{T}K_{p}\tilde {\varepsilon }_{f} \\&-\,\frac {1}{2}tr\big \{\tilde {V}^{T}\hat {x}\|\hat {\varepsilon }_{f}\|\sigma _{f}^{T}(\hat {x})\hat {F}\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\big \}+\delta _{v}tr\big \{\tilde {V}^{T}\hat {V}\big \} \\&-\,tr\big \{\tilde {F}^{T}\|\hat {\varepsilon }_{f}\|\sigma _{f}(\hat {x})\sigma ^{T}(\hat {V}^{T}\hat {x})\hat {W}\big \}+\delta _{w}tr\big \{\tilde {W}^{T}\hat {W}\big \} \\&+\,\delta _{f}tr\big \{\tilde {F}^{T}\hat {F}\big \}-\varepsilon _{f}^{T}\sum _{j=1}^{4}k_{j}\varrho _{j}\hat {\varepsilon }_{f}+\varepsilon _{f}^{T}(\chi -H\hat {p}) \\&+\,\tilde {p}^{T}[H(\hat {\varepsilon }_{f})\hat {\varepsilon }_{f}-\Theta (\hat {p}-p^{0})].\tag{40}\end{align*}
By applying the following inequalities, obtained from (24) and Lemma 1:\begin{align*}&\hspace {-2pc}\|\varepsilon _{f}^{T}r_{t}\| \\\leq&\frac {1}{2}\|W\|_{F}^{2}+\frac {1}{2}\|V\|_{F}^{2}+\frac {1}{2}\|\varepsilon _{f}\|^{2}\varrho _{4}, \\&-\,tr\big \{\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x})-{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big)\tilde {\varepsilon }_{f}^{T}\big \} \\&-\,tr\big \{\tilde {V}^{T}\hat {x}\tilde {\varepsilon }_{f}^{T}\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\big \}\leq \frac {1}{2}\|\tilde {W}\|_{F}^{2}+\frac {1}{2}\|\tilde {V}\|_{F}^{2} \\&+\,\frac {1}{2}B_{\varepsilon }^{2}\varrho _{4}, \\&-\,\frac {1}{2}tr\big \{\tilde {W}^{T}\big (\sigma (\hat {V}^{T}\hat {x})-{\sigma }^\prime (\hat {V}^{T}\hat {x})\hat {V}^{T}\hat {x}\big)\|\hat {\varepsilon }_{f}\|\sigma _{f}^{T}(\hat x)\hat {F}\big \} \\\leq&\frac {1}{4}\|\tilde {W}\|_{F}^{2}+\frac {1}{2}\|\varepsilon _{f}\|^{2}\varrho _{1}+\frac {1}{2}B_{\varepsilon }^{2}\varrho _{1},\\&-\,\frac {1}{2}tr\big \{\tilde {V}^{T}\hat {x}\|\hat {\varepsilon }_{f}\|\sigma _{f}^{T}(\hat {x})\hat {F}\hat {W}^{T}\sigma ^\prime (\hat {V}^{T}\hat {x})\big \}\leq \frac {1}{4}\|\tilde {V}\|_{F}^{2} \\&+\,\frac {1}{2}\|\varepsilon _{f}\|^{2}\varrho _{2}+\frac {1}{2}B_{\varepsilon }^{2}\varrho _{2},\\&-\,tr\big \{\tilde {F}^{T}\|\hat {\varepsilon }_{f}\|\sigma _{f}(\hat {x})\sigma ^{T}(\hat {V}^{T}\hat {x})\hat {W}\big \}\!\leq \!\|\tilde {F}\|_{F}^{2}\!+\!\frac {1}{2}\|\varepsilon _{f}\|^{2}\varrho _{3} \\&+\,\frac {1}{2}B_{\varepsilon }^{2}\varrho _{3}, \\&-\,\varepsilon _{f}^{T}K_{p}\tilde {\varepsilon }_{f}\leq \frac {1}{2}\lambda _{\max }\{K_{p}\}\|\varepsilon _{f}\|^{2}+\frac {1}{2}\lambda _{\max }\{K_{p}\}B_{\varepsilon }^{2}, \\ \varepsilon _{f}^{T}\gamma\leq&\frac {\iota _{1}}{2}\|\varepsilon _{f}\|^{2}+\frac {\iota _{1}}{2}\|x_{\varepsilon }\|^{2}+\frac {\iota _{2}}{2}\|\varepsilon _{f}\|^{2}+\frac {\iota _{2}}{2}\|x_{\varepsilon }\|^{4}, \\&\big (1\!+\!\|\epsilon _{e}\|^{2}\big)^{-\frac {1}{2}}\epsilon _{e}^{T}\varepsilon _{f}\leq \frac {1}{2}\|\epsilon _{e}\|^{2}/\big (1\!+\!\|\epsilon _{e}\|^{2}\big)\!+\!\frac {1}{2}\|\varepsilon _{f}\|^{2},\end{align*}
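The bounds above are instances of Young's inequality. As a quick numerical sanity check (an illustrative sketch with randomly sampled vectors, not part of the proof), the last bound can be verified by noting it is Young's inequality applied to $a=\epsilon _{e}/(1+\|\epsilon _{e}\|^{2})^{1/2}$ and $b=\varepsilon _{f}$:

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm_sq(a):
    return dot(a, a)

# Check (1+||e||^2)^(-1/2) * e^T f <= 0.5*||e||^2/(1+||e||^2) + 0.5*||f||^2
# on randomly sampled vectors.
random.seed(0)
violations = 0
for _ in range(10_000):
    e = [random.uniform(-5, 5) for _ in range(3)]
    f = [random.uniform(-5, 5) for _ in range(3)]
    lhs = dot(e, f) / (1 + norm_sq(e)) ** 0.5
    rhs = 0.5 * norm_sq(e) / (1 + norm_sq(e)) + 0.5 * norm_sq(f)
    violations += lhs > rhs + 1e-12
print(violations)  # 0
```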
Eq. (40) is simplified as follows:\begin{align*} \dot {E}\leq&-\beta _{m}\|x_{\varepsilon }\|^{2}+\varepsilon _{f}^{T}(\chi -H\hat {p})+\sum _{j=1}^{4}\xi _{j}\varrho _{j} \\&+\,\delta _{f}tr\big \{\tilde {F}^{T}\hat {F}\big \}+\delta _{v}tr\big \{\tilde {V}^{T}\hat {V}\big \}+\delta _{w}tr\big \{\tilde {W}^{T}\hat {W}\big \} \\&+\,\frac {1}{2}\|W\|_{F}^{2}+\frac {1}{2}\|V\|_{F}^{2}+\frac {3}{4}\|\tilde {W}\|_{F}^{2}+\|\tilde {F}\|_{F}^{2} \\&+\,\frac {3}{4}\|\tilde {V}\|_{F}^{2}+\tilde {p}^{T}\big [H(\hat {\varepsilon }_{f})\hat {\varepsilon }_{f}-\Theta (\hat {p}-p^{0})\big] \\&+\,\frac {\iota _{1}}{2}\|x_{\varepsilon }\|^{2}+\frac {\iota _{2}}{2}\|x_{\varepsilon }\|^{4}+\frac {1}{2}\lambda _{\max }\{K_{p}\}B_{\varepsilon }^{2},\tag{41}\end{align*}
where $\beta _{m}=\min \big \{\lambda _{\min }\{\Lambda _{\varepsilon }\}-0.5,\lambda _{\min }\{K_{p}+D(\epsilon _{e})\} -0.5\lambda _{\max }\{K_{p}\}-\iota _{1}/2-\iota _{2}/2-0.5\big \}$
and $\xi _{j}\varrho _{j}=\big (-(0.5k_{j}-0.5)\|\varepsilon _{f}\|^{2}+(0.5k_{j}+0.5)B_\varepsilon ^{2}\big)\varrho _{j}$
. So, if $\|\varepsilon _{f}\|\geq B_\varepsilon \sqrt {(k_{j}+1)/(k_{j}-1)}$ holds with $k_{j}>1$, then $\xi _{j}\varrho _{j}\leq 0$
. By using Young's inequality [55] and the following relation [56]:\begin{align*}&\hspace {-2pc}\tilde {p}^{T}\big [H(\hat {\varepsilon }_{f})\hat {\varepsilon }_{f}-\Theta (\hat {p}-p^{0})\big]+\varepsilon _{f}^{T}(\chi -H\hat {p}) \\\leq&1.5B_{\varepsilon }^{2} \\&+\,0.2785[c_{1r},\ldots,c_{6r}]p+\|p\|^{2}-\frac {1}{2}\lambda _{\min }\{\Theta \}\|\tilde {p}\|^{2} \\&+\,0.5\|\tilde {p}\|^{2}+0.5(p-p^{0})^{T}\Theta (p-p^{0}),\end{align*}
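To see the sign condition on $\xi _{j}\varrho _{j}$ stated above concretely, the term can be evaluated for hypothetical values of $B_{\varepsilon }$ and $k_{j}$ (a numerical sketch only; the values below are illustrative placeholders, not the paper's gains):

```python
import math

def xi_j(eps_f_norm, B_eps, k_j):
    """xi_j as defined after (41): -(0.5*k_j - 0.5)*||eps_f||^2 + (0.5*k_j + 0.5)*B_eps^2."""
    return -(0.5 * k_j - 0.5) * eps_f_norm ** 2 + (0.5 * k_j + 0.5) * B_eps ** 2

B_eps, k_j = 0.2, 2.0  # illustrative values; any B_eps > 0 with k_j > 1 behaves the same
threshold = B_eps * math.sqrt((k_j + 1) / (k_j - 1))

# At or above the threshold the term is non-positive, so it can be dropped
# from the bound on the Lyapunov derivative; below it, the sign guarantee is lost.
print(xi_j(threshold, B_eps, k_j) <= 1e-12)    # True (zero at the threshold, up to rounding)
print(xi_j(1.5 * threshold, B_eps, k_j) <= 0)  # True
print(xi_j(0.5 * threshold, B_eps, k_j) > 0)   # True
```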
Eq. (41) is rewritten as \begin{align*} \dot {E}\leq&-\big (\beta _{m}-0.5\iota _{1}-0.5\iota _{2}\|x_{\varepsilon }\|^{2}\big)\|x_{\varepsilon }\|^{2} \\&-\,\big (0.5\lambda _{\min }\{\Theta \}-0.5\big)\|\tilde {p}\|^{2}\!-\!\left({\delta _{w}\left({1\!-\!\frac {1}{2k^{2}}}\right)\!-\!\frac {3}{4}}\right)\|\tilde {W}\|_{F}^{2} \\&-\,\left({\delta _{v}\left({1-\frac {1}{2k^{2}}}\right)-\frac {3}{4}}\right)\|\tilde {V}\|_{F}^{2}-\left({\delta _{f}\left({1-\frac {1}{2k^{2}}}\right)-1}\right)\|\tilde {F}\|_{F}^{2} \\&+\,\Xi,\tag{42}\end{align*}
where $k>\sqrt {2}/2$
, $\Xi =0.2785[c_{1r},\ldots,c_{6r}]p+1.5B_{\varepsilon }^{2}+\|p\|^{2}+0.5(p-p^{0})^{T}\Theta (p-p^{0}) +0.5\lambda _{\max }\{K_{p}\}B_{\varepsilon }^{2}+0.5\delta _{w}k^{2}\|W\|_{F}^{2}+0.5\delta _{v}k^{2}\|V\|_{F}^{2}+0.5\delta _{f}k^{2}\|F\|_{F}^{2} +0.5\|W\|_{F}^{2}+0.5\|V\|_{F}^{2}$
. If the following condition holds for $\beta _{m}$
:\begin{equation*} \beta _{m}>0.5\iota _{1}+0.5\iota _{2}\|x_{\varepsilon }\|^{2},\tag{43}\end{equation*}
one has \begin{align*} \dot {E}\leq&-c_{1}\|x_{\varepsilon }\|^{2}-c_{2}\|\tilde {W}\|_{F}^{2}-c_{3}\|\tilde {V}\|_{F}^{2}-c_{4}\|\tilde {F}\|_{F}^{2} \\&-\,c_{5}\|\tilde {p}\|^{2}+\Xi,\tag{44}\end{align*}
where $c_{1}=\beta _{m}-0.5\iota _{1}-0.5\iota _{2}\|x_{\varepsilon }\|^{2},c_{2}=\delta _{w}(1-{1}/{2k^{2}})-0.75,c_{3}=\delta _{v}(1-{1}/{2k^{2}})-0.75$
, $c_{4}=\delta _{f}(1-{1}/{2k^{2}})-1$
and $c_{5}=0.5\lambda _{\min }\{\Theta \}-0.5$
. Then, one has $\dot {E}\leq -c\|z_{l}\|^{2}+\Xi $
where $c=\min \{c_{1},\ldots,c_{5}\}$
. This means that $\dot {E}$ is strictly negative whenever $z_{l}$ lies outside the compact set $\Omega _{z_{l}}=\big \{z_{l}|0\leq \|z_{l}\|\leq \sqrt {\Xi /c} \big \}\,\,\forall t\geq 0$
. Next, one may find that $E(t)\leq E(0)\leq \lambda _{u}\|z_{u}(0)\|^{2}$
by recalling (38). Consequently, one infers that $\|x_{\varepsilon }\|^{2}\leq {\lambda _{u}}/{\lambda _{x_{\varepsilon }}}\|z_{u}(0)\|^{2}$
. As a result, a sufficient condition for (43) is $2\beta _{m}>\iota _{1}+\iota _{2}({\lambda _{u}}/{\lambda _{x_{\varepsilon }}})\|z_{u}(0)\|^{2}$
which means that the following region of attraction \begin{equation*} R_{A}=\left\{{z_{u}\in \Re ^{18+\xi _{n}}|\|z_{u}\|^{2}< \frac {\lambda _{x_{\varepsilon }}(2\beta _{m}-\iota _{1})}{\iota _{2}\lambda _{u}}}\right\},\tag{45}\end{equation*}
where $\xi _{n}=(N_{h}+1)N_{o}+(N_{i}+1)N_{h}+2N\times m$
, can include every initial condition by choosing control gains properly. By recalling (15), one deduces $\dot {\epsilon }_{e}\in \mathcal {L}_\infty $
. This implies that the tracking errors, the NN and fuzzy weight errors, and the estimation errors of the variables are SGUUB. Besides, by employing Assumptions 2–3 and $\|\tilde {\varepsilon }_{f}\|\leq B_{\varepsilon }$
, one concludes that $\epsilon _{e},\hat {\varepsilon }_{f},\hat {w}_{11},\ldots,\hat {w}_{(N_{h}+1)N_{o}},\hat {v}_{11},\ldots,\hat {v}_{(N_{i}+1)N_{h}}, \hat {f}_{11},\,\,\ldots,\hat {f}_{2N\times m}$
, $\hat {p}\in \mathcal {L}_\infty $
. Since $\epsilon _{e}\in \mathcal {L}_\infty $ and Assumption 1 holds, $\epsilon _{e}$ tends to a vicinity of the origin; hence, the tracking errors converge to a neighborhood of the origin, which guarantees that the output constraints and the bounds on the tracking errors, i.e. $\eta _{li}(t)\leq {e_{i}}(t) \leq \eta _{ui}(t)$, are never violated as $t\rightarrow \infty $. This completes the proof.
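As a closing illustration, the radius of the region of attraction $R_{A}$ in (45) and the ultimate bound $\sqrt {\Xi /c}$ from (44) can be evaluated for a hypothetical set of constants (all numerical values below are placeholders for illustration, not the gains tuned in this work):

```python
import math

# Hypothetical constants (placeholders, not the paper's tuned values).
lambda_x_eps = 0.5     # lambda_{x_epsilon}: lower quadratic Lyapunov-bound coefficient
lambda_u = 2.0         # lambda_u: upper quadratic Lyapunov-bound coefficient
beta_m = 3.0           # beta_m from (41); must satisfy condition (43)
iota_1, iota_2 = 0.4, 0.1
Xi, c = 0.8, 1.6       # residual term Xi and c = min{c_1,...,c_5} from (44)

# Region of attraction (45): ||z_u||^2 < lambda_x_eps*(2*beta_m - iota_1)/(iota_2*lambda_u)
ra_radius = math.sqrt(lambda_x_eps * (2 * beta_m - iota_1) / (iota_2 * lambda_u))

# Ultimate bound: E-dot is negative outside ||z_l|| <= sqrt(Xi/c), so trajectories
# starting inside R_A converge to this ball (the SGUUB property).
ultimate_bound = math.sqrt(Xi / c)

print(round(ra_radius, 3), round(ultimate_bound, 3))  # 3.742 0.707
```

Larger gains (larger $\beta _{m}$, smaller $\iota _{2}$) enlarge $R_{A}$, so any given initial condition can be covered by a proper gain choice, as stated after (45).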