Introduction
A. Related Works
The tracking control of wheeled mobile robots is one of the fundamental functions of autonomous robot navigation, and it has been widely used in inspection, security, cleaning, planetary exploration, military applications and so on. A wheeled mobile robot can be classified as a nonlinear system with multiple inputs and multiple outputs, and it is also an underactuated system with uncertainty. In addition, a wheeled mobile robot is subject to nonholonomic constraints, which make it challenging to construct a controller with the desired performance. As a result, tracking control of the nonholonomic wheeled mobile robot (NWMR) has been a research focus for the past decades.
Due to the existence of the nonholonomic constraints of the WMR, the approaches to tracking control include both kinematics control and dynamics control, which has become the basic pipeline of tracking control for the NWMR. Kinematics control is used to track the desired pose with the WMR's speed commands; thus, a time-varying controller based on Lyapunov theory was proposed in [1]. A kinematics model in chained form [2], [3] was used to transform this complex system into a more convenient one. Besides, a model described in polar coordinates [4] was also reported to design a robust system. In [5], a model predictive control algorithm combined with neural-dynamics optimization was proposed using the derived tracking-error kinematics, which effectively achieved tracking control under velocity constraints and velocity increment constraints. More recently, in [6], nonlinear controllers using a synthetic-analytic behavior-based control framework were presented to track with velocity constraints. A PID-based kinematic controller was proposed in [7] as a non-model-based controller to navigate a tractor-trailer wheeled robot along desired trajectories.
For a real WMR, it is obvious that a kinematics controller alone cannot perform trajectory tracking well. Therefore, various advanced dynamics controllers are adopted to track the desired velocity, which is exactly the output of the kinematics controller. These studies have mainly concentrated on overcoming system uncertainties and external disturbances. Instead of classical torque-based control, a robust control approach [8] was developed based on the voltage control strategy. In [9], a robust adaptive controller was proposed by combining adaptive control, backstepping and fuzzy logic techniques. Considering the robust performance of sliding mode control, a controller with finite-time convergence of the tracking errors was provided in [10], where a disturbance observer and an adaptive compensator were used to enhance the robustness of the system. Similarly, an integral terminal sliding mode controller [11] was adopted in the presence of parameter uncertainties and external disturbances, and an adaptive fuzzy observer was introduced to compensate for the unmeasured velocity. In [12], a fast terminal sliding mode control scheme was proposed under known or unknown upper bounds of the system uncertainty and external disturbances.
From the perspective of optimization, a controller based on model predictive control [13] was proposed to prevent sideslip and improve the performance of path tracking control. A nonlinear model predictive controller [14] was introduced with a set of modifications to track a given trajectory. Considering the uncertainties to be time-varying and dynamic, a robust control strategy [15] based on time delay control was proposed. In [16], optimization-based nonlinear control laws were analytically developed using the prediction of WMR responses, and the tracking precision was further increased by appending an integral feedback technique. Because neural networks (NNs) can approximate nonlinear functions well, an NN-based method [17] was provided to approximate the unknown modeling, skidding and slipping terms, although such a method is not common in low-level drivers. It can therefore be concluded that kinematics control and dynamics control are two different ways to address the tracking control problem, in which a variety of nonlinear control approaches can be employed. Both methods depend highly on a system model, and a more accurate model usually leads to more precise control accuracy. However, it is hard to describe the system exactly with nonlinear formulations, especially the model uncertainties and disturbances.
Apart from the model-based control methods mentioned above, learning-based (reinforcement learning) methods have become a new research focus [18], because they do not require a system model. In [19], with the candidate parameters of a PD controller defined as the action space, a hierarchical reinforcement learning approach for optimal path tracking of a WMR was proposed; however, the state and action spaces were decomposed into several subspaces, which is not amenable to the continuous control problem. Thus, RL methods with continuous spaces have been studied: an actor-critic goal-oriented deep RL architecture [20] was developed to achieve an adaptive low-level control strategy in continuous space. In [21], an RL algorithm was designed to generate an optimal control signal for uncertain nonlinear MIMO systems. In [22], an RL-based adaptive tracking control algorithm was proposed for a time-delayed WMR system with slipping and skidding. In [23], a layered deep reinforcement learning algorithm for robot composite tasks was proposed, which is superior to common deep reinforcement learning algorithms over discrete state spaces. A solution for the path following problem of a quadrotor vehicle based on deep reinforcement learning theory was proposed under three different conditions [24].
Despite the excellent performance of RL algorithms, they suffer from time-consuming training and inefficient sampling in the interaction between agent and environment [25]. Thus, in [26], a model-based reinforcement learning algorithm with excellent sample complexity was achieved by combining neural network dynamics models with model predictive control (MPC), which produces stable and plausible gaits that accomplish various complex locomotion tasks. In addition, a kernel-based dynamic model for reinforcement learning was proposed to fulfill robotic tracking tasks [27], where the optimal control policy is searched by the model-based RL method. In [28], a multi pseudo Q-learning-based deterministic policy gradient algorithm was proposed to achieve high tracking control accuracy of AUVs, which validated that increasing the number of actors and critics can further improve the performance. Recently, a data-based approach was proposed for analyzing the stability of discrete-time nonlinear stochastic systems modeled by Markov decision processes, by using the classic Lyapunov method in control theory [29]. To overcome the limited exploration ability caused by a deterministic policy, high-speed autonomous drifting was addressed in [30] using a closed-loop controller based on the deep RL algorithm soft actor critic (SAC) to control the steering angle and throttle of simulated vehicles. It should be noticed that deep reinforcement learning algorithms always require time-consuming training episodes. This may be acceptable to a certain extent for simulated robots, but it is not feasible in an actual environment. Therefore, effort should be concentrated on improving the efficiency of deep reinforcement learning algorithms.
B. Motivation of Our Approach
In general, model-based control methods have always been preferred for developing a controller, and the performance depends largely on the accuracy of the model. However, model uncertainty and external disturbances exist objectively and have to be addressed. Thus, a number of robust strategies have to be adopted to obtain a controller with more precise control accuracy. Furthermore, once the control algorithm is determined, the accuracy of the controller remains unchanged; it loses the possibility of improving itself by learning, just as humans do. In contrast, RL-based methods do not need a system model at all, and human-level performance can be obtained with a reasonable end-to-end training process. Naturally, the synthesis of model-based control and learning-based control can be an attractive alternative for an autonomous WMR.
Considering the good tracking performance of existing dynamics controllers, we prefer to control the velocity based on the kinematics model. Existing kinematics controllers solve a complex nonlinear control problem; the resulting control law is suboptimal and difficult to improve further with model-based methods alone. Therefore, a learning method can be used to optimize the existing kinematics controller to obtain better tracking performance.
Thus, in our effort on tracking control for the NWMR, the kinematics control is chosen as the model-based method, just like the "given talent" of a human, and the actor-critic based reinforcement learning method is adopted to learn the tracking experience during the whole tracking process, just like the "acquired knowledge". The main contributions of our proposed method are as follows.
A hybrid control strategy combining a model-based method and a deep reinforcement learning method for tracking control is proposed, which shows better performance in both accuracy and efficiency.
The state is defined to include the current tracking errors, the given control inputs and the one-step errors, which is one of the keys to the efficient convergence of the tracking control.
Given Control Law Based on Kinematics Model
A. Kinematics Model of NWMR With Velocity and Acceleration Constraints
As shown in Fig. 1, the NWMR has two drive wheels whose common axis passes through the geometric center of the robot body. The left and right drive wheels are driven by two hub motors to realize the forward, backward and turning motions of the robot. Point C is the midpoint of the axle connecting the two hub motors, and its coordinate in the global coordinate system is (x, y).
The kinematics of the nonholonomic wheeled mobile robot can be denoted as:\begin{align*} \dot{\mathbf{q}}= \begin{bmatrix} \dot{x}\\ \dot{y}\\ \dot{\theta}\end{bmatrix}= \begin{bmatrix} \cos\theta & 0\\ \sin\theta & 0\\ 0 & 1 \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix}\tag{1}\end{align*}
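As an illustration, a minimal Python sketch (not the authors' implementation) of one Euler-integration step of the kinematics (1) is given below; the sampling time and the commanded velocities are placeholder values.

```python
import numpy as np

def kinematics_step(q, v, omega, dt=0.1):
    """One Euler step of the unicycle kinematics (1); q = (x, y, theta)."""
    x, y, theta = q
    x += v * np.cos(theta) * dt
    y += v * np.sin(theta) * dt
    theta += omega * dt
    return np.array([x, y, theta])

# hypothetical commands, only to show the call
q_next = kinematics_step(np.zeros(3), v=0.5, omega=0.2)
```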
Although our method is based on a kinematics model, dynamic constraints must still be considered, because the control inputs of the NWMR need a certain response time and cannot change suddenly. The linear velocity $v$, angular velocity $\omega$, linear acceleration $a$ and angular acceleration $\alpha$ are therefore bounded:\begin{align*} \left| v \right|&\leq v_{max} \\ \left| \omega \right|&\leq \omega_{max} \\ \left| a \right|&\leq a_{max} \\ \left| \alpha \right|&\leq \alpha_{max}\tag{2}\end{align*}
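The constraints (2) can be enforced in simulation by rate-limiting and then saturating the commanded velocities; the following sketch assumes placeholder bound values rather than the settings used in the paper.

```python
import numpy as np

def limit_command(v_cmd, w_cmd, v_prev, w_prev, dt,
                  v_max=1.0, w_max=2.0, a_max=0.8, alpha_max=3.0):
    """Enforce the velocity and acceleration bounds in (2)."""
    # acceleration (rate) limits relative to the previous command
    v = np.clip(v_cmd, v_prev - a_max * dt, v_prev + a_max * dt)
    w = np.clip(w_cmd, w_prev - alpha_max * dt, w_prev + alpha_max * dt)
    # velocity limits
    return np.clip(v, -v_max, v_max), np.clip(w, -w_max, w_max)
```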
B. Given Control Law
As shown in Fig. 2, for any given desired pose $\mathbf{q}_{d}=[x_{d}, y_{d}, \theta_{d}]^{\mathrm{T}}$ and the measured pose $\tilde{\mathbf{q}}=[\tilde{x}, \tilde{y}, \tilde{\theta}]^{\mathrm{T}}$ of the mobile robot, the tracking error expressed in the robot frame is\begin{align*} \tilde{\mathbf{q}}_{e}&=R(\tilde{\theta})(\mathbf{q}_{d}-\tilde{\mathbf{q}}) \\ &=\begin{bmatrix} \tilde{x}_{e}\\ \tilde{y}_{e}\\ \tilde{\theta}_{e}\end{bmatrix} = \begin{bmatrix} \cos\tilde{\theta} & \sin\tilde{\theta} & 0\\ -\sin\tilde{\theta} & \cos\tilde{\theta} & 0\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} x_{d}-\tilde{x}\\ y_{d}-\tilde{y}\\ \theta_{d}-\tilde{\theta}\end{bmatrix}\tag{3}\end{align*}
where the measured pose $\tilde{\mathbf{q}}$ is the actual pose $\mathbf{q}$ corrupted by the uncertainty $\mathbf{n}$,\begin{equation*} \tilde{\mathbf{q}}=\mathbf{q}+\mathbf{n}\tag{4}\end{equation*}
The error state dynamics can be written as follows:\begin{align*} \dot{\tilde{x}}_{e}&=\omega\tilde{y}_{e}-v+v_{d}\cos\tilde{\theta}_{e} \\ \dot{\tilde{y}}_{e}&=-\omega\tilde{x}_{e}+v_{d}\sin\tilde{\theta}_{e} \\ \dot{\tilde{\theta}}_{e}&=\omega_{d}-\omega\tag{5}\end{align*}
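For simulation or analysis, the error dynamics (5) can be evaluated directly; the sketch below assumes the desired velocities (v_d, w_d) and the applied velocities (v, w) are available.

```python
import numpy as np

def error_dynamics(err, v, w, v_d, w_d):
    """Right-hand side of the error dynamics (5); err = (x_e, y_e, theta_e)."""
    xe, ye, the = err
    xe_dot = w * ye - v + v_d * np.cos(the)
    ye_dot = -w * xe + v_d * np.sin(the)
    the_dot = w_d - w
    return np.array([xe_dot, ye_dot, the_dot])
```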
Tracking control seeks a control input $\mathbf{u}_{g}=[v_{g}, \omega_{g}]^{\mathrm{T}}$ that drives the tracking error (3) to zero. The given control law based on the kinematics model is chosen as\begin{align*} \mathbf{u}_{g}= \begin{bmatrix} v_{g}\\ \omega_{g}\end{bmatrix} = \begin{bmatrix} k_{1}\tilde{x}_{e}+v_{d}\cos\tilde{\theta}_{e}\\ 2v_{d}\tilde{y}_{e}\cos\frac{\tilde{\theta}_{e}}{2}+\omega_{d}+k_{2}\sin\frac{\tilde{\theta}_{e}}{2}\end{bmatrix}\tag{6}\end{align*} where $k_{1}>0$ and $k_{2}>0$ are control gains.
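A compact Python sketch of the pose-error transformation (3) and the given control law (6) is shown below; the gains k1 and k2 are illustrative values, not the ones used in the experiments.

```python
import numpy as np

def pose_error(q_d, q_m):
    """Tracking error (3): rotate the world-frame error into the robot frame."""
    xd, yd, thd = q_d
    x, y, th = q_m
    c, s = np.cos(th), np.sin(th)
    ex = c * (xd - x) + s * (yd - y)
    ey = -s * (xd - x) + c * (yd - y)
    return np.array([ex, ey, thd - th])

def given_control(err, v_d, w_d, k1=1.0, k2=2.0):
    """Given control law (6)."""
    ex, ey, eth = err
    v_g = k1 * ex + v_d * np.cos(eth)
    w_g = 2.0 * v_d * ey * np.cos(eth / 2.0) + w_d + k2 * np.sin(eth / 2.0)
    return np.array([v_g, w_g])
```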
The stability of the closed-loop system can be proved according to Lyapunov theory; see the Appendix.
With the control input in (6), the NWMR moves to a new pose $\tilde{\mathbf{q}}'=[\tilde{x}', \tilde{y}', \tilde{\theta}']^{\mathrm{T}}$ at the next time step, and the corresponding tracking error becomes\begin{align*} \tilde{\mathbf{q}}'_{e}&=R(\tilde{\theta}')(\mathbf{q}_{d}-\tilde{\mathbf{q}}') \\ &=\begin{bmatrix} \tilde{x}_{e}'\\ \tilde{y}_{e}'\\ \tilde{\theta}_{e}'\end{bmatrix} = \begin{bmatrix} \cos\tilde{\theta}' & \sin\tilde{\theta}' & 0\\ -\sin\tilde{\theta}' & \cos\tilde{\theta}' & 0\\ 0 & 0 & 1\end{bmatrix} \begin{bmatrix} x_{d}-\tilde{x}'\\ y_{d}-\tilde{y}'\\ \theta_{d}-\tilde{\theta}'\end{bmatrix}\tag{7}\end{align*}
In other words, the pose error of the NWMR makes a transition from the previous error $\tilde{\mathbf{q}}_{e}$ to the new error $\tilde{\mathbf{q}}'_{e}$ under the given control input.
Theoretically, the tracking error gradually converges to zero as time tends to infinity, which can be denoted as follows:\begin{equation*} \lim\limits_{t\to\infty}\left| \tilde{\mathbf{q}}_{e}'\right| = 0\tag{8}\end{equation*}
However, according to the error dynamics equation in the Appendix, the nonlinear kinematics controller can be suboptimal in finite time, and external disturbances or noise during the tracking control can cause the tracking error to converge only to a certain positive value $c$ within a finite time $t_{0}$:\begin{equation*} \lim\limits_{t\to t_{0}}\left| \tilde{\mathbf{q}}_{e}'\right| \geq c\tag{9}\end{equation*}
Besides, once the given control law (6) is determined, the convergence performance of the closed-loop system is fixed. The controller is not able to adjust itself to obtain more precise control performance, so an additional strategy is needed to improve it.
Hybrid Control Strategy Incorporating Deep Reinforcement Learning Approach
In this section, we consider a deep reinforcement learning method to help the NWMR learn an acquired control law from the tracking errors caused by the given control law in Section II-B.
A. Finite MDP
To convert the acquired control problem of the wheeled mobile robot into a general RL problem, we model it as a finite Markov decision process [31], which is the fundamental framework of RL theory.
First, we define the state $\mathbf{s}_{k}$ at time step $k$ as the concatenation of the current tracking error, the given control input and the one-step tracking error:\begin{align*} \mathbf{s}_{k}&=\left[{\tilde{\mathbf{q}}}_{e}^{\mathrm{T}}(k), \mathbf{u}_{g}^{\mathrm{T}}(k), {\tilde{\mathbf{q}}}_{e}^{'\mathrm{T}}(k)\right] \\ &=\left[\tilde{x}_{e}(k), \tilde{y}_{e}(k), \tilde{\theta}_{e}(k), v_{g}(k), \omega_{g}(k), \tilde{x}_{e}'(k), \tilde{y}_{e}'(k), \tilde{\theta}_{e}'(k)\right]\tag{10}\end{align*}
The action at time step $k$ is the acquired control input, generated by the deterministic policy $\mu$:\begin{equation*} \mathbf{a}_{k}=\mathbf{u}_{a}(k)=\left[v_{a}(k), \omega_{a}(k)\right]^{\mathrm{T}}=\mu(\mathbf{s}_{k})\tag{11}\end{equation*}
Then, the hybrid tracking control input at the current time step is\begin{equation*} \mathbf{u}(k)=\mathbf{u}_{g}(k)+\mathbf{u}_{a}(k)\tag{12}\end{equation*}
The immediate reward $r_{k}$ is defined as the negative sum of the absolute tracking errors after the hybrid control input is applied:\begin{equation*} r_{k}=-\left(\left| \tilde{x}_{e}'(k)\right| + \left| \tilde{y}_{e}'(k)\right| + \left|\tilde{\theta}_{e}'(k)\right|\right)\tag{13}\end{equation*}
The cumulative reward of the whole learning process is calculated with a discount factor $\gamma$:\begin{equation*} G_{k}=\sum_{i=1}^{N}\gamma^{i-1} r_{k+i}\tag{14}\end{equation*}
It should be mentioned that the state of our RL problem includes not only the current tracking error $\tilde{\mathbf{q}}_{e}$, but also the given control input $\mathbf{u}_{g}$ and the resulting one-step error $\tilde{\mathbf{q}}_{e}'$.
Therefore, the error vectors before and after the given control input jointly describe how the given control law acts on the tracking error, which is exactly the experience the acquired control is expected to learn from, and this state definition is one of the keys to the efficient convergence of the tracking control.
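The MDP quantities (10)–(14) can be assembled as in the following sketch; the helper names are illustrative, and the discount factor is a placeholder value.

```python
import numpy as np

def build_state(err, u_g, err_next):
    """State s_k in (10): current error, given input, one-step error (8-dimensional)."""
    return np.concatenate([err, u_g, err_next])

def hybrid_input(u_g, u_a):
    """Hybrid control input (12): given control plus acquired control."""
    return u_g + u_a

def reward(err_next):
    """Immediate reward (13): negative absolute one-step tracking error."""
    return -(abs(err_next[0]) + abs(err_next[1]) + abs(err_next[2]))

def discounted_return(rewards, gamma=0.99):
    """Cumulative reward (14); `rewards` holds r_{k+1}, ..., r_{k+N}."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))
```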
B. Acquired Control Learning With Actor-Critic Architecture
Because the experience model is difficult to represent accurately with mathematical expressions, the method in this paper is constructed with the deep deterministic policy gradient (DDPG) algorithm [32]. The deterministic policy $\pi(\mathbf{s}|\boldsymbol{\lambda})$ is approximated by the actor network with parameter vector $\boldsymbol{\lambda}$, the action-value function $Q(\mathbf{s},\mathbf{a}|\boldsymbol{\beta})$ is approximated by the critic network with parameter vector $\boldsymbol{\beta}$, and target actor and target critic networks $\hat{\pi}(\mathbf{s}|\hat{\boldsymbol{\lambda}})$ and $\hat{Q}(\mathbf{s},\mathbf{a}|\hat{\boldsymbol{\beta}})$ are introduced to stabilize training.
Before updating the actor network, the critic network is trained by minimizing the mean-squared TD error over a mini-batch of $N$ transitions:\begin{equation*} L(\boldsymbol{\beta})=\frac{1}{N}\sum_{k=1}^{N}\left(T_{k}-Q(s_{k},a_{k}|\boldsymbol{\beta})\right)^{2}\tag{15}\end{equation*}
where the TD target $T_{k}$ is computed with the target actor and target critic networks:\begin{equation*} T_{k}=r(s_{k},a_{k})+\gamma\hat{Q}\left(s_{k+1},\hat{\pi}(s_{k+1}|\hat{\boldsymbol{\lambda}})|\hat{\boldsymbol{\beta}}\right)\tag{16}\end{equation*}
Thus, the gradient of the critic loss is:\begin{equation*} \nabla_{\boldsymbol{\beta}} L(\boldsymbol{\beta})=-\frac{2}{N}\sum_{k=1}^{N}\left(T_{k}-Q(s_{k},a_{k}|\boldsymbol{\beta})\right)\frac{\partial Q(s_{k},a_{k}|\boldsymbol{\beta})}{\partial\boldsymbol{\beta}}\tag{17}\end{equation*}
and the critic network is updated along the negative gradient direction with learning rate $\mathrm{L_{c}}$:\begin{equation*} \boldsymbol{\beta}\gets\boldsymbol{\beta}-\mathrm{L_{c}}\cdot\nabla_{\boldsymbol{\beta}} L(\boldsymbol{\beta})\tag{18}\end{equation*}
For the actor network, the performance objective is the expected return under the discounted state distribution $\rho^{\phi}$ of the behavior policy:\begin{align*} F_{\phi}(\pi_{\lambda})&=\int_{S}\rho^{\phi}(s)Q^{\pi}(s,\pi_{\lambda}(s))ds \\ &=\mathbb{E}_{s\sim\rho^{\phi}}\left[Q(s,\pi_{\lambda}(s))\right]\tag{19}\end{align*}
According to [33], the off-policy deterministic policy gradient is:\begin{align*} \nabla_{\lambda} F_{\phi}(\pi_{\lambda})&\approx\int_{S}\rho^{\phi}(s)\nabla_{\lambda}\pi_{\lambda}(s)\nabla_{a} Q^{\pi}(s,a)|_{a=\pi_{\lambda}(s)}\,ds \\ &=\mathbb{E}_{s\sim\rho^{\phi}}\left[\nabla_{\lambda}\pi_{\lambda}(s)\nabla_{a} Q^{\pi}(s,a)|_{a=\pi_{\lambda}(s)}\right]\tag{20}\end{align*}
So, when a mini-batch of data is sampled randomly from the replay memory buffer, the policy gradient is estimated as:\begin{align*} \nabla_{\lambda} F_{\phi}(\pi_{\lambda})=\frac{1}{N}\sum_{k=1}^{N}\nabla_{\lambda}\pi_{\lambda}(s|\lambda)|_{s=s_{k}}\cdot\nabla_{a} Q^{\pi}(s,a|\beta)|_{s=s_{k},a=\pi_{\lambda}(s_{k})}\tag{21}\end{align*}
Thus, the actor network is updated with learning rate $\mathrm{L_{a}}$:\begin{equation*} \boldsymbol{\lambda}\gets\boldsymbol{\lambda}+\mathrm{L_{a}}\cdot\nabla_{\lambda} F_{\phi}(\pi_{\lambda})\tag{22}\end{equation*}
For the stability of training, the parameter vectors of the target critic network and the target actor network are updated softly with rate $\varepsilon$:\begin{align*} \hat{\boldsymbol{\lambda}}&=\varepsilon\boldsymbol{\lambda}+(1-\varepsilon)\hat{\boldsymbol{\lambda}}\tag{23}\\ \hat{\boldsymbol{\beta}}&=\varepsilon\boldsymbol{\beta}+(1-\varepsilon)\hat{\boldsymbol{\beta}}\tag{24}\end{align*}
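The soft target update (23)–(24) amounts to the following one-liner, assuming the network parameters are stored as lists of arrays:

```python
def soft_update(target_params, params, eps=0.01):
    """Soft update (23)-(24): blend the online parameters into the target ones."""
    return [eps * p + (1.0 - eps) * tp for tp, p in zip(target_params, params)]
```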
Finally, when the training is over, the optimal acquired control law is given by the trained actor network:\begin{equation*} \mathbf{u}_{a}^{*}=\pi^{*}(\mathbf{s};\boldsymbol{\lambda}^{*})\tag{25}\end{equation*}
We use a fully connected model to build the target actor network, so it can be expressed with the following forward model:\begin{align*} \mathbf{l}_{1}&=f_{1}(\mathbf{W}^{*}_{1}\mathbf{s}+\mathbf{b}^{*}_{1}) \\ \mathbf{l}_{2}&=f_{2}(\mathbf{W}^{*}_{2}\mathbf{l}_{1}+\mathbf{b}^{*}_{2}) \\ &\;\;\vdots \\ \mathbf{l}_{n-1}&=f_{n-1}(\mathbf{W}^{*}_{n-1}\mathbf{l}_{n-2}+\mathbf{b}^{*}_{n-1}) \\ \mathbf{u}^{*}_{a}&=f_{n}(\mathbf{W}^{*}_{n}\mathbf{l}_{n-1}+\mathbf{b}^{*}_{n})\tag{26}\end{align*}
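A minimal numpy sketch of the forward model (26) is given below; the layer sizes, the ReLU hidden activations and the tanh output are assumptions for illustration rather than the exact architecture in Tab. 2.

```python
import numpy as np

def actor_forward(s, weights, biases):
    """Forward pass (26): hidden layers f_1..f_{n-1} (ReLU), output layer f_n (tanh)."""
    h = s
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)
    return np.tanh(weights[-1] @ h + biases[-1])   # acquired control u_a*

# example with random parameters: 8-dim state -> 2-dim acquired control
rng = np.random.default_rng(0)
sizes = [8, 64, 64, 2]
Ws = [0.1 * rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(o) for o in sizes[1:]]
u_a = actor_forward(np.zeros(8), Ws, bs)
```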
So far, the hybrid tracking control law for the NWMR is obtained after training, which combines the given control from the kinematics control method and the acquired control from the DRL method. The control block diagram is shown in Fig. 3 and Fig. 4. The pseudocode of our tracking control method is shown in Algorithm 1:
Algorithm 1 Hybrid Strategy of Tracking Control for NWMR
Initialize/load the actor network $\pi(\mathbf{s}|\boldsymbol{\lambda})$ and the critic network $Q(\mathbf{s},\mathbf{a}|\boldsymbol{\beta})$
Initialize the target networks with $\hat{\boldsymbol{\lambda}}\gets\boldsymbol{\lambda}$, $\hat{\boldsymbol{\beta}}\gets\boldsymbol{\beta}$
Initialize the replay buffer $R$
for episode=1 to Max-ep do
get the initial pose observation of the NWMR, compute the initial state $\mathbf{s}_{1}$
Initialize the cumulative error to zero
for step $k$=1 to Max-step do
compute the given control $\mathbf{u}_{g}(k)$ with (6)
compute the acquired control $\mathbf{u}_{a}(k)$ with (11)
execute the hybrid control input $\mathbf{u}(k)$ in (12), observe the reward $r_{k}$ and the next state $\mathbf{s}_{k+1}$
store the transition $(\mathbf{s}_{k}, \mathbf{a}_{k}, r_{k}, \mathbf{s}_{k+1})$ in $R$
if number of transitions > Memory then
extract randomly a batch of transitions from $R$
update the critic network with (15)–(18), the actor network with (21)–(22) and the target networks with (23)–(24)
end if
end for
end for
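The replay-buffer handling in Algorithm 1 can be sketched as follows; the capacity and batch size follow the values reported later (5000 and 32), and the class name is illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s, a, r, s') and samples random mini-batches."""
    def __init__(self, capacity=5000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```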
Simulation Results
In this section, simulations are developed to demonstrate that the proposed method can achieve tracking control of the NWMR effectively. We first try to track a circle, which is defined as:\begin{align*} x_{d}&=2\cos\theta \\ y_{d}&=2\sin\theta\end{align*}
To evaluate the tracking performance quantitatively, the cumulative error over an episode is computed as\begin{align*} E=-\sum_{k=1}^{N}\Big[\xi_{1}\left(\left| x_{e}(k)\right| + \left| y_{e}(k)\right|\right) + \xi_{2}\left| \theta_{e}(k)\right| + \xi_{3}\left(\left| v_{d} - v(k)\right| + \left| \omega_{d} - \omega(k)\right|\right)\Big]\end{align*} where $\xi_{1}$, $\xi_{2}$ and $\xi_{3}$ are weighting coefficients.
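This metric can be computed as in the sketch below; the weights xi1, xi2, xi3 and the constant desired velocities are assumptions for illustration.

```python
def cumulative_error(errs, vels, v_d, w_d, xi1=1.0, xi2=0.5, xi3=0.1):
    """Cumulative error E; errs holds (x_e, y_e, theta_e), vels holds (v, w) per step."""
    E = 0.0
    for (xe, ye, the), (v, w) in zip(errs, vels):
        E -= (xi1 * (abs(xe) + abs(ye)) + xi2 * abs(the)
              + xi3 * (abs(v_d - v) + abs(w_d - w)))
    return E
```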
The bounded values in (2) and the parameters of the given control law are listed in Tab. 1.
The uncertainty in (4) is chosen as a periodic disturbance,\begin{align*} \mathbf{n}=\begin{bmatrix} n_{x}\\ n_{y}\\ n_{\theta}\end{bmatrix}=\begin{bmatrix} 0.002\sin(\pi t)\\ 0.002\cos(\pi t)\\ 0.005\sin(\pi t)\end{bmatrix}\end{align*}
The network architecture in Section III-B is built with the aid of TensorFlow; the actor/target actor networks use a deep fully connected structure similar to that of the critic/target critic networks, and the hyperparameters are shown in Tab. 2. Besides, the maximum size of the replay buffer is 5000, the batch size is 32, and the learning rates of the actor network and the critic network are 0.001 and 0.002, respectively. Training consists of a total of 400 episodes with 200 steps in each episode, at a fixed sampling time.
The results of our proposed method are shown in Figs. 5–9. The trajectory of the NWMR can be seen in Fig. 5, where the desired trajectory is shown with a dotted line and the actual trajectory with a solid line. From Fig. 6, it can be observed that the tracking errors all converge to near zero. The given control signals, acquired control signals and final hybrid control inputs of (12) are shown in Figs. 7–9, respectively. The acquired control inputs act throughout the whole process, which proves the effectiveness of our method.
To make a comparison, we also test the performance of the classical method, that is, only the given control approach (6) is active. The results are depicted in Figs. 10–11. Comparing Fig. 10 with Fig. 5, our method obviously performs better, and the comparison of Fig. 11 with Fig. 6 also supports this point. In fact, the cumulative error of the classical method in Fig. 11 is -202.0255, while that of our method in Fig. 5 is -110.4874. So, the addition of the acquired control clearly improves the tracking accuracy.
Besides, we also test the pure learning method, that is, only the acquired control (11) is active; the results are depicted in Figs. 12–15. Comparing Fig. 12 with Fig. 5, the circle-tracking performance is similar in both cases, and the cumulative error of the learning method in Fig. 12 is -110.1912. However, the comparison of the training processes in Fig. 14 and Fig. 15 shows that our proposed method converges to a stable level within 300 episodes, and the fluctuation of the reward (Y axis) of the former is smaller than that of the latter, which proves the superiority of our method in the training process.
Training process of tracking the circle with our method, with Y = −150 (red dotted line) as reference.
Training process of tracking the circle only with the learning method, with Y = −150 (red dotted line) as reference.
To further demonstrate the effectiveness of our method, we conduct another simulation to track a spiral trajectory, defined as follows:\begin{align*} x_{d}&=0.04t\cos(0.5t) \\ y_{d}&=0.04t\sin(0.5t)\end{align*}
The uncertainty in (4) is chosen as a random disturbance,\begin{equation*} \mathbf{n}=0.002\,\boldsymbol{\sigma}\end{equation*} where $\boldsymbol{\sigma}$ is a random vector.
The parameters are given in Tab. 3. The size of the replay buffer is 5000, the batch size is 32, and the learning rates of the actor network and the critic network are 0.0002 and 0.001, respectively. The maximum number of episodes is 800 and the maximum number of steps in each episode is 250, while the other parameters remain unchanged.
The results are depicted in Figs. 16–20. The trajectory of the NWMR is shown in Fig. 16, the tracking errors are depicted in Fig. 17, and the given control signals, acquired control signals and hybrid control inputs can be seen in Figs. 18–20. From Fig. 18, the angular velocity of the given control already exceeds the upper bound, but the hybrid control inputs in Fig. 20 remain bounded.
The results obtained when only the classical control approach is active are also illustrated as a comparison in Fig. 21 and Fig. 22. The cumulative error of the classical method in Fig. 21 is -358.0541, while that of our method in Fig. 16 is -93.7636, which again proves the effectiveness of the proposed method. The results obtained with only the learning method are shown in Figs. 23–26. In Fig. 23, the cumulative error is -114.4648; compared to Fig. 16, the tracking performance of our method is still better. According to the training processes in Fig. 25 and Fig. 26, our proposed method is obviously more stable, too.
Training process of tracking the spiral with our method, with Y = −110 (red dotted line) as reference.
Training process of tracking the spiral only with the learning method, with Y = −110 (red dotted line) as reference.
So, for circle tracking, our proposed method has tracking performance similar to that of the learning method, but its convergence performance is better; for spiral tracking, our proposed method evidently has advantages in both tracking performance and convergence performance.
Conclusion
In this research, the tracking control of the NWMR with constraints and uncertainty has been addressed by the proposed hybrid control strategy, which is a combination of a model-based control method and a learning-based method. The kinematics control serves as the given control (like "the talent"), and the actor-critic based DRL method is used to learn an acquired control law to compensate the remaining errors (like "the experience"). The results have demonstrated the effectiveness of the proposed method, and the comparisons show that our method has the advantage of less cumulative error; meanwhile, it is more stable and efficient than the pure learning-based method.
The strategy provided in our effort can improve tracking and convergence performance, which is a vital function for an autonomous mobile robot. Although our method has been tested on the tracking control of the NWMR, it can also be applied to other complicated control problems.
Appendix
Substituting (6) into (5), the error dynamics can be rewritten as:\begin{align*} \dot{x}_{e}&=-k_{1} x_{e}+2v_{d} y_{e}^{2}\cos\frac{\theta_{e}}{2}+k_{2} y_{e}\sin\frac{\theta_{e}}{2}+\omega_{d} y_{e} \\ \dot{y}_{e}&=-2v_{d} x_{e} y_{e}\cos\frac{\theta_{e}}{2}-\omega_{d} x_{e}-k_{2} x_{e}\sin\frac{\theta_{e}}{2}+v_{d}\sin\theta_{e} \\ \dot{\theta}_{e}&=-2v_{d} y_{e}\cos\frac{\theta_{e}}{2}-k_{2}\sin\frac{\theta_{e}}{2}\end{align*}
Define the positive-definite Lyapunov function candidate\begin{equation*} L=\frac{1}{2} x_{e}^{2}+\frac{1}{2} y_{e}^{2}+2\left(1-\cos\frac{\theta_{e}}{2}\right)\end{equation*}
Differentiating the Lyapunov function with respect to time along the error dynamics gives:\begin{align*} \dot{L}&=x_{e}\dot{x}_{e}+y_{e}\dot{y}_{e}+\dot{\theta}_{e}\sin\frac{\theta_{e}}{2} \\ &=x_{e}\left(-k_{1} x_{e}+2v_{d} y_{e}^{2}\cos\frac{\theta_{e}}{2}+k_{2} y_{e}\sin\frac{\theta_{e}}{2}+\omega_{d} y_{e}\right) \\ &\quad+ y_{e}\left(-2v_{d} x_{e} y_{e}\cos\frac{\theta_{e}}{2}-\omega_{d} x_{e}-k_{2} x_{e}\sin\frac{\theta_{e}}{2}+v_{d}\sin\theta_{e}\right)\\ &\quad+ \left(-v_{d} y_{e}\sin\theta_{e}-k_{2}\sin^{2}\frac{\theta_{e}}{2}\right)\\ &=-k_{1} x_{e}^{2}-k_{2}\sin^{2}\frac{\theta_{e}}{2} \leq 0\end{align*}
According to Lyapunov theory, together with LaSalle's invariance principle, the tracking errors asymptotically converge to zero.
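The derivative above can also be checked numerically; the sketch below evaluates the closed-loop error dynamics at random error states (with arbitrary gains and reference velocities) and verifies that dL/dt equals -k1*x_e^2 - k2*sin^2(theta_e/2).

```python
import numpy as np

k1, k2, v_d, w_d = 1.0, 2.0, 0.5, 0.3   # arbitrary illustrative values
rng = np.random.default_rng(1)

for _ in range(5):
    xe, ye, the = rng.uniform(-1.0, 1.0, 3)
    # closed-loop error dynamics from the Appendix
    xe_dot = -k1*xe + 2*v_d*ye**2*np.cos(the/2) + k2*ye*np.sin(the/2) + w_d*ye
    ye_dot = -2*v_d*xe*ye*np.cos(the/2) - w_d*xe - k2*xe*np.sin(the/2) + v_d*np.sin(the)
    the_dot = -2*v_d*ye*np.cos(the/2) - k2*np.sin(the/2)
    # dL/dt = xe*xe_dot + ye*ye_dot + sin(the/2)*the_dot
    L_dot = xe*xe_dot + ye*ye_dot + np.sin(the/2)*the_dot
    assert np.isclose(L_dot, -k1*xe**2 - k2*np.sin(the/2)**2)
```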