
Zero-Sum Game (ZSG) based Integral Reinforcement Learning for Trajectory Tracking Control of Autonomous Smart Car


Abstract:

The ultimate aim of our research is the development, practical implementation, and benchmarking of continuous-time, online reinforcement learning (RL) schemes for the trajectory tracking control (TTC) of fully autonomous vehicles (AVs) in real-world scenarios. The adaptive optimality and model-free nature of RL hold stronger promise than model-based counterparts, such as MPC, against uncertainties in the vehicle, road, tire-terrain, and environmental dynamics. Existing studies on RL-based AV control are mostly theoretical, often deal only with high-level TTC, and perform evaluations in simulations using simplified or linear models without disturbance or slip effects. The literature also lacks practical implementations of RL-based autonomous vehicle control overall. Our goal is to fill these theoretical and practical gaps by designing and experimentally evaluating novel RL strategies that improve TTC performance against uncertainties at all levels. This paper presents simulation results from our preliminary studies on the online, longitudinal tracking control of a realistic AV (with uncertain nonlinear dynamics, as well as disturbance and slip effects), which we treat as a Zero-Sum Game (ZSG) problem solved with an Integral Reinforcement Learning (IRL) approach using synchronous actor and critic updates (SyncIRL). The results are promising and motivate the practical implementation of the approach for combined longitudinal and lateral control of AVs.
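For context, a sketch of the standard continuous-time ZSG formulation behind SyncIRL, following the IRL literature this abstract builds on; the quadratic weights \(Q\), \(R\) and attenuation level \(\gamma\) here are generic placeholders, not the paper's tuning. The controller minimizes and the disturbance maximizes the game value

\[
V^{*}\big(x(t)\big) = \min_{u}\,\max_{d} \int_{t}^{\infty} \left( x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d \right) d\tau ,
\]

and IRL replaces the model-dependent Hamilton-Jacobi-Isaacs condition with an interval Bellman equation that requires no knowledge of the drift dynamics,

\[
V\big(x(t)\big) = \int_{t}^{t+T} \left( x^{\top} Q x + u^{\top} R u - \gamma^{2} d^{\top} d \right) d\tau + V\big(x(t+T)\big).
\]

"Synchronous" means the critic, actor (control), and disturbance approximators are tuned simultaneously in continuous time rather than in alternating policy-iteration phases.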
Date of Conference: 01-03 June 2022
Date Added to IEEE Xplore: 14 November 2022
Conference Location: Anchorage, AK, USA

I. Introduction

Trajectory tracking control (TTC) is the motion control layer in AV operation and plays a critical role in the safety, performance, and efficiency of autonomous vehicles (AVs). TTC involves two levels of control: high-level control, which calculates the force required for the desired motion of the vehicle, and low-level control, which calculates the wheel motor torque that generates the required force. The high-level controller faces vehicle-dynamics uncertainties and external disturbances, while the low-level controller must handle slip effects, a major challenge in vehicle control. A sketch of how the ZSG/IRL approach addresses the high-level problem follows below.
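To make the game-theoretic setup concrete, the following is a minimal, self-contained sketch of SyncIRL-style learning on a hypothetical scalar longitudinal tracking-error model x' = a·x + b·u + k·d, where u is the control force and d an adversarial disturbance. All constants are illustrative assumptions, not the paper's vehicle model or gains. A quadratic critic V(x) = p·x² is tuned from the interval Bellman residual, the actor and worst-case disturbance policies are derived from it, and the learned weight is checked against the closed-form solution of the scalar game Riccati equation.

import numpy as np

# Hypothetical scalar longitudinal error model (illustration only, not the
# paper's vehicle dynamics): x' = a*x + b*u + k*d.
a, b, k = -1.0, 1.0, 0.5
q, r, gamma = 1.0, 1.0, 2.0          # ZSG cost rate: q*x^2 + r*u^2 - gamma^2*d^2

dt, T = 1e-3, 0.05                   # Euler step and IRL interval length
steps = int(T / dt)
alpha, iters = 1.0, 3000             # critic step size, learning iterations
rng = np.random.default_rng(0)

p = 0.0                              # critic weight: V(x) = p*x^2
for _ in range(iters):
    x = rng.uniform(-2.0, 2.0)       # reset state for persistent excitation
    x0, rho = x, 0.0
    for _ in range(steps):           # roll out the current actor/disturbance pair
        u = -(b * p / r) * x         # actor:        u = -(1/2r)  * b * dV/dx
        d = (k * p / gamma**2) * x   # disturbance:  d = (1/2g^2) * k * dV/dx
        rho += (q * x**2 + r * u**2 - gamma**2 * d**2) * dt
        x += (a * x + b * u + k * d) * dt
    g = x**2 - x0**2                 # gradient of the Bellman residual w.r.t. p
    e = p * g + rho                  # interval (IRL) Bellman residual
    p -= alpha * e * g / (1.0 + g**2)  # normalized gradient descent on e^2

# Closed-form check: positive root of the scalar game Riccati equation
# c*p^2 - 2*a*p - q = 0, with c = b^2/r - k^2/gamma^2.
c = b**2 / r - k**2 / gamma**2
p_star = (2 * a + np.sqrt(4 * a**2 + 4 * c * q)) / (2 * c)
print(f"learned p = {p:.4f}, Riccati p* = {p_star:.4f}")

With these illustrative numbers the learned weight settles near the Riccati solution p* ≈ 0.418; the state resets stand in for the probing-noise excitation used in the SyncIRL literature, and a full vehicle implementation would replace the scalar model with the nonlinear dynamics including slip.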
