I. Introduction
The main components of an autonomous driving system [1], [2] are perception with multiple sensors, localization and mapping, planning, and control. The planning module uses the perceived state to turn route-level plans into motion-level commands. The control module tracks the planned trajectory by setting the speed, steering angle, and braking actions. Path planning and trajectory following are typically addressed with reinforcement learning (RL) techniques [3]. In RL, an autonomous agent learns to improve its performance by interacting with its environment and receiving rewards based on its actions [4]-[6]. The agent aims to maximize its cumulative reward over time by balancing exploration and exploitation [7].
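
As an illustration of the exploration-exploitation balance, the following minimal sketch runs an epsilon-greedy agent on a toy multi-armed bandit. It is not a driving task and not the method of this paper: the function name `run_epsilon_greedy`, the Gaussian reward model, and all parameter values are assumptions chosen only for illustration.

```python
import random


def run_epsilon_greedy(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy agent on a Gaussian multi-armed bandit.

    All names and values here are hypothetical; the bandit stands in for
    an environment that returns a noisy reward for each action.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each action was taken
    values = [0.0] * n_arms  # running estimate of each action's mean reward
    total_reward = 0.0       # cumulative reward the agent tries to maximize

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a uniformly random action.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: take the action with the best current estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment responds with a noisy reward (unit variance).
        reward = rng.gauss(arm_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen action.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return values, counts, total_reward


if __name__ == "__main__":
    values, counts, total = run_epsilon_greedy([0.0, 0.5, 2.0])
    print("estimated values:", [round(v, 2) for v in values])
    print("action counts:", counts)
    print("cumulative reward:", round(total, 1))
```

With `epsilon = 0` the agent never explores and can lock onto whichever action it tried first, whereas a larger `epsilon` spends more steps on actions already known to be worse. The cumulative reward depends on this trade-off.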