I. Introduction
Within robotics, Reinforcement Learning (RL) has demonstrated significant capabilities in a range of applications: enabling autonomous drone racing, giving quadrupedal robots robust locomotion over diverse terrains, and equipping robots with the skills to manipulate objects [1]–[3]. RL has also been shown to be capable of addressing the challenges faced by underwater robots, owing to its adaptability and learning capabilities. Its applications in this domain include autonomous navigation and exploration [4]–[6], position tracking [7], manipulation of underwater objects [8], [9], and adaptive control in dynamic underwater environments [10].