I. Introduction
Autonomous ship navigation is a key technology for ensuring maritime safety; it integrates intelligent perception, collision avoidance, decision-making, control, and communication. In recent years, with the development of artificial intelligence, intelligent learning methods have gradually been applied to robots, drones, and unmanned vehicles for tasks such as intelligent optimal scheduling, decision planning, trajectory following, and forecasting [1]–[4]. Reinforcement learning (RL) is an artificial intelligence-based optimization learning method. Unlike traditional optimization or planning algorithms, it does not rely on prior knowledge or supervision information; instead, it learns to optimize and plan through trial-and-error interaction with the environment, balancing exploration and exploitation. Because of this advantage, RL has received increasing research attention for autonomous ship decision-making, planning, and control [5]–[8].