I. Introduction
In the still largely unexplored underwater world, several areas require regular inspection, such as port infrastructures, which include berths, quay walls, adjacent pilings, ladders, and bollards, where it is crucial to detect damage and corrosion, as well as to clean and monitor the hulls and structures of large vessels. Typically, the inspection of such structures is performed by divers or remotely operated vehicles (ROVs), but diving operations are often dangerous, and ROVs require a human operator connected through a tether. Routine tasks therefore need to be automated, and autonomous underwater vehicles (AUVs) are increasingly being used for this purpose. To perform these tasks, such vehicles must navigate reliably and recognize previously visited places (loop closure), an essential capability for compensating accumulated pose drift.

However, the nature of the underwater environment poses a number of challenges for accurate vehicle positioning, and operating in close proximity to the inspected structure remains difficult due to perceptual limitations. These scenarios typically exhibit high levels of distortion and noise caused by factors such as wind, currents, suspended particles, or effects related to vehicle control, so the appearance of the environment can change over time; they are, in effect, dynamic spaces. In addition, port infrastructures lie in shallow waters, which makes them strongly influenced by light irradiation (depending on the orientation of the sun), while at the same time they are usually characterized by poor visibility (turbidity) caused by suspended particles or even garbage. These problems hinder perception of the scenario, yet it is crucial that vehicles are able to understand their environment and cope with all environmental conditions by selecting highly robust features.

For robust close-range operations, vision-based systems are therefore the most attractive solution for environmental sensing, as they can be deployed at ranges of less than 3 meters and offer rich information, high resolution, and simplicity [1]. Naturally, some factors, such as water turbidity or light attenuation, affect the quality of the captured images. A previous work analyzing the feasibility of a purely visual solution for underwater similarity detection showed that visual sensors are sensitive to severe conditions, namely high brightness and water turbidity [2]. In such situations, the vehicle may detect erroneous loop closures or fail to detect true ones, which is dangerous in a navigation context: pose corrections are either missing or wrong, leading to an incorrect estimate of the vehicle's position. Therefore, to take full advantage of these sensors, image enhancement techniques are increasingly used. Many algorithms have been studied extensively over the last decade to enhance underwater images, with histogram equalization (HE) standing out as a method for processing severely degraded images with poor or no contrast [3]. This paper therefore proposes a visual place recognition (VPR) method capable of dealing with these inherent perceptual problems in a harbour scenario.
In this way, the aim is to identify possible limitations of using only cameras in these scenarios, to understand whether visual data alone can overcome these challenges, and thus to answer the question: "Are there conditions under which it would be beneficial, or even necessary, to combine data from other sensors?". Given the lack of available data in this context, and to facilitate the variation of environmental parameters, a harbour scenario was recreated using the Stonefish simulator.
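Since HE is singled out above as a representative enhancement technique, a minimal sketch of how it can be applied to underwater frames is given below, using Python and OpenCV. This is an illustration under stated assumptions rather than the method proposed in this paper: the luminance-channel strategy, the CLAHE parameters, and the file names are hypothetical choices.

```python
# Minimal sketch: global histogram equalization (HE) and its
# contrast-limited adaptive variant (CLAHE) applied to an underwater
# frame. Parameter values and file names are illustrative assumptions.
import cv2

def equalize_underwater(bgr, use_clahe=True):
    """Stretch contrast on the luminance channel of a BGR frame."""
    # Work in LAB so that only lightness (L) is equalized; this avoids
    # the colour shifts that per-channel equalization in BGR can cause.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    if use_clahe:
        # Adaptive HE over local tiles, with histogram clipping to limit
        # noise amplification in turbid, low-contrast scenes.
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        l = clahe.apply(l)
    else:
        # Plain global HE over the whole frame.
        l = cv2.equalizeHist(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

if __name__ == "__main__":
    frame = cv2.imread("harbour_frame.png")  # hypothetical input image
    if frame is None:
        raise SystemExit("input image not found")
    cv2.imwrite("harbour_frame_he.png", equalize_underwater(frame))
```

Equalizing only the lightness channel is a common way to recover contrast without distorting colour, and the contrast-limited adaptive variant additionally curbs the noise amplification that global HE can introduce in turbid water.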