I. Introduction
Monocular depth estimation is a fundamental task in computer vision and computer graphics that infers the distance from the camera to objects in a 3D scene from a single 2D image [1], [2], [3]. It is a prominent research area with applications in autonomous driving [4], VR/AR [5], and 3D reconstruction [6]. The impact of illumination on depth estimation, although crucial, is often overlooked and presents unique challenges. Unlike other dense prediction tasks such as semantic segmentation, which have been specifically adapted to handle variations in lighting (e.g., night-time semantic segmentation [7], [8], [9], [10]), depth estimation faces additional complexity because it depends on an intricate 3D understanding of the scene. This issue is starkly evident when contrasting day and night environments. For instance, Fig. 1 shows how natural daylight clearly delineates the structural features and textures of a scene, whereas the same scene under night-time conditions poses substantial difficulties for depth estimation, primarily due to diminished visibility and artificial lighting that may cast deep shadows or create deceptive highlights. Such variations in illumination call for depth estimation methods robust enough to interpret depth accurately, with consistently low error, across a broad spectrum of lighting conditions.