Dusk Till Dawn: Self-supervised Nighttime Stereo Depth Estimation using Visual Foundation Models | IEEE Conference Publication | IEEE Xplore

Abstract:

Self-supervised depth estimation algorithms rely heavily on frame-warping relationships, exhibiting substantial performance degradation when applied in challenging circumstances, such as low-visibility and nighttime scenarios with varying illumination conditions. Addressing this challenge, we introduce an algorithm designed to achieve accurate self-supervised stereo depth estimation focusing on nighttime conditions. Specifically, we use pretrained visual foundation models to extract generalised features across challenging scenes and present an efficient method for matching and integrating these features from stereo frames. Moreover, to prevent pixels that violate the photometric consistency assumption from negatively affecting the depth predictions, we propose a novel masking approach designed to filter out such pixels. Lastly, addressing weaknesses in the evaluation of current depth estimation algorithms, we present novel evaluation metrics. Our experiments, conducted on challenging datasets including Oxford RobotCar and MultiSpectral Stereo, demonstrate the robust improvements realized by our approach.
Date of Conference: 13-17 May 2024
Date Added to IEEE Xplore: 08 August 2024
Conference Location: Yokohama, Japan

I. Introduction

Depth estimation is a pivotal subject within computer vision, with wide-ranging implications for applications such as autonomous driving, augmented and virtual reality, and robotics [1], [2]. Despite the accomplishments of supervised depth estimation algorithms, these methods typically depend on high-resolution ground-truth data, which is difficult to obtain because it requires costly 3D LiDAR sensors and substantial computational resources [3], [4].
