I. Introduction
Monocular depth estimation is a fundamental task in computer vision and computer graphics that infers the distance from the camera to objects in a 3D scene from a single 2D image [1], [2], [3]. It is a prominent research area with applications in autonomous driving [4], VR/AR [5], and 3D reconstruction [6]. The impact of illumination on depth estimation, although crucial, is often overlooked and presents unique challenges. Unlike other dense prediction tasks such as semantic segmentation, which have been specifically adapted to handle variations in lighting (e.g., night-time semantic segmentation [7], [8], [9], [10]), depth estimation faces additional complexity because it depends on an intricate 3D understanding of the scene. This issue is starkly evident when contrasting day and night environments. For instance, Fig. 1 shows how natural daylight clearly delineates the structural features and textures of a scene, whereas the same scene under night-time conditions poses substantial difficulties for depth estimation, primarily due to diminished visibility and artificial lighting that may cast deep shadows or create deceptive highlights. Such variations in illumination call for depth estimation methods robust enough to interpret depth accurately, with consistently low error, across a broad spectrum of lighting conditions.