1. Introduction
Recovering the 3D structure of a scene directly from images has been one of the most studied topics in computer vision. Depth estimation represents the first step for this purpose and a cornerstone for higher-level applications such as augmented reality, autonomous or assisted driving, robotics, and more. Although a variety of custom, active sensors exists for this task – LiDARs, Radars, Time-of-Flight (ToF), just to name a few – approaches estimating depth from one or multiple color images have gained higher and higher popularity with the advent of deep learning. Despite the steady improvements we witnessed in the last decade, estimating depth in certain conditions remains an open challenge. In particular, we identify two as the main sources of trouble.