1. Introduction
Recent progress in deep learning has enabled the prediction of fairly reliable depth maps from single RGB images [20], [31], [32], [47]. However, despite the specialized network architectures [11], [29], [31] and training strategies [32], [46] in single image depth estimation (SIDE) models, the estimated depth maps are still inadequate in the following aspects: (i) depth boundaries tend to be blurry and inaccurate; (ii) thin structures such as poles and wires are often missing; and (iii) depth values in narrow or isolated background regions (e.g., between body parts in humans) are often imprecise, as shown in the initial depth estimation in Figure 1. Addressing these issues within a single SIDE model can be very challenging due to limited model capacity and the lack of high-quality RGB-D datasets.
Figure 1: Our layered depth refinement result on an initial prediction by DPT [31]. Aided by a high-quality mask generated with an auto-masking tool [33], our method accurately refines mask boundaries and corrects depth values in isolated hole regions between body parts. Regions inside and outside the mask are refined and inpainted/outpainted separately with our layered approach.