1. Introduction
The ability to reconstruct 3D geometry from images or videos is crucial for many applications in robotics [16], [43], [52] and augmented/virtual reality [29], [35]. Multi-view stereo (MVS) [13], [15], [39], [47], [54], [55] is a widely used technique for this task. A typical MVS pipeline consists of three steps: multi-view depth estimation, depth filtering, and depth fusion [5], [13].