1. Introduction
Acquiring the 3D geometry of real-world objects is a long-standing topic in computer vision and graphics research, with many practical applications ranging from scanning small objects to modeling complete cities. Consequently, there is an abundance of 3D reconstruction techniques, which can be roughly classified into active techniques [3] that rely on illuminating the scene (e.g., with lasers or structured light), and passive techniques that analyze a multitude of images of the scene and are thus referred to as multi-view stereo or photogrammetry methods [33]. The latter, image-based methods have a number of benefits compared to active techniques. One main advantage is that the capture process is simple and cheap, requiring only standard imaging hardware such as consumer digital cameras. Additionally, image-based methods provide color information of the scene and offer high-resolution scanning thanks to advances in image sensors.

A popular approach to image-based 3D reconstruction is to first compute camera poses and then estimate per-view depth maps by finding corresponding pixels between views and triangulating their depth [14]. All pixels are then projected into 3D space to obtain a point cloud, from which a surface mesh is extracted using point cloud meshing techniques [2].
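The back-projection step of this pipeline can be sketched as follows: given a per-view depth map and the camera intrinsics, each pixel is lifted into a 3D point in camera space. This is a minimal illustrative sketch (the function name, the use of a pinhole intrinsic matrix `K`, and the convention that zero depth marks invalid pixels are assumptions for the example, not part of any specific method described above):

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project a per-view depth map into camera-space 3D points.

    depth : (H, W) array of depth values; 0 marks invalid pixels (assumption).
    K     : (3, 3) pinhole intrinsic matrix.
    Returns an (N, 3) array of 3D points, one per valid pixel.
    """
    H, W = depth.shape
    # Build the pixel grid in homogeneous coordinates [u, v, 1].
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    d = depth.reshape(-1)
    valid = d > 0
    # X = d * K^{-1} [u, v, 1]^T : scale each viewing ray by its depth.
    rays = pix[valid] @ np.linalg.inv(K).T
    return rays * d[valid, None]

# Toy example: identity intrinsics and a constant depth of 2.
K = np.eye(3)
pts = depth_to_point_cloud(np.full((2, 2), 2.0), K)
```

In a full pipeline the camera-space points would additionally be transformed by each view's estimated pose into a common world frame before merging them into one point cloud.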