1. Introduction
Image-based depth estimation, including stereo and multi-view dense reconstruction, has been widely studied in the computer vision community for decades. In conventional two-view stereo matching, deep learning methods [12, 4] have recently achieved drastic performance improvements. Meanwhile, there is a strong need for omnidirectional or wide-FOV depth sensing in autonomous driving and robot navigation to detect obstacles and surrounding structures. Human drivers watch all directions, not just the front, and holonomic robots need to sense all directions to move freely. However, conventional stereo rigs and algorithms cannot capture or estimate ultra-wide-FOV depth maps. Merging depth maps from multiple conventional stereo pairs is one possibility, but useful global context cannot be propagated between the pairs, and discontinuities may appear at the seams.