Anytime Stereo Image Depth Estimation on Mobile Devices


Abstract:

Many applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms force a choice between generating accurate mappings at a slow pace and quickly generating inaccurate ones. In addition, these methods typically require far too many parameters to be usable on power- or memory-constrained devices. Motivated by these shortcomings, we propose a novel approach for disparity prediction in the anytime setting. In contrast to prior work, our end-to-end learned approach can trade off computation and accuracy at inference time. Depth estimation is performed in stages, during which the model can be queried at any time to output its current best estimate. Our final model can process 1242×375 resolution images within a range of 10-35 FPS on an NVIDIA Jetson TX2 module with only marginal increases in error, while using two orders of magnitude fewer parameters than the most competitive baseline. The source code is available at https://github.com/mileyan/AnyNet.
Date of Conference: 20-24 May 2019
Date Added to IEEE Xplore: 12 August 2019
Conference Location: Montreal, QC, Canada

I. Introduction

Depth estimation from stereo camera images is an important task for 3D scene reconstruction and understanding, with numerous applications ranging from robotics [30], [51], [39], [42] to augmented reality [53], [1], [35]. High-resolution stereo cameras provide a reliable solution for 3D perception: unlike time-of-flight cameras, they work well both indoors and outdoors, and compared to LiDAR they are substantially more affordable and energy-efficient [29]. Given a rectified stereo image pair, the focal length, and the stereo baseline distance between the two cameras, depth estimation can be cast as a stereo matching problem, the goal of which is to find the disparity between corresponding pixels in the two images. Although disparity estimation from stereo images is a long-standing problem in computer vision [28], in recent years the adoption of deep convolutional neural networks (CNNs) [52], [32], [20], [25], [36] has led to significant progress in the field. Deep networks can solve the matching problem via supervised learning in an end-to-end fashion, and they have the ability to incorporate local context as well as prior knowledge into the estimation process.
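
To make the link between disparity and depth concrete, the following is a minimal sketch of the standard pinhole relation for a rectified pair, depth = focal length × baseline / disparity. It is an illustration and not code from the paper or its repository: the function name `disparity_to_depth`, the `min_disparity` threshold, and the calibration values are all hypothetical.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (in pixels) to a depth map (in metres).

    Assumes a rectified stereo pair, so that depth = f * B / d, where f is the
    focal length in pixels, B the baseline in metres, and d the disparity.
    Pixels with (near-)zero disparity are mapped to infinity.
    """
    depth = []
    for row in disparity_px:
        depth.append([
            focal_length_px * baseline_m / d if d > min_disparity else float("inf")
            for d in row
        ])
    return depth


# Hypothetical calibration values, chosen only for illustration.
focal_length_px = 720.0   # focal length in pixels
baseline_m = 0.5          # distance between the two cameras in metres

# A tiny 2x2 disparity map in pixels; 0.0 marks a point at infinity.
disparity = [[36.0, 18.0],
             [9.0, 0.0]]

depth = disparity_to_depth(disparity, focal_length_px, baseline_m)
for row in depth:
    print(row)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated depth, so a fixed disparity error in pixels translates into a larger depth error for distant points.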
