
A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation



Abstract:

Recent work has shown that optical flow estimation can be formulated as a supervised learning task and can be successfully solved with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluation of scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.
Date of Conference: 27-30 June 2016
Date Added to IEEE Xplore: 12 December 2016
Electronic ISSN: 1063-6919
Conference Location: Las Vegas, NV, USA

1. Introduction

Estimating scene flow means providing the depth and 3D motion vectors of all visible points in a stereo video. It is the “royal league” task when it comes to reconstruction and motion estimation and provides an important basis for numerous higher-level challenges such as advanced driver assistance and autonomous systems. Research over the last decades has focused on its subtasks, namely disparity estimation and optical flow estimation, with considerable success. The full scene flow problem has not been explored to the same extent. While partial scene flow can simply be assembled from the subtask results, joint estimation of all components is expected to be advantageous with regard to both efficiency and accuracy. One reason why scene flow is less explored than its subtasks seems to be a shortage of fully annotated ground truth data.

Our datasets provide over 35,000 stereo frames with dense ground truth for optical flow, disparity, and disparity change, as well as other data such as object segmentation.
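Given the per-pixel quantities the datasets annotate (disparity, disparity change, and optical flow), a 3D scene-flow vector for each pixel can be assembled by back-projecting the pixel at both time steps and taking the difference. The sketch below illustrates this assembly under the standard pinhole/rectified-stereo model; the function name and the camera parameters (focal length `f`, baseline `B`, principal point `(cx, cy)`) are hypothetical and not taken from the paper.

```python
import numpy as np

def scene_flow_from_subtasks(disp_t, disp_change, flow, f, B, cx, cy):
    """Assemble per-pixel 3D scene flow from disparity, disparity change,
    and optical flow (a sketch, assuming rectified stereo and a pinhole
    camera with focal length f, baseline B, principal point (cx, cy))."""
    H, W = disp_t.shape
    xs, ys = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))

    # Depth from disparity at times t and t+1: Z = f * B / d
    Z0 = f * B / disp_t
    Z1 = f * B / (disp_t + disp_change)

    # Back-project each pixel to 3D at time t ...
    X0 = (xs - cx) * Z0 / f
    Y0 = (ys - cy) * Z0 / f

    # ... and its flow-displaced position at time t+1
    X1 = (xs + flow[..., 0] - cx) * Z1 / f
    Y1 = (ys + flow[..., 1] - cy) * Z1 / f

    # Scene flow = 3D motion vector of each visible point, shape (H, W, 3)
    return np.stack([X1 - X0, Y1 - Y0, Z1 - Z0], axis=-1)
```

A static scene (zero flow, zero disparity change) yields zero scene flow everywhere, which makes the decomposition easy to sanity-check; it also makes concrete why joint estimation can help, since errors in any one subtask propagate directly into the 3D motion vectors.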

References
1.
S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black and R. Szeliski, "A database and evaluation methodology for optical flow", Technical Report MSR-TR-2009-179, December 2009.
2.
D. J. Butler, J. Wulff, G. B. Stanley and M. J. Black, "A naturalistic open source movie for optical flow evaluation", ECCV, pp. 611-625, Oct. 2012.
3.
J. Cech, J. Sanchez-Riera and R. P. Horaud, "Scene flow estimation by growing correspondence seeds", CVPR, 2011.
4.
A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazirbaş, V. Golkov, et al., "FlowNet: Learning optical flow with convolutional networks", ICCV, 2015.
5.
A. Dosovitskiy, J. T. Springenberg and T. Brox, "Learning to generate chairs with convolutional neural networks", CVPR, 2015.
6.
D. Eigen, C. Puhrsch and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network", NIPS, 2014.
7.
N. Einecke and J. Eggert, "A multi-block-matching approach for stereo", Intelligent Vehicles Symposium, pp. 585-592, 2015.
8.
A. Geiger, P. Lenz, C. Stiller and R. Urtasun, "Vision meets robotics: The KITTI dataset", International Journal of Robotics Research (IJRR), 2013.
9.
J. Hays and A. A. Efros, "im2gps: estimating geographic information from a single image", CVPR, 2008.
10.
H. Hirschmüller, "Stereo processing by semiglobal matching and mutual information", PAMI, vol. 30, no. 2, pp. 328-341, 2008.
11.
F. Huguet and F. Devernay, "A variational method for scene flow estimation from stereo sequences", ICCV, 2007.
12.
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, et al., "Caffe: Convolutional architecture for fast feature embedding", 2014.
13.
D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization", ICLR, 2015.
14.
A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", NIPS, pp. 1106-1114, 2012.
15.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, et al., "Backpropagation applied to handwritten zip code recognition", Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.
16.
M. Menze and A. Geiger, "Object scene flow for autonomous vehicles", CVPR, 2015.
17.
J. Quiroga, F. Devernay and J. Crowley, "Scene flow by tracking in intensity and depth data", 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 50-57, 2012.
18.
M. Savva, A. X. Chang and P. Hanrahan, "Semantically-Enriched 3D Models for Common-sense Knowledge", CVPR 2015 Workshop on Functionality Physics Intentionality and Causality, 2015.
19.
D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, et al., "High-resolution stereo datasets with subpixel-accurate ground truth" in Pattern Recognition, Springer, pp. 31-42, 2014.
20.
D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms", International Journal of Computer Vision, vol. 47, no. 1–3, pp. 7-42, 2002.
21.
N. Silberman, D. Hoiem, P. Kohli and R. Fergus, "Indoor segmentation and support inference from rgbd images", ECCV, 2012.
22.
S. Vedula, S. Baker, P. Rander, R. Collins and T. Kanade, "Three-dimensional scene flow", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 475-480, 2005.
23.
S. Vedula, S. Baker, P. Rander, R. T. Collins and T. Kanade, "Three-dimensional scene flow", ICCV, pp. 722-729, 1999.
24.
C. Vogel, K. Schindler and S. Roth, "3d scene flow estimation with a piecewise rigid scene model", International Journal of Computer Vision, vol. 115, no. 1, pp. 1-28, 2015.
25.
A. Wedel, T. Brox, T. Vaudrey, C. Rabe, U. Franke and D. Cremers, "Stereoscopic scene flow computation for 3d motion understanding", International Journal of Computer Vision, vol. 95, no. 1, pp. 29-51, 2010.
26.
J. Xiao, A. Owens and A. Torralba, "Sun3d: A database of big spaces reconstructed using sfm and object labels", 2013 IEEE International Conference on Computer Vision (ICCV), pp. 1625-1632, Dec 2013.
27.
J. Žbontar and Y. LeCun, "Stereo matching by training a convolutional neural network to compare image patches", 2015.
28.
K. Zhang, J. Lu and G. Lafruit, "Cross-based local stereo matching using orthogonal integral images", IEEE Trans. Circuits Syst. Video Techn., vol. 19, no. 7, pp. 1073-1079, 2009.
