
Unsupervised Learning of Depth, Optical Flow and Pose With Occlusion From 3D Geometry



Abstract:

In autonomous driving, monocular sequences contain a wealth of information. Monocular depth estimation, camera ego-motion estimation, and optical flow estimation across consecutive frames have recently attracted considerable attention. By analyzing these tasks, pixels in the middle frame are modeled as three parts: the rigid region, the non-rigid region, and the occluded region. In joint unsupervised training of depth and pose, the occluded region can be segmented explicitly. This occlusion information is then used in the unsupervised learning of depth, pose, and optical flow, because images reconstructed from depth-pose or optical flow are invalid in occluded regions. A less-than-mean mask is designed to further exclude mismatched pixels disturbed by motion or illumination change in the training of the depth and pose networks; the same mechanism is used to exclude trivially mismatched pixels in the training of the optical flow network. Maximum normalization is proposed for the depth smoothness term to restrain depth degradation in textureless regions. In the occluded region, depth and camera motion provide more reliable motion estimation and can therefore be used to guide the unsupervised learning of optical flow. Experiments on the KITTI dataset demonstrate that the three-region model, i.e., full and explicit segmentation into the occluded, rigid, and non-rigid regions with corresponding unsupervised losses, significantly improves performance on all three tasks. The source code is available at: https://github.com/guangmingw/DOPlearning.
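To make the masking and normalization ideas above concrete, the PyTorch-style snippet below sketches a less-than-mean mask and a maximum-normalized, edge-aware depth smoothness term. It is a minimal illustration under our own assumptions: the tensor layout (B, 1, H, W), the function names, and the exact form of the normalization are ours, not the authors'; the repository linked above is the authoritative implementation.

```python
# Minimal sketch (assumed shapes and names, not the released code).
import torch

def less_than_mean_mask(photometric_error):
    """Keep only pixels whose reconstruction error is below the per-image
    mean, discarding likely mismatches caused by motion or illumination.
    photometric_error: (B, 1, H, W) per-pixel reconstruction error."""
    mean_error = photometric_error.mean(dim=(2, 3), keepdim=True)
    return (photometric_error < mean_error).float()

def max_normalized_smoothness(depth, image):
    """Edge-aware smoothness on depth divided by its per-image maximum
    (one plausible reading of "maximum normalization"), intended to
    restrain depth degradation in textureless regions.
    depth: (B, 1, H, W); image: (B, 3, H, W)."""
    norm_depth = depth / (depth.amax(dim=(2, 3), keepdim=True) + 1e-7)
    # Depth gradients.
    dD_dx = torch.abs(norm_depth[:, :, :, :-1] - norm_depth[:, :, :, 1:])
    dD_dy = torch.abs(norm_depth[:, :, :-1, :] - norm_depth[:, :, 1:, :])
    # Image gradients weight the penalty down at edges.
    dI_dx = torch.mean(torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:]), 1, keepdim=True)
    dI_dy = torch.mean(torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]), 1, keepdim=True)
    return (dD_dx * torch.exp(-dI_dx)).mean() + (dD_dy * torch.exp(-dI_dy)).mean()
```

In a full pipeline, the less-than-mean mask would typically be multiplied into the photometric reconstruction loss together with the occlusion mask, so that only valid, well-matched pixels contribute to the gradients of the depth and pose networks.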
Published in: IEEE Transactions on Intelligent Transportation Systems (Volume: 23, Issue: 1, January 2022)
Pages: 308-320
Date of Publication: 29 July 2020


I. Introduction

In autonomous driving, obtaining the depth of the scene and the localization of the vehicle is key to constructing a map. Lasers provide more accurate measurements [1], [2], but they also increase cost and require calibration [3]. Using only inexpensive vision sensors can still yield dense information, which is also closer to the way people perceive their surroundings when driving. However, traditional visual SLAM methods rely heavily on hand-crafted features [4]. They are also not robust to changes in the environment: features are easily lost in dynamic scenes, and tracking often fails outdoors.

References

[1] K. Park, S. Kim, and K. Sohn, "High-precision depth estimation using uncalibrated LiDAR and stereo fusion," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 1, pp. 321-335, Jan. 2020.
[2] H. Yin, Y. Wang, X. Ding, L. Tang, S. Huang, and R. Xiong, "3D LiDAR-based global localization using Siamese neural network," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 4, pp. 1380-1392, Apr. 2020.
[3] Y. Zhuang, F. Yan, and H. Hu, "Automatic extrinsic self-calibration for fusing data from monocular vision and 3-D laser scanner," IEEE Trans. Instrum. Meas., vol. 63, no. 7, pp. 1874-1876, Jul. 2014.
[4] R. Mur-Artal and J. D. Tardos, "ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras," IEEE Trans. Robot., vol. 33, no. 5, pp. 1255-1262, Oct. 2017.
[5] K. Tateno, F. Tombari, I. Laina, and N. Navab, "CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 6565-6574, Jul. 2017.
[6] J. Engel, T. Schöps, and D. Cremers, "LSD-SLAM: Large-scale direct monocular SLAM," Proc. Eur. Conf. Comput. Vis., pp. 834-849, Sep. 2014.
[7] D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," Proc. Adv. Neural Inf. Process. Syst., pp. 2366-2374, 2014.
[8] F. Liu, C. Shen, G. Lin, and I. Reid, "Learning depth from single monocular images using deep convolutional neural fields," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 10, pp. 2024-2039, Oct. 2016.
[9] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," Int. J. Robot. Res., vol. 32, no. 11, pp. 1231-1237, Sep. 2013.
[10] R. Garg, G. Carneiro, and I. Reid, "Unsupervised CNN for single view depth estimation: Geometry to the rescue," Proc. Eur. Conf. Comput. Vis., pp. 740-756, 2016.
[11] C. Godard, O. M. Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 6602-6611, Jul. 2017.
[12] H. Zhan, R. Garg, C. S. Weerasekera, K. Li, H. Agarwal, and I. M. Reid, "Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 340-349, Jun. 2018.
[13] R. Li, S. Wang, Z. Long, and D. Gu, "UnDeepVO: Monocular visual odometry through unsupervised deep learning," Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 7286-7291, May 2018.
[14] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, "Unsupervised learning of depth and ego-motion from video," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 6612-6619, Jul. 2017.
[15] R. Mahjourian, M. Wicke, and A. Angelova, "Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 5667-5675, Jun. 2018.
[16] C. Wang, J. M. Buenaposada, R. Zhu, and S. Lucey, "Learning depth from monocular videos using direct methods," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 2022-2030, Jun. 2018.
[17] Y. Almalioglu, M. R. U. Saputra, P. P. B. D. Gusmao, A. Markham, and N. Trigoni, "GANVO: Unsupervised deep monocular visual odometry and depth estimation with generative adversarial networks," Proc. Int. Conf. Robot. Autom. (ICRA), pp. 5474-5480, May 2019.
[18] T. Shen et al., "Beyond photometric loss for self-supervised ego-motion estimation," Proc. Int. Conf. Robot. Autom. (ICRA), pp. 6359-6365, May 2019.
[19] G. Wang, H. Wang, Y. Liu, and W. Chen, "Unsupervised learning of monocular depth and ego-motion using multiple masks," Proc. Int. Conf. Robot. Autom. (ICRA), pp. 4724-4730, May 2019.
[20] Z. Yang, P. Wang, Y. Wang, W. Xu, and R. Nevatia, "LEGO: Learning edge with geometry all at once by watching videos," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 225-234, Jun. 2018.
[21] Z. Yin and J. Shi, "GeoNet: Unsupervised learning of dense depth, optical flow and camera pose," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1983-1992, Jun. 2018.
[22] Y. Zou, Z. Luo, and J. B. Huang, "DF-Net: Unsupervised joint learning of depth and flow using cross-task consistency," Proc. Eur. Conf. Comput. Vis., pp. 36-53, Sep. 2018.
[23] A. Ranjan et al., "Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 12240-12249, Jun. 2019.
[24] M. Menze, C. Heipke, and A. Geiger, "Object scene flow," ISPRS J. Photogramm. Remote Sens., vol. 140, pp. 60-76, Jun. 2018.
[25] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, "FlowNet 2.0: Evolution of optical flow estimation with deep networks," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 2462-2470, Jul. 2017.
[26] A. Ranjan and M. J. Black, "Optical flow estimation using a spatial pyramid network," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 4161-4170, Jul. 2017.
[27] S. Alletto, D. Abati, S. Calderara, R. Cucchiara, and L. Rigazio, "Self-supervised optical flow estimation by projective bootstrap," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 9, pp. 3294-3302, Sep. 2019.
[28] S. Meister, J. Hur, and S. Roth, "UnFlow: Unsupervised learning of optical flow with a bidirectional census loss," Proc. AAAI Conf. Artif. Intell., pp. 7251-7259, 2018.
[29] Y. Wang, Y. Yang, Z. Yang, L. Zhao, P. Wang, and W. Xu, "Occlusion aware unsupervised learning of optical flow," Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 4884-4893, Jun. 2018.
[30] J. Janai, F. Guney, A. Ranjan, M. Black, and A. Geiger, "Unsupervised learning of multi-frame optical flow with occlusions," Proc. Eur. Conf. Comput. Vis., pp. 690-706, Sep. 2018.