
Unsupervised Learning of Depth, Optical Flow and Pose With Occlusion From 3D Geometry



Abstract:

In autonomous driving, monocular sequences contain a wealth of information. Monocular depth estimation, camera ego-motion estimation, and optical flow estimation across consecutive frames have recently attracted considerable attention. By analyzing these tasks, pixels in the middle frame are modeled as three parts: the rigid region, the non-rigid region, and the occluded region. During joint unsupervised training of depth and pose, the occluded region can be segmented explicitly. This occlusion information is used in the unsupervised learning of depth, pose, and optical flow, since images reconstructed via depth-pose or optical flow are invalid in occluded regions. A less-than-mean mask is designed to further exclude mismatched pixels corrupted by motion or illumination changes from the training of the depth and pose networks; the same mechanism is also used to exclude trivial mismatched pixels from the training of the optical flow network. Maximum normalization is proposed for the depth smoothness term to restrain depth degradation in textureless regions. In the occluded region, depth and camera motion provide more reliable motion estimation and can therefore be used to guide the unsupervised learning of optical flow. Experiments on the KITTI dataset demonstrate that the three-region model, with full and explicit segmentation of the occluded, rigid, and non-rigid regions and corresponding unsupervised losses, significantly improves performance on all three tasks. The source code is available at: https://github.com/guangmingw/DOPlearning.
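To make two of the abstract's ideas concrete, below is a minimal PyTorch-style sketch of how a less-than-mean mask and a maximum-normalized smoothness term might be implemented. The function names, tensor shapes, the epsilon, and the edge-aware weighting are illustrative assumptions, not the authors' code; the actual implementation is in the repository linked above.

import torch

def less_than_mean_mask(photometric_error):
    # Assumed reading of the "less-than-mean" mask: keep only pixels whose
    # reconstruction error is below the per-image mean. Pixels above the
    # mean are treated as mismatches (e.g. moving objects or illumination
    # changes) and excluded from the depth/pose loss.
    # photometric_error: (B, 1, H, W) per-pixel reconstruction error.
    mean_err = photometric_error.mean(dim=(2, 3), keepdim=True)
    return (photometric_error < mean_err).float()

def max_normalized_smoothness(depth, image):
    # Edge-aware smoothness on depth normalized by its per-image maximum.
    # Dividing by the maximum bounds the normalized depth in (0, 1], which,
    # per the abstract, restrains depth degradation in textureless regions.
    # depth: (B, 1, H, W) predicted depth; image: (B, 3, H, W) RGB frame.
    d = depth / (depth.amax(dim=(2, 3), keepdim=True) + 1e-7)
    dx = (d[:, :, :, 1:] - d[:, :, :, :-1]).abs()
    dy = (d[:, :, 1:, :] - d[:, :, :-1, :]).abs()
    # Down-weight the penalty across strong image gradients (edges), a
    # standard edge-aware formulation assumed here for illustration.
    ix = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    iy = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dx * torch.exp(-ix)).mean() + (dy * torch.exp(-iy)).mean()

In training, such a mask would multiply the per-pixel photometric loss before averaging, so that pixels flagged as mismatched contribute nothing to the depth and pose gradients.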
Published in: IEEE Transactions on Intelligent Transportation Systems ( Volume: 23, Issue: 1, January 2022)
Page(s): 308 - 320
Date of Publication: 29 July 2020



I. Introduction

In autonomous driving, obtaining scene depth and vehicle localization is a key issue in constructing a map. Lasers provide more accurate information [1], [2], but they also increase cost and require calibration [3]. Inexpensive vision sensors alone can provide dense information, which is also closer to how humans perceive the environment while driving. However, traditional visual SLAM methods rely heavily on hand-crafted features [4]. They are also not robust to environmental changes: they easily lose features in dynamic environments and tend to fail outdoors.

