Abstract:
We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input and produces occlusion-reasoned layouts in perspective space as well as in a parametric bird's-eye-view (BEV) space. In contrast to prior works that require dense supervision, such as semantic labels in perspective view, our method requires only human annotations for parametric attributes, which are cheaper and less ambiguous to obtain. To solve this challenging task, our design comprises modules that incorporate inductive biases to learn occlusion reasoning, geometric transformation, and semantic abstraction, where each module may be supervised by appropriately transforming the parametric annotations. We demonstrate how our design choices and proposed deep supervision help achieve meaningful representations and accurate predictions. We validate our approach on two public datasets, KITTI and NuScenes, and achieve state-of-the-art results with considerably less human supervision.
Date of Conference: 18-24 June 2022
Date Added to IEEE Xplore: 27 September 2022
IEEE Keywords:
- Road Layout
- Single Image
- RGB Images
- Visual Perspective
- Meaningful Representation
- Geometric Transformation
- Inductive Bias
- Street Scenes
- Deep Supervision
- Visible Light
- Performance Of Method
- F1 Score
- Parametrized
- Hallucinations
- Multilayer Perceptron
- Semantic Segmentation
- Top View
- Depth Estimation
- Output Of Module
- Continuous Attributes
- KITTI Dataset
- Occluded Regions
- Intermediate Representation
- Scene Understanding
- Material For More Details
- Camera Intrinsics
- Semantic Annotation
- Transformation Module
- Prediction Module
- Performance Gap