1. Introduction
Understanding road layout from images is essential for real-world applications such as autonomous driving and path planning [5], [8], [13], [31], where, in addition to the usual perspective-space outputs, top-view representations of geometry and semantics have become popular. Non-parametric representations such as pixel-level semantics [31] generally require supervision in the top view that is labor-intensive and potentially ambiguous, for example in occluded regions. Parametric representations of top-view layouts, in contrast, are desirable for their interpretability, which benefits higher-level reasoning and decision-making in downstream applications.