I. Introduction
Semantic segmentation, a core task in computer vision, is attracting growing attention in the domain of fisheye images, particularly in the context of autonomous driving. Fisheye lenses offer a wide field of view (FOV), typically ranging from 100° to 180°, allowing a single camera to capture far more of the surrounding environment than a rectilinear lens. This characteristic makes fisheye images widely used in autonomous driving [1], surveillance [2], and augmented reality (AR) [3] applications. However, this advantage comes at the cost of significant optical distortion, caused by the highly non-linear mapping between real-world scenes and the image plane of a fisheye lens [4]. Correcting this radial distortion introduces its own drawbacks, such as a reduced field of view and resampling artifacts for distorted features in the periphery [5]. Consequently, previous approaches that unwrap fisheye images into rectilinear images before semantic segmentation have not yielded satisfactory performance [6], [7].
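To make the non-linear mapping concrete, the sketch below compares the equidistant fisheye projection model (r = f·θ, one common idealization of a fisheye lens) with the rectilinear pinhole model (r = f·tan θ). The function names and unit focal length are illustrative, not taken from any particular paper or library:

```python
import math

def pinhole_radius(theta, f=1.0):
    """Radial image distance of a ray at angle theta (radians) from the
    optical axis under the rectilinear (pinhole) model: r = f * tan(theta)."""
    return f * math.tan(theta)

def equidistant_radius(theta, f=1.0):
    """Radial image distance under the (idealized) equidistant fisheye
    model: r = f * theta."""
    return f * theta

# Near the optical axis the two models nearly agree, but toward the
# periphery the rectilinear radius diverges while the fisheye radius
# grows only linearly. The gap between the two curves is the radial
# distortion that rectification must absorb by resampling.
for deg in (5, 30, 60, 85):
    theta = math.radians(deg)
    print(f"theta={deg:2d} deg  pinhole r={pinhole_radius(theta):7.3f}  "
          f"fisheye r={equidistant_radius(theta):7.3f}")
```

Note that tan θ diverges as θ approaches 90°, so a rectilinear image plane cannot represent a full 180° FOV at all; the fisheye model keeps the radius finite. This is why unwrapping a fisheye image into a rectilinear one necessarily crops the field of view and heavily resamples peripheral content.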