I. INTRODUCTION
Semantic segmentation is a fundamental task in computer vision that assigns a class label to every pixel in an image. Deep convolutional neural networks, beginning with Fully Convolutional Networks (FCN) [1], have substantially improved segmentation accuracy. Meanwhile, the demand for real-time semantic segmentation has grown rapidly in applications such as autonomous driving [2]–[4], video surveillance, and robot sensing [5]–[7], driving the need for efficient segmentation networks, particularly on mobile platforms.
While models such as U-Net [8] achieve excellent accuracy, their inference speed limits real-time use. To balance speed and accuracy, methods such as MLFNet [9] and BiSeNet [10] combine lightweight backbones with feature fusion and aggregation modules. However, reducing the input resolution to gain speed sacrifices fine spatial detail. BiSeNet addresses this by fusing low-level details with high-level semantics, while HRNet [11] and DDRNet [12] maintain parallel branches at multiple resolutions to improve accuracy.
To capture semantic information at multiple scales, various context modules, including Atrous Spatial Pyramid Pooling (ASPP) [13], the Pyramid Pooling Module (PPM) [14], the Depthwise-conv Spatial Pyramid (DSP) [9], and the Deep Aggregation Pyramid Pooling Module (DAPPM) [12], aggregate features over square receptive fields of different sizes (n×n). However, relying solely on n×n receptive fields has been shown to be insufficient [15]. In this work, we propose integrating receptive fields of size n×1 and 1×n to strengthen the model's ability to capture semantic information across scales.
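To make the proposed idea concrete, the following is a minimal PyTorch sketch of a context branch that augments a conventional square n×n convolution with factorized n×1 and 1×n convolutions; the module name, fusion scheme, and hyperparameters are illustrative assumptions, not the exact design described later in this paper.

```python
import torch
import torch.nn as nn

class AsymmetricContextBranch(nn.Module):
    """Illustrative sketch: combines a square n x n receptive field with
    factorized n x 1 and 1 x n (strip-shaped) receptive fields.
    Names and structure are hypothetical, not the paper's exact module."""

    def __init__(self, channels: int, n: int = 3):
        super().__init__()
        pad = n // 2
        # Conventional square receptive field (n x n), as in ASPP/PPM-style branches.
        self.square = nn.Conv2d(channels, channels, kernel_size=n, padding=pad)
        # Strip receptive fields: n x 1 followed by 1 x n.
        self.vertical = nn.Conv2d(channels, channels, kernel_size=(n, 1), padding=(pad, 0))
        self.horizontal = nn.Conv2d(channels, channels, kernel_size=(1, n), padding=(0, pad))
        # 1 x 1 projection to fuse the two responses.
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the square and strip responses, then fuse; spatial size is preserved.
        y = self.square(x) + self.horizontal(self.vertical(x))
        return self.fuse(y)

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)
    print(AsymmetricContextBranch(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Note that the strip convolutions respond to elongated structures (e.g., poles, lane markings) that a square kernel of the same budget covers poorly, which is the intuition behind pairing n×1 and 1×n kernels with the usual n×n branches.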