DAPSPNet: Deep Aggregation Pyramid Strip Pooling Network for Real-time and Accurate Segmentation


Abstract:

This paper introduces DAPSPNet, an efficient convolutional neural network (CNN) architecture for real-time semantic segmentation. DAPSPNet is a novel dual-resolution network whose multi-scale feature extraction module is augmented with strip pooling to extract strip-shaped features more effectively. The convolution kernels have lengths of 5, 9, and 17, with a width of 1. We chose strip pooling as the supplement for two reasons. First, strip pooling is a lightweight technique that reduces the number of parameters and computations involved in pooling operations. Second, the contextual information also contains strip-shaped features, which matches the needs of real-time semantic segmentation of road scenes. Extensive experiments on the Cityscapes dataset demonstrate that DAPSPNet is competitive with several state-of-the-art methods in most scenarios.
Date of Conference: 07-09 November 2023
Date Added to IEEE Xplore: 26 December 2023
Conference Location: Orlando, FL, USA

I. INTRODUCTION

Semantic segmentation is a fundamental task in computer vision that assigns a label to every pixel of an image. Deep convolutional neural networks, starting with Fully Convolutional Networks (FCN) [1], have significantly improved semantic segmentation performance. The demand for real-time semantic segmentation has grown rapidly in applications such as autonomous driving [2]–[4], video surveillance, and robot sensing [5]–[7], driving the need for efficient segmentation networks, especially in the mobile domain. While models like U-Net [8] have shown excellent accuracy, their real-time inference capability is limited. To address this, lightweight backbones and feature fusion/aggregation modules have been explored in methods like MLFNet [9] and BiSeNet [10] to balance speed and accuracy. However, reducing the input resolution can result in loss of fine details. BiSeNet tackles this by combining low-level details and high-level semantics, while HRNet [11] and DDRNet [12] propose parallel branches for improved accuracy. Various approaches, including Atrous Spatial Pyramid Pooling (ASPP) [13], Pyramid Pooling Module (PPM) [14], Depthwiseconv Spatial Pyramid (DSP) [9], and Deep Aggregation Pyramid Pooling Module (DAPPM) [12], capture semantic information at different scales using square receptive fields of different sizes (n×n). However, relying solely on n×n receptive fields is deemed insufficient [15]. In this work, we propose integrating receptive fields of size n×1 and 1×n to enhance the model’s ability to capture semantic information across scales.
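The difference between n×n and n×1 / 1×n receptive fields can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the DAPSPNet module itself: the function name `avg_pool`, the use of average pooling with stride 1, the clipping of windows at the image border, and the 9×9 test map are all assumptions made for illustration. It pools a feature map containing a thin vertical structure (such as a pole in a road scene) with a vertical strip, a horizontal strip, and a square window.

```python
def avg_pool(feature, kh, kw):
    """Average-pool a 2D map with a kh x kw window (odd sizes assumed).

    Stride is 1 and windows are clipped at the border, so the output
    has the same shape as the input. This is an illustrative choice,
    not the configuration used in the paper.
    """
    rows, cols = len(feature), len(feature[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - kh // 2), min(rows, r + kh // 2 + 1)
            c0, c1 = max(0, c - kw // 2), min(cols, c + kw // 2 + 1)
            window = [feature[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


# Hypothetical 9x9 feature map with a one-pixel-wide vertical line in column 4.
feature = [[1.0 if c == 4 else 0.0 for c in range(9)] for r in range(9)]

vertical = avg_pool(feature, 5, 1)    # n x 1 strip, aligned with the line
horizontal = avg_pool(feature, 1, 5)  # 1 x n strip, crossing the line
square = avg_pool(feature, 5, 5)      # conventional n x n window

center = (4, 4)
print("vertical strip  :", vertical[center[0]][center[1]])
print("horizontal strip:", horizontal[center[0]][center[1]])
print("square window   :", square[center[0]][center[1]])
```

In this toy setting the strip aligned with the structure keeps its full response, while the square window averages it with the surrounding background. The strip also reads n values per output instead of n², which is consistent with the lightweight motivation given in the abstract.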
