
Drop Sparse Convolution for 3D Object Detection


Abstract:

3D object detection based on point clouds is crucial for the safety of autonomous vehicles. The 3D feature network significantly impacts the feature extraction of object detectors. However, 3D feature networks often exhibit poor performance due to submanifold dilation. In this paper, we propose a drop sparse convolution network, built using drop sparse convolution (Drop Conv), to mitigate the effects of submanifold dilation. Drop Conv computes an activity score for each feature from its own encoding and removes inactive features, preserving feature connectivity while suppressing dilated features. The proposed module can be easily integrated into existing 3D backbone networks. Combined with existing sparse convolutions, it fills a gap in 3D backbone networks where sparsity could not otherwise be enhanced. Extensive experiments on KITTI and nuScenes demonstrate that the proposed method effectively enhances the performance of state-of-the-art 3D object detectors that use 3D feature networks.
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Korea, Republic of

1. INTRODUCTION

3D feature encoding of point clouds is a central and significant challenge in point cloud object detection. Object detection relies on the extracted features, so their quality directly impacts every subsequent step, including the final detection result. Currently, there are two primary approaches for encoding 3D features from point clouds. The first employs statistics-based methods to extract information from point clouds, exemplified by PointPillars [1]. The second directly uses networks to encode information from point clouds. This includes point-based methods [2], [3] such as PointNet [4] and PointNet++ [5], as well as voxel-based methods [6], [7] such as Voxel RCNN [8].
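To make the statistics-based family concrete, the sketch below shows a toy voxel encoder that pools each occupied voxel's points into a mean-coordinate feature, in the spirit of PointPillars-style pooling. This is an illustrative assumption for exposition only, not the paper's Drop Conv or any specific published implementation; the function name `voxelize_mean` and the voxel size are hypothetical.

```python
import numpy as np

def voxelize_mean(points, voxel_size=0.2):
    """Toy statistics-based voxel encoder (illustrative sketch only).

    points: (N, 3) array of xyz coordinates.
    Returns the integer coordinates of occupied voxels and a per-voxel
    mean-coordinate feature. Only occupied voxels are kept, which is
    why voxel feature maps are sparse.
    """
    # Assign each point to an integer voxel index.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points that fall into the same voxel.
    keys, inverse = np.unique(idx, axis=0, return_inverse=True)
    # Encode each occupied voxel with a simple statistic: the mean of its points.
    feats = np.zeros((len(keys), 3))
    np.add.at(feats, inverse, points)
    counts = np.bincount(inverse, minlength=len(keys))
    feats /= counts[:, None]
    return keys, feats

pts = np.array([[0.05, 0.05, 0.0],
                [0.15, 0.10, 0.0],   # falls in the same voxel as the first point
                [0.90, 0.90, 0.0]])  # falls in a different voxel
coords, feats = voxelize_mean(pts)
print(coords.shape)  # only the occupied voxels are encoded
```

Network-based encoders replace the hand-crafted mean statistic with learned per-point or per-voxel feature extractors, but the sparse occupied-voxel structure produced here is exactly what sparse convolutions (and the proposed Drop Conv) operate on.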
