
V2X-BGN: Camera-based V2X-Collaborative 3D Object Detection with BEV Global Non-Maximum Suppression



Abstract:

In recent years, research on Vehicle-to-Everything (V2X) cooperative perception algorithms has mainly focused on the fusion of intermediate features from LiDAR point clouds. Since the emergence of strong single-vehicle visual perception models such as BEVFormer, collaborative perception schemes based on cameras and late fusion have become feasible. This paper proposes a V2X-collaborative 3D object detection architecture in Bird's Eye View (BEV) space based on global non-maximum suppression and late fusion (V2X-BGN), and conducts experiments on the V2XSet dataset. Focusing on complex road conditions with extreme occlusion, the paper compares the camera-based algorithm with a LiDAR-based algorithm, validating the effectiveness of pure visual solutions for the collaborative 3D object detection task. Additionally, the paper highlights the complementary potential of camera-based and LiDAR-based approaches and the importance of object-to-ego-vehicle distance in collaborative 3D object detection.
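The late-fusion scheme described above pools the 3D detections produced independently by each vehicle, transforms them into a common ego frame, and resolves duplicates with a single global non-maximum suppression pass in BEV space. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation: boxes are assumed to be axis-aligned `(x, y, width, length, score)` tuples already expressed in the ego BEV frame, and a real system would instead use rotated-box IoU and calibrated coordinate transforms.

```python
# Illustrative sketch of late fusion via global NMS in BEV space (not the
# paper's implementation). Each box is (x_center, y_center, width, length,
# score), assumed already transformed into the ego vehicle's BEV frame.

def bev_iou(a, b):
    """Axis-aligned IoU of two BEV boxes (a simplification of rotated IoU)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def global_nms(detections_per_agent, iou_thresh=0.5):
    """Pool detections from all agents, then greedily keep the
    highest-scoring box and suppress any box that overlaps a kept one."""
    pooled = [box for agent in detections_per_agent for box in agent]
    pooled.sort(key=lambda b: b[4], reverse=True)  # descending confidence
    kept = []
    for box in pooled:
        if all(bev_iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```

Because suppression runs over the pooled set rather than per agent, two vehicles that both detect the same occluded car contribute only one fused box, while a detection seen by a single agent (e.g. behind an occluder for everyone else) is preserved.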
Date of Conference: 02-05 June 2024
Date Added to IEEE Xplore: 15 July 2024

Conference Location: Jeju Island, Korea, Republic of


I. INTRODUCTION

Accurate 3D perception of the surrounding environment is fundamental to autonomous driving. Since the rise of deep learning, object detection has been one of the central research topics in computer vision. Following remarkable progress in image-based 2D object detection, researchers have gradually shifted their focus to the more complex task of 3D object detection. Whereas 2D object detection only requires determining the pixel positions of objects in images, 3D object detection requires predicting accurate 3D spatial coordinates of objects. Current mainstream solutions to the 3D object detection task fall into three categories. First, at the hardware level, sensors such as LiDAR directly acquire 3D spatial information about objects. Second, at the algorithm level, multi-view geometry or deep neural networks are used to extract depth information from 2D images, from which the 3D position and geometry of objects are inferred. Third, combining the advantages of both hardware and algorithms, 3D point clouds from LiDAR are fused with 2D images from RGB cameras; this approach leverages mature 2D image feature extraction algorithms and reliable 3D coordinates from point clouds to achieve more accurate object detection and classification.

