
BEVCon: Advancing Bird's Eye View Perception With Contrastive Learning


Abstract:

We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction tasks. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module for refining BEV features and a perspective view contrast module that enhances the image backbone. The dense contrastive learning designed on top of detection losses leads to improved feature representations across both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset demonstrate that BEVCon delivers consistent performance gains of up to +2.4% mAP over state-of-the-art baselines. Our results highlight the critical role of representation learning in BEV perception and offer a complementary avenue to conventional task-specific optimizations.
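The dense contrastive objective described in the abstract is, at its core, an instance-discrimination loss applied alongside the detection losses. As a rough illustration of what such a loss computes, the sketch below implements a standard InfoNCE loss over paired feature vectors (e.g. matched BEV or perspective-view features); the function name, shapes, and temperature are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.07):
    """InfoNCE contrastive loss over L2-normalized feature vectors.

    queries, keys: (N, D) arrays where queries[i] and keys[i] form a
    positive pair; all other rows in `keys` serve as negatives.
    This is a generic sketch, not BEVCon's exact module.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; minimize their negative log-prob.
    return -np.mean(np.diag(log_prob))
```

Pulling each feature toward its positive pair and away from all other instances in the batch is what "refines" the representations here; the detection losses remain the primary supervision.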
Published in: IEEE Robotics and Automation Letters ( Volume: 10, Issue: 4, April 2025)
Page(s): 3158 - 3165
Date of Publication: 10 February 2025


I. Introduction

In recent years, Bird's Eye View (BEV) perception has emerged as a crucial component in autonomous driving and robotic systems [1], [2], [3], [4]. Its ability to aggregate multi-view data and transform the surrounding environment into a unified top-down representation makes it highly effective and versatile for tasks like object detection, segmentation, trajectory prediction, and planning [5], [6]. A typical BEV perception model architecture comprises an image backbone, a BEV encoder, and task-specific heads [7]. While many prior efforts have focused on optimizing the design of BEV encoders and task heads to improve performance [8], [9], much less attention has been paid to enhancing BEV perception from a representation learning perspective. We argue that learned representations are central to a model's performance, and improving them can lead to uniform gains across various BEV architectures, offering broader benefits complementary to task-specific designs.
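The three-stage pipeline named above (image backbone, BEV encoder, task-specific head) can be sketched in miniature as below. Every shape, the mean-pool "lift" into the BEV grid, and the random linear projections are toy stand-ins chosen only to show how data flows between the stages; none of this reflects the actual networks used in the paper.

```python
import numpy as np

def image_backbone(images):
    """Per-camera feature extraction: (cams, H, W, 3) -> (cams, H//4, W//4, C)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=(3, 64))   # toy pointwise projection
    feats = images @ w
    return feats[:, ::4, ::4, :]               # crude 4x downsampling

def bev_encoder(cam_feats, bev_size=32):
    """Lift multi-view features into a unified top-down grid (toy pooling)."""
    pooled = cam_feats.mean(axis=(0, 1, 2))    # (C,) global summary
    return np.broadcast_to(pooled, (bev_size, bev_size, pooled.shape[0]))

def detection_head(bev_feats, num_classes=10):
    """Task head mapping BEV features to per-cell class scores."""
    rng = np.random.default_rng(1)
    w = rng.normal(scale=0.01, size=(bev_feats.shape[-1], num_classes))
    return bev_feats @ w                       # (bev, bev, num_classes)

images = np.zeros((6, 256, 256, 3))            # six surround-view cameras
scores = detection_head(bev_encoder(image_backbone(images)))
```

Representation-learning objectives of the kind the paper proposes attach at the intermediate stages (the backbone's perspective-view features and the BEV encoder's output), which is why they can benefit any architecture with this overall shape.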

