
BEVCon: Advancing Bird's Eye View Perception With Contrastive Learning


Abstract:

We present BEVCon, a simple yet effective contrastive learning framework designed to improve Bird's Eye View (BEV) perception in autonomous driving. BEV perception offers a top-down representation of the surrounding environment, making it crucial for 3D object detection, segmentation, and trajectory prediction tasks. While prior work has primarily focused on enhancing BEV encoders and task-specific heads, we address the underexplored potential of representation learning in BEV models. BEVCon introduces two contrastive learning modules: an instance feature contrast module for refining BEV features and a perspective view contrast module that enhances the image backbone. The dense contrastive learning designed on top of detection losses leads to improved feature representations across both the BEV encoder and the backbone. Extensive experiments on the nuScenes dataset demonstrate that BEVCon delivers consistent performance gains of up to +2.4% mAP over state-of-the-art baselines. Our results highlight the critical role of representation learning in BEV perception and offer a complementary avenue to conventional task-specific optimizations.
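The dense contrastive objective described in the abstract is, at its core, an instance-discrimination loss applied alongside the detection losses. As a rough illustration of what such a loss computes, the sketch below implements a standard InfoNCE loss over paired feature vectors (e.g. matched BEV or perspective-view features); the function name, shapes, and temperature are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.07):
    """InfoNCE contrastive loss over L2-normalized feature vectors.

    queries, keys: (N, D) arrays where queries[i] and keys[i] form a
    positive pair; all other rows in `keys` serve as negatives.
    This is a generic sketch, not BEVCon's exact module.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs lie on the diagonal; minimize their negative log-prob.
    return -np.mean(np.diag(log_prob))
```

Pulling each feature toward its positive pair and away from all other instances in the batch is what "refines" the representations here; the detection losses remain the primary supervision.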
Published in: IEEE Robotics and Automation Letters ( Volume: 10, Issue: 4, April 2025)
Page(s): 3158 - 3165
Date of Publication: 10 February 2025


I. Introduction

In recent years, Bird's Eye View (BEV) perception has emerged as a crucial component in autonomous driving and robotic systems [1], [2], [3], [4]. Its ability to aggregate multi-view data and transform the surrounding environment into a unified top-down representation makes it highly effective and versatile for tasks like object detection, segmentation, trajectory prediction, and planning [5], [6]. A typical BEV perception model architecture comprises an image backbone, a BEV encoder, and task-specific heads [7]. While many prior efforts have focused on optimizing the design of BEV encoders and task heads to improve performance [8], [9], much less attention has been paid to enhancing BEV perception from a representation learning perspective. We argue that learned representations are central to a model's performance, and improving them can lead to uniform gains across various BEV architectures, offering broader benefits complementary to task-specific designs.
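The three-stage pipeline named above (image backbone, BEV encoder, task-specific head) can be sketched in miniature as below. Every shape, the mean-pool "lift" into the BEV grid, and the random linear projections are toy stand-ins chosen only to show how data flows between the stages; none of this reflects the actual networks used in the paper.

```python
import numpy as np

def image_backbone(images):
    """Per-camera feature extraction: (cams, H, W, 3) -> (cams, H//4, W//4, C)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=(3, 64))   # toy pointwise projection
    feats = images @ w
    return feats[:, ::4, ::4, :]               # crude 4x downsampling

def bev_encoder(cam_feats, bev_size=32):
    """Lift multi-view features into a unified top-down grid (toy pooling)."""
    pooled = cam_feats.mean(axis=(0, 1, 2))    # (C,) global summary
    return np.broadcast_to(pooled, (bev_size, bev_size, pooled.shape[0]))

def detection_head(bev_feats, num_classes=10):
    """Task head mapping BEV features to per-cell class scores."""
    rng = np.random.default_rng(1)
    w = rng.normal(scale=0.01, size=(bev_feats.shape[-1], num_classes))
    return bev_feats @ w                       # (bev, bev, num_classes)

images = np.zeros((6, 256, 256, 3))            # six surround-view cameras
scores = detection_head(bev_encoder(image_backbone(images)))
```

Representation-learning objectives of the kind the paper proposes attach at the intermediate stages (the backbone's perspective-view features and the BEV encoder's output), which is why they can benefit any architecture with this overall shape.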

