Abstract:
Prediction, decision-making, and motion planning are essential for autonomous driving. In most contemporary works, they are treated as individual modules or combined into a multi-task learning paradigm. However, we argue that they should be integrated into a comprehensive framework. Although several recent approaches follow this scheme, they suffer from complicated input representations or redundant framework designs. To address these issues, we incorporate only the required modules into a minimalist framework. We propose BEVGPT, a generative pre-trained foundation model that integrates driving scenario prediction, decision-making, and motion planning. The model takes bird's-eye-view (BEV) images as its sole input and performs driving decision-making. To ensure the feasibility and smoothness of driving trajectories, we develop an optimization-based motion planning method. We instantiate BEVGPT on the Lyft Dataset and use L5Kit for realistic driving simulation. Its effectiveness and generalization ability are verified by the fact that BEVGPT outperforms previous methods on 100% of decision-making metrics and 66% of motion planning metrics. Furthermore, the capability of our framework for long-term BEV generation is demonstrated through driving scenario prediction tasks. To the best of our knowledge, this is the first generative pre-trained foundation model for autonomous driving prediction, decision-making, and motion planning that uses only BEV images as input.
Published in: IEEE Transactions on Intelligent Vehicles (Early Access)