1. Introduction
Perceiving the 3D shapes and poses of surrounding obstacles is an essential task in autonomous driving (AD) perception systems. The accuracy and speed of 3D object detection are important for the downstream motion planning and control modules in AD. Many 3D object detectors [50], [14] have been proposed, mainly for depth sensors such as LiDAR [35], [45] or stereo cameras [46], [18], which provide distance information about the environment directly. However, LiDAR sensors are expensive, and stereo rigs suffer from online calibration issues. Therefore, monocular camera-based 3D object detection has become a promising direction.