1. INTRODUCTION
Encoding 3D features from point clouds is a central challenge in point cloud object detection. Because detection operates on the extracted features, their quality directly affects every subsequent step, including the final detection result. Currently, there are two main approaches to encoding 3D features from point clouds. The first uses statistics-based methods to extract information from the point cloud, as exemplified by PointPillars [1]. The second uses neural networks to encode the point cloud directly. This includes point-based methods [2], [3], such as PointNet [4] and PointNet++ [5], and voxel-based methods [6], [7], such as Voxel RCNN [8].