I. Introduction
Point-wise semantic segmentation, which assigns a semantic label to every point in a point cloud, is an essential task underpinning a wide range of applications, including intelligent robotics, autonomous vehicles, and digital twins. Compared with 2-D optical images, 3-D point clouds capture the spatial information, orientation, and geometric shape attributes of road objects more precisely and more frequently. Most significantly, they are less sensitive to illumination conditions, shadows, and viewpoint variations [1]. However, unlike 2-D images, which lie on a regular grid, 3-D point clouds captured by light detection and ranging (LiDAR) sensors are unorganized and irregularly distributed, making efficient and accurate semantic segmentation of road objects challenging, especially in complex and large-scale urban areas [2].