1. Introduction
Recent point cloud research has seen the emergence of many supervised approaches [19], [20], [33], [12], [29]. Most current effort is dedicated to two tasks: point cloud shape classification (a.k.a. shape recognition) and point cloud segmentation (a.k.a. semantic segmentation). For both tasks, the success of state-of-the-art methods is attributed mostly to deep learning architectures [19] and the availability of large amounts of labelled 3D point cloud data [16], [1]. Although the community is still focused on pushing forward in the former direction, we believe the latter issue, i.e. data annotation, is an overlooked bottleneck. In particular, the point cloud segmentation task is assumed to come with ground-truth labels for every point, which typically means 1k to 10k points for a single 3D shape [34], [16]. The count rises by orders of magnitude, to millions of points, for a real indoor scene [11]. As a result, a dataset needs very accurate labels for billions of points to train good segmentation models. Despite the development of modern annotation toolkits [16], [1] to facilitate large-scale annotation, exhaustive labelling remains prohibitively expensive for ever-growing new datasets.
Illustration of the weak supervision concept in this work. Our approach achieves segmentation with only a fraction of labelled points.