Introduction
Compact 3D shape representation is fundamental to 3D computer vision, underpinning numerous downstream tasks crucial for understanding the physical world, such as shape classification [1], semantic segmentation [2], and object detection [3]. Point clouds offer an efficient data format for storing 3D structures, discretizing surfaces into unordered sets of points, each represented by Cartesian coordinates.
Since PointNet [2], neural-network-based methods have become the dominant approach for extracting 3D representations. Numerous sophisticated feature aggregators [1], [5]–[9] have been developed to capture local geometry using learnable parameters. These learnable networks share a common principle: representing point clouds with complicated feature aggregators in a multi-layered hierarchy. Lower layers extract low-level textures and structures, while higher layers model high-level relationships. However, Deep Neural Network (DNN)-based methods face limitations: they can be unstable and computationally expensive, often act as black boxes with limited interpretability, and are prone to overfitting to majority categories.
This paper explores new perspectives on 3D data representation, prioritizing transparency, generalization, and robustness over pursuing state-of-the-art performance. We observe that foreground 3D objects in most datasets exhibit near-rigid-body properties and share similar outlines within categories, even after transformations. This observation led us to develop ICP-Classifier, a simple classification pipeline comprising Iterative Closest Point (ICP) [10] registration and a k-nearest-neighbor (kNN) classifier. Instead of relying on complex feature aggregators or multi-layer encoding, our method leverages a reference library of training data for comparison, eliminating the need for neural networks or training. For each test sample, we perform ICP registration with all reference samples across categories, generating similarity scores. A kNN classifier then predicts the category from the similarity scores of the reference samples. Fig. 1 contrasts the structure of ICP-Classifier with DNN-based methods, highlighting the simplicity of our approach.
Extensive experiments demonstrate the effectiveness of our methodology. The robustness of our training-free pipeline ensures stable and reproducible results, surpassing previous learning-based methods, including convolutional and transformer-based approaches.
Our method offers several advantages. It is simple, requiring no neural networks or training. It demonstrates strong interpretability, providing a clear measure of shape similarity for decision-making. It remains stable and effective with out-of-distribution or perturbed data.
Our contributions are as follows:
We propose ICP-Classifier, a straightforward yet robust paradigm for point cloud classification that requires no learnable parameters, neural networks, or training.
We demonstrate the efficacy of our method by achieving commendable accuracy on ModelNet40 and state-of-the-art performance on OmniObject3D. Our approach also excels in few-shot settings and when facing perturbed data.
Our comparison against numerous DNN-based methods reveals inherent challenges in 3D shape analysis, particularly highlighting the discriminative power of global shape representation over local feature aggregation.
Related Work
a) DNN-based Methods
Deep neural networks (DNNs) have emerged as a dominant force in 3D shape analysis, starting with the seminal work of PointNet [2], which demonstrated the capability of neural networks to directly learn geometric information from global coordinates. PointNet++ [1] extended this approach by introducing fundamental local aggregation operators. Further advancements in capturing local structures include KPConv [6] and PointConv [7], which employ convolution-like operations in 3D. More recently, inspired by the success of Transformers [11] in natural language processing, methods like PCT [5] and Point Transformer [9] have integrated self-attention mechanisms into point cloud feature learning.
Training point cloud networks typically relies on supervised learning, involving feeding labeled data and updating network parameters via backpropagation. However, the limited availability of labeled 3D data has spurred extensive research into pre-training techniques to improve data efficiency. For instance, FoldingNet [12] employs a warped grid to guide an auto-encoder for point cloud representation learning. Similarly, PointContrast [13] leverages a proxy task of point cloud registration to enforce geometric detail during pre-training. Inspired by advancements in pre-training for Natural Language Processing (NLP), several analogous approaches have been proposed for the point cloud domain, with PointBERT [14] being a notable example.
b) Training-Free Methods
Beyond DNN-based approaches, several studies have explored training-free methodologies for 3D shape analysis. Recent works [15]–[19] have leveraged the pre-trained CLIP model [20] to achieve zero-shot learning capabilities, drawing inspiration from transfer learning frameworks established in 2D image analysis and natural language processing. Similarly, Point-NN [21] conducts 3D shape analysis using non-parametric modules based on farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations.
Approach
This work aims to develop a straightforward approach for incorporating point cloud registration into a classification framework. Our proposed paradigm consists of three key components: (1) an ICP-based registration module, (2) a similarity metric that leverages fitness and RMSE scores obtained from the registration process, and (3) a k-nearest-neighbor (kNN) classifier. This approach is motivated by the observation that foreground 3D objects in most existing datasets exhibit near-rigid-body properties. Consequently, objects within the same category tend to share similar outlines, even after undergoing various transformations.
Fig. 2 illustrates the complete pipeline of our proposed approach. Given an input test point cloud Xtest, ICP-Classifier first performs pairwise registration between Xtest and all samples within a reference library. This reference library is constructed from the training data, ensuring no overlap with the test set. The registration process generates similarity scores S(Xtest, Xi) between the test data Xtest and each reference point cloud Xi in the library.
Iterative Closest Point
The Iterative Closest Point (ICP) algorithm is one of the classic methods for point cloud registration. Given two point clouds, ICP computes a transformation matrix that aligns them by maximizing geometric overlap, typically by minimizing a predefined error metric.
We utilize the ICP implementation provided by the open-source library Open3D [22], which is known for its ease of use and wide applicability. Formally, given two corresponding point sets P = {p1, …, pn} and Q = {q1, …, qn}, ICP estimates the rotation R and translation t that minimize the mean squared alignment error \begin{equation*}E({\mathbf{R}},t) = \frac{1}{n}\sum\limits_{i = 1}^n {{{\left\| {{p_i} - {\mathbf{R}}{q_i} - t} \right\|}^2}} ,\tag{1}\end{equation*}
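As a concrete illustration, the following minimal sketch runs a single point-to-point ICP pass with Open3D's Python API and reads off the two alignment metrics used later; the correspondence-distance threshold and iteration count here are illustrative placeholders, not the exact settings of our experiments.

```python
import numpy as np
import open3d as o3d


def to_o3d(points: np.ndarray) -> o3d.geometry.PointCloud:
    """Wrap an (n, 3) array of XYZ coordinates into an Open3D point cloud."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    return pcd


def icp_once(source: np.ndarray, target: np.ndarray,
             max_dist: float = 0.05, max_iter: int = 30):
    """Run one point-to-point ICP pass and return (fitness, inlier RMSE)."""
    result = o3d.pipelines.registration.registration_icp(
        to_o3d(source), to_o3d(target), max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=max_iter))
    return result.fitness, result.inlier_rmse
```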
Multi-Scale ICP
To optimize alignment, we employ Multi-Scale ICP, which downsamples the point cloud at several voxel sizes. Initially, larger voxel sizes are used for coarser point clouds, facilitating efficient initial alignment with fewer iterations. Subsequently, finer point clouds are processed as the voxel sizes decrease.
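A coarse-to-fine registration loop of this kind can be sketched as follows, again with Open3D; the voxel sizes, iteration budgets, and the 2x-voxel correspondence threshold are assumed values for illustration rather than the exact configuration used in the paper.

```python
import numpy as np
import open3d as o3d


def multi_scale_icp(source: np.ndarray, target: np.ndarray,
                    voxel_sizes=(0.10, 0.05, 0.02), max_iters=(20, 15, 10)):
    """Coarse-to-fine ICP: align heavily downsampled clouds first, then refine.

    The transformation estimated at each coarser scale initializes the next,
    finer scale. Returns the final Open3D RegistrationResult.
    """
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target))
    transform = np.eye(4)
    result = None
    for voxel, iters in zip(voxel_sizes, max_iters):
        src_down = src.voxel_down_sample(voxel)
        tgt_down = tgt.voxel_down_sample(voxel)
        result = o3d.pipelines.registration.registration_icp(
            src_down, tgt_down,
            max_correspondence_distance=2.0 * voxel,  # threshold shrinks with the voxel size
            init=transform,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
            criteria=o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=iters))
        transform = result.transformation
    return result
```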
For a given input point cloud, we derive a criterion from the output of the Multi-Scale ICP algorithm to quantify its similarity to the reference data. We then identify the top-matching reference samples and employ the kNN classifier to determine the category of the input point cloud.
Open3D’s ICP algorithm outputs a transformation and two alignment metrics: fitness (the ratio of inlier correspondences to the total number of points in the target cloud, higher is better) and inlier RMSE (the root mean square error over inlier correspondences, lower is better). Their difference serves as our point cloud similarity criterion. Formally, for a test point cloud Xtest and a reference point cloud Xi, with F denoting the fitness and E the inlier RMSE of their registration, \begin{equation*}ICP\left( {{X_{test}},{X_i}} \right) = F\left( {{X_{test}},{X_i}} \right) - E\left( {{X_{test}},{X_i}} \right),\tag{2}\end{equation*}
Moreover, we propose the Dual-Evaluation method to improve the performance. We start by treating test data as the source and reference data as the target, and then we reverse the roles, treating the reference data as the source and the test data as the target. The sum of these two evaluations yields the final similarity score. This bidirectional comparison enhances robustness and ensures a more comprehensive assessment of similarity.
\begin{equation*}{S_i}\left( {{X_{test{}}}} \right) = ICP\left( {{X_{test{}}},{X_i}} \right) + ICP\left( {{X_i},{X_{test{}}}} \right),\tag{3}\end{equation*}
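Eqs. (2) and (3) translate directly into code. The sketch below assumes the multi_scale_icp helper outlined above; both the helper and its thresholds are illustrative, not a verbatim transcription of our implementation.

```python
import numpy as np


def icp_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (2): fitness minus inlier RMSE when aligning x (source) onto y (target)."""
    result = multi_scale_icp(x, y)  # illustrative helper sketched above
    return result.fitness - result.inlier_rmse


def dual_similarity(x_test: np.ndarray, x_ref: np.ndarray) -> float:
    """Eq. (3): register in both directions and sum the two scores."""
    return icp_similarity(x_test, x_ref) + icp_similarity(x_ref, x_test)
```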
kNN Classifier
The kNN classifier operates on the principle of plurality voting. The k nearest neighbors of an object are determined by the similarity score, with the closest neighbor having the highest score. The object is then classified by a plurality vote of its neighbors, the predicted class being the one most common among its k nearest neighbors.
\begin{gather*}{N_k}\left( {{X_{test}}} \right) = \left\{ {\left( {{X_i},{y_i}} \right) \in D \mid rank\left( {{{\left\{ {{S_i}\left( {{X_{test}}} \right)} \right\}}_{i \in N}}} \right) \leq k} \right\},\tag{4} \\ \hat y = \arg \mathop {\max }\limits_{c \in \{ 1,2, \ldots ,C\} } \sum\limits_{\left( {{X_i},{y_i}} \right) \in {N_k}\left( {{X_{test}}} \right)} I \left( {{y_i} = c} \right),\tag{5}\end{gather*}
Furthermore, we introduce the average kNN to enhance the k-nearest-neighbor (kNN) algorithm. Conventional kNN selects the top-k similarity scores and makes its prediction based on the majority class among them. Building upon this, we additionally compute the average score of each class appearing among the top k. The classes are then sorted according to these averages, and the class with the highest average score is taken as the result. Formally, suppose there are c distinct classes among the top-k similarity scores:
\begin{equation*}{\hat y^{\ast}} = \arg \mathop {\max }\limits_{j \in \{ 1, \ldots ,c\} } {\bar S^j}\left( {{X_{test}}} \right),\tag{6}\end{equation*} where S̄j(Xtest) denotes the average similarity score of the j-th class among the top-k neighbors.
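The voting rules of Eqs. (4)-(6) amount to only a few lines of NumPy. In the sketch below, scores holds the similarity Si(Xtest) to every reference sample, labels holds the corresponding categories, and k = 10 is an illustrative choice rather than our tuned setting.

```python
import numpy as np
from collections import Counter, defaultdict


def knn_predict(scores: np.ndarray, labels: np.ndarray, k: int = 10):
    """Plain kNN (Eqs. (4)-(5)): plurality vote among the k most similar references."""
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest similarity scores
    return Counter(labels[i] for i in top).most_common(1)[0][0]


def average_knn_predict(scores: np.ndarray, labels: np.ndarray, k: int = 10):
    """Average kNN (Eq. (6)): pick the class with the highest mean score in the top k."""
    top = np.argsort(scores)[::-1][:k]
    per_class = defaultdict(list)
    for i in top:
        per_class[labels[i]].append(scores[i])
    return max(per_class, key=lambda c: float(np.mean(per_class[c])))
```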
Experiments
We conduct extensive experiments on both a classic synthetic dataset, ModelNet40 [23], and a real-world dataset, OmniObject3D [24].
A. 3D Shape Classification
We compare our ICP-Classifier with other training-free methods for object classification on ModelNet40. The results are reported in Table I. Our method achieves SOTA performance among training-free methods by capturing globally discriminative shape information.
B. Generalization
Following [24], we evaluate ICP-Classifier on the OmniObject3D dataset to assess its generalization to out-of-distribution data. The results are reported in Table II. Our method surpasses all fully trained neural networks by at least 9%, as it captures the underlying structure of the data without becoming overly specialized to the training data.
C. Few-shot Learning
We evaluate ICP-Classifier under the few-shot learning setting and compute the average accuracy over ten independent runs. As reported in Table III, our paradigm achieves the top ranking across various settings. Since it does not require a training process, it performs well with limited labeled data.
D. Robustness to Attack
We compare the performance of our method with DNN-based methods when confronted with realistic or perturbed data, including rotation, the 3D-Adv attack, and the high-drop attack. The experimental results show the robustness of ICP-Classifier, highlighting its advantage of being training-free.
E. Ablation Study
The results of the ablation studies on dual evaluation, the kNN classifier, and the ICP threshold are reported in Table VI, Table VII, and Table VIII, respectively. All of these designs improve the performance of our method.
(Figure caption) Comparison of different methods under the high-drop attack on ModelNet40; points with high contribution scores are dropped in the high-drop attack.
Conclusion
This paper presents a reevaluation of 3D shape analysis and introduces ICP-Classifier, a simple yet robust approach for 3D shape classification. Notably, ICP-Classifier is entirely free of neural networks and requires no training. Rather than aiming for high accuracy or immediate practicality, it serves as an exploratory approach to demonstrate the importance of prioritizing global shape extraction over local feature aggregation. Building upon these insights, future work will investigate integrating this principle into deep learning model architectures. This line of inquiry holds promise for advancing the field of 3D shape analysis by improving robustness and generalization ability.
ACKNOWLEDGMENT
This work is supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.