Introduction
Compact 3D shape representation is fundamental to 3D computer vision, underpinning numerous downstream tasks crucial for understanding the physical world, such as shape classification [1], semantic segmentation [2], and object detection [3]. Point clouds offer an efficient data format for storing 3D structures, discretizing surfaces into unordered sets of points, each represented by Cartesian coordinates.
Since PointNet [2], neural-network-based methods have become the dominant approach for extracting 3D representations. Numerous sophisticated feature aggregators [1], [5]–[9] have been developed to capture local geometry using learnable parameters. These learnable networks share a common principle: representing point clouds with complicated feature aggregators in a multi-layered hierarchy. Lower layers extract low-level textures and structures, while higher layers model high-level relationships. However, Deep Neural Network (DNN)-based methods face limitations: they can be unstable and computationally expensive, often act as black boxes with limited interpretability, and are prone to overfitting to majority categories.
This paper explores new perspectives on 3D data representation, prioritizing transparency, generalization, and robustness over pursuing state-of-the-art performance. We observe that foreground 3D objects in most datasets exhibit near-rigid-body properties and share similar outlines within categories, even after transformations. This observation led us to develop ICP-Classifier, a simple classification pipeline comprising Iterative Closest Point (ICP) [10] registration and a k-nearest-neighbor (kNN) classifier. Instead of relying on complex feature aggregators or multi-layer encoding, our method leverages a reference library of training data for comparison, eliminating the need for neural networks or training. For each test sample, we perform ICP registration with all reference samples across categories, generating similarity scores. A kNN classifier then predicts the category from the similarity scores of the reference samples. Fig. 1 contrasts the structure of ICP-Classifier with DNN-based methods, highlighting the simplicity of our approach.
Extensive experiments demonstrate the effectiveness of our methodology. The robustness of our training-free pipeline ensures stable and reproducible results, surpassing previous learning-based methods, including convolutional and transformer-based approaches.
Our method offers several advantages. It is simple, requiring no neural networks or training. It demonstrates strong interpretability, providing a clear measure of shape similarity for decision-making. It remains stable and effective with out-of-distribution or perturbed data.
Our contributions are as follows:
We propose ICP-Classifier, a straightforward yet robust paradigm for point cloud classification that requires no learnable parameters, neural networks, or training.
We demonstrate the efficacy of our method by achieving commendable accuracy on ModelNet40 and state-of-the-art performance on OmniObject3D. Our approach also excels in few-shot settings and when facing perturbed data.
Our comparison against numerous DNN-based methods reveals inherent challenges in 3D shape analysis, particularly highlighting the discriminative power of global shape representation over local feature aggregation.
Related Work
a) DNN-based Methods
Deep neural networks (DNNs) have emerged as a dominant force in 3D shape analysis, starting with the seminal work of PointNet [2], which demonstrated the capability of neural networks to directly learn geometric information from global coordinates. PointNet++ [1] extended this approach by introducing fundamental local aggregation operators. Further advancements in capturing local structures include KPConv [6] and PointConv [7], which employ convolution-like operations in 3D. More recently, inspired by the success of Transformers [11] in natural language processing, methods like PCT [5] and Point Transformer [9] have integrated self-attention mechanisms into point cloud feature learning.
Training point cloud networks typically relies on supervised learning, involving feeding labeled data and updating network parameters via backpropagation. However, the limited availability of labeled 3D data has spurred extensive research into pre-training techniques to improve data efficiency. For instance, FoldingNet [12] employs a warped grid to guide an auto-encoder for point cloud representation learning. Similarly, PointContrast [13] leverages a proxy task of point cloud registration to enforce geometric detail during pre-training. Inspired by advancements in pre-training for Natural Language Processing (NLP), several analogous approaches have been proposed for the point cloud domain, with PointBERT [14] being a notable example.
b) Training-Free Methods
Beyond DNN-based approaches, several studies have explored training-free methodologies for 3D shape analysis. Recent works [15]–[19] have leveraged the pre-trained CLIP model [20] to achieve zero-shot learning capabilities, drawing inspiration from transfer learning frameworks established in 2D image analysis and natural language processing. Similarly, Point-NN [21] conducts 3D shape analysis using non-parametric modules based on farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations.
Approach
This work aims to develop a straightforward approach for incorporating point cloud registration into a classification framework. Our proposed paradigm consists of three key components: (1) an ICP-based registration module, (2) a similarity metric that leverages fitness and RMSE scores obtained from the registration process, and (3) a k-nearest-neighbor (kNN) classifier. This approach is motivated by the observation that foreground 3D objects in most existing datasets exhibit near-rigid-body properties. Consequently, objects within the same category tend to share similar outlines, even after undergoing various transformations.
Fig. 2 illustrates the complete pipeline of our proposed approach. Given an input test point cloud Xtest, ICP-Classifier first performs pairwise registration between Xtest and all samples within a reference library. This reference library is constructed from the training data, ensuring no overlap with the test set. The registration process generates similarity scores S(Xtest, Xi) between the test data Xtest and each reference point cloud Xi in the library.
Iterative Closest Point
The Iterative Closest Point (ICP) algorithm is one of the classic methods for point cloud registration. Given two point clouds, ICP computes a transformation matrix that aligns them by maximizing geometric overlap, typically by minimizing a predefined error metric.
We utilize the ICP implementation provided by the open-source library Open3D [22], which is known for its ease of use and wide applicability. Formally, given two corresponding point sets P = {p1, …, pn} and Q = {q1, …, qn}, ICP estimates the rotation R and translation t that minimize the mean squared alignment error \begin{equation*}E({\mathbf{R}},t) = \frac{1}{n}\sum\limits_{i = 1}^n {{{\left\| {{p_i} - {\mathbf{R}}{q_i} - t} \right\|}^2}} ,\tag{1}\end{equation*}
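As a concrete illustration, the following minimal sketch runs a single point-to-point ICP pass with Open3D's Python API and reads off the two alignment metrics used later; the correspondence-distance threshold and iteration count here are illustrative placeholders, not the exact settings of our experiments.

```python
import numpy as np
import open3d as o3d


def to_o3d(points: np.ndarray) -> o3d.geometry.PointCloud:
    """Wrap an (n, 3) array of XYZ coordinates into an Open3D point cloud."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    return pcd


def icp_once(source: np.ndarray, target: np.ndarray,
             max_dist: float = 0.05, max_iter: int = 30):
    """Run one point-to-point ICP pass and return (fitness, inlier RMSE)."""
    result = o3d.pipelines.registration.registration_icp(
        to_o3d(source), to_o3d(target), max_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=max_iter))
    return result.fitness, result.inlier_rmse
```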
Multi-Scale ICP
To optimize alignment, we employ Multi-Scale ICP, which downsamples the point cloud at several voxel sizes. Initially, larger voxel sizes are used for coarser point clouds, facilitating efficient initial alignment with fewer iterations. Subsequently, finer point clouds are processed as the voxel sizes decrease.
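A coarse-to-fine registration loop of this kind can be sketched as follows, again with Open3D; the voxel sizes, iteration budgets, and the 2x-voxel correspondence threshold are assumed values for illustration rather than the exact configuration used in the paper.

```python
import numpy as np
import open3d as o3d


def multi_scale_icp(source: np.ndarray, target: np.ndarray,
                    voxel_sizes=(0.10, 0.05, 0.02), max_iters=(20, 15, 10)):
    """Coarse-to-fine ICP: align heavily downsampled clouds first, then refine.

    The transformation estimated at each coarser scale initializes the next,
    finer scale. Returns the final Open3D RegistrationResult.
    """
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target))
    transform = np.eye(4)
    result = None
    for voxel, iters in zip(voxel_sizes, max_iters):
        src_down = src.voxel_down_sample(voxel)
        tgt_down = tgt.voxel_down_sample(voxel)
        result = o3d.pipelines.registration.registration_icp(
            src_down, tgt_down,
            max_correspondence_distance=2.0 * voxel,  # threshold shrinks with the voxel size
            init=transform,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
            criteria=o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=iters))
        transform = result.transformation
    return result
```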
For a given input point cloud, we derive a criterion from the output of the Multi-Scale ICP algorithm to quantify its similarity to the reference data. We then identify the top-matching reference samples and employ the kNN classifier to determine the category of the input point cloud.
Open3D’s ICP algorithm outputs a transformation and two alignment metrics: fitness (the ratio of inlier correspondences to the total number of points in the target cloud, higher is better) and inlier RMSE (the root mean square error over inlier correspondences, lower is better). Their difference serves as our point cloud similarity criterion. Formally, for a test point cloud Xtest and a reference point cloud Xi, with F denoting the fitness and E the inlier RMSE of their registration, \begin{equation*}ICP\left( {{X_{test}},{X_i}} \right) = F\left( {{X_{test}},{X_i}} \right) - E\left( {{X_{test}},{X_i}} \right),\tag{2}\end{equation*}
Moreover, we propose the Dual-Evaluation method to improve the performance. We start by treating test data as the source and reference data as the target, and then we reverse the roles, treating the reference data as the source and the test data as the target. The sum of these two evaluations yields the final similarity score. This bidirectional comparison enhances robustness and ensures a more comprehensive assessment of similarity.
\begin{equation*}{S_i}\left( {{X_{test{}}}} \right) = ICP\left( {{X_{test{}}},{X_i}} \right) + ICP\left( {{X_i},{X_{test{}}}} \right),\tag{3}\end{equation*}
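Eqs. (2) and (3) translate directly into code. The sketch below assumes the multi_scale_icp helper outlined above; both the helper and its thresholds are illustrative, not a verbatim transcription of our implementation.

```python
import numpy as np


def icp_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Eq. (2): fitness minus inlier RMSE when aligning x (source) onto y (target)."""
    result = multi_scale_icp(x, y)  # illustrative helper sketched above
    return result.fitness - result.inlier_rmse


def dual_similarity(x_test: np.ndarray, x_ref: np.ndarray) -> float:
    """Eq. (3): register in both directions and sum the two scores."""
    return icp_similarity(x_test, x_ref) + icp_similarity(x_ref, x_test)
```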
kNN Classifier
The kNN classifier operates on the principle of plurality voting. The k nearest neighbors of an object are determined by the similarity score, with the closest neighbor having the highest score. The object is then classified by a plurality vote of its neighbors, the predicted class being the one most common among its k nearest neighbors.
\begin{gather*}{N_k}\left( {{X_{test}}} \right) = \left\{ {\left( {{X_i},{y_i}} \right) \in D \mid rank\left( {{{\left\{ {{S_i}\left( {{X_{test}}} \right)} \right\}}_{i \in N}}} \right) \leq k} \right\},\tag{4} \\ \hat y = \arg \mathop {\max }\limits_{c \in \{ 1,2, \ldots ,C\} } \sum\limits_{\left( {{X_i},{y_i}} \right) \in {N_k}\left( {{X_{test}}} \right)} I \left( {{y_i} = c} \right),\tag{5}\end{gather*}
Furthermore, we introduce the average kNN to enhance the k-nearest-neighbor (kNN) algorithm. Conventional kNN selects the top-k similarity scores and makes its prediction based on the majority class among them. Building upon this, we additionally compute the average score of each class appearing among the top k. The classes are then sorted according to these averages, and the class with the highest average score is taken as the result. Formally, suppose there are c distinct classes among the top-k similarity scores:
\begin{equation*}{\hat y^{\ast}} = \arg \mathop {\max }\limits_{j \in \{ 1, \ldots ,c\} } {\bar S^j}\left( {{X_{test}}} \right),\tag{6}\end{equation*} where S̄j(Xtest) denotes the average similarity score of the j-th class among the top-k neighbors.
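The voting rules of Eqs. (4)-(6) amount to only a few lines of NumPy. In the sketch below, scores holds the similarity Si(Xtest) to every reference sample, labels holds the corresponding categories, and k = 10 is an illustrative choice rather than our tuned setting.

```python
import numpy as np
from collections import Counter, defaultdict


def knn_predict(scores: np.ndarray, labels: np.ndarray, k: int = 10):
    """Plain kNN (Eqs. (4)-(5)): plurality vote among the k most similar references."""
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest similarity scores
    return Counter(labels[i] for i in top).most_common(1)[0][0]


def average_knn_predict(scores: np.ndarray, labels: np.ndarray, k: int = 10):
    """Average kNN (Eq. (6)): pick the class with the highest mean score in the top k."""
    top = np.argsort(scores)[::-1][:k]
    per_class = defaultdict(list)
    for i in top:
        per_class[labels[i]].append(scores[i])
    return max(per_class, key=lambda c: float(np.mean(per_class[c])))
```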
Experiments
We conduct extensive experiments on both a classic synthetic dataset, ModelNet40 [23], and a real-world dataset, OmniObject3D [24].
A. 3D Shape Classification
We compare our ICP-Classifier with other training-free methods for object classification on ModelNet40. The results are reported in Table I. Our method achieves SOTA performance among training-free methods by capturing globally discriminative shape information.
B. Generalization
Following [24], we evaluate ICP-Classifier on the OmniObject3D dataset to assess its generalization to out-of-distribution data. The results are reported in Table II. Our method surpasses all fully trained neural networks by at least 9%, as it captures the underlying structure of the data without becoming overly specialized to the training data.
C. Few-shot Learning
We evaluate ICP-Classifier under the few-shot learning setting and compute the average accuracy over ten independent runs. As reported in Table III, our paradigm achieves the top ranking across various settings. Since it does not require a training process, it performs well with limited labeled data.
D. Robustness to Attack
We compare the performance of our method with DNN-based methods when confronted with realistic or perturbed data, including rotation, the 3D-Adv attack, and the high-drop attack. The experimental results show the robustness of ICP-Classifier, highlighting its advantage of being training-free.
E. Ablation Study
The results of the ablation studies on dual evaluation, the kNN classifier, and the ICP threshold are reported in Table VI, Table VII, and Table VIII, respectively. All of these designs improve the performance of our method.
(Figure caption) Comparison of different methods under the high-drop attack on ModelNet40; points with high contribution scores are dropped in the high-drop attack.
Conclusion
This paper presents a reevaluation of 3D shape analysis and introduces ICP-Classifier, a simple yet robust approach for 3D shape classification. Notably, ICP-Classifier is entirely free of neural networks and requires no training. Rather than aiming for high accuracy or immediate practicality, it serves as an exploratory approach to demonstrate the importance of prioritizing global shape extraction over local feature aggregation. Building upon these insights, future work will investigate integrating this principle into deep learning model architectures. This line of inquiry holds promise for advancing the field of 3D shape analysis by improving robustness and generalization ability.
ACKNOWLEDGMENT
This work is supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.