Journals & Magazines >IEEE Transactions on Pattern ... >Volume: 46 Issue: 12

Deep Learning on Object-Centric 3D Neural Fields

Abstract:

In recent years, Neural Fields (NFs) have emerged as an effective tool for encoding diverse continuous signals such as images, videos, audio, and 3D shapes. When applied ...Show More

Metadata

Abstract:

In recent years, Neural Fields (NFs) have emerged as an effective tool for encoding diverse continuous signals such as images, videos, audio, and 3D shapes. When applied to 3D data, NFs offer a solution to the fragmentation and limitations associated with prevalent discrete representations. However, given that NFs are essentially neural networks, it remains unclear whether and how they can be seamlessly integrated into deep learning pipelines for solving downstream tasks. This paper addresses this research problem and introduces nf2vec, a framework capable of generating a compact latent representation for an input NF in a single inference pass. We demonstrate that nf2vec effectively embeds 3D objects represented by the input NFs and showcase how the resulting embeddings can be employed in deep learning pipelines to successfully address various tasks, all while processing exclusively NFs. We test this framework on several NFs used to represent 3D surfaces, such as unsigned/signed distance and occupancy fields. Moreover, we demonstrate the effectiveness of our approach with more complex NFs that encompass both geometry and appearance of 3D objects such as neural radiance fields.

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 46, Issue: 12, December 2024)

Page(s): 9940 - 9956

Date of Publication: 17 July 2024

ISSN Information:

DOI: 10.1109/TPAMI.2024.3430101

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Computer vision has always been concerned with understanding the 3D world around us. One of the main challenges when dealing with 3D data is the representation strategy, which was addressed over the years by introducing various discrete representations, including voxel grids, point clouds, and meshes. Each representation has its advantages and disadvantages, especially when it comes to processing it through deep learning, leading to the development of a plethora of ad-hoc algorithms [1], [2], [3] for each coexisting representation. Hence, no standard way to store and process 3D data has yet emerged.

Recently, a new representation has been proposed, called Neural Fields [4] (NFs). They are continuous functions defined at all spatial coordinates, parameterized by a neural network such as a Multi-Layer Perceptron (MLP). In the context of 3D world representation, various types of NFs have been explored. Some of the most common NFs utilize the Signed/Unsigned Distance Field (SDF/UDF) [5], [6], [7], [8] and the Occupancy Field (OF) [9], [10] to represent the 3D surfaces or volumes of the objects in the scene. Alternatively, strategies seeking to capture both geometries and appearances often leverage the Radiance Field (RF), as shown in the pioneering approach NeRF [11].

Representing a 3D scene by encoding it with a continuous function parameterized as an MLP separates the memory cost of the representation from the spatial resolution. In other words, starting from the same fixed number of parameters, it is possible to reconstruct a surface with arbitrarily fine resolution or to render an image with arbitrarily high quality. Furthermore, the identical neural network architecture can be applied to learn various field functions, offering the possibility of a unified framework for representing 3D objects.

Owing to their efficacy and potential benefits, 3D NFs are garnering growing interest from the scientific community, as evidenced by the frequent publication of novel and impressive results [8], [12], [13], [14]. This leads us to speculate that, in the near future, NFs could establish themselves as a standard way to store and communicate 3D data. It is conceivable that repositories hosting digital twins of 3D objects, exclusively realized as MLPs, might become widely accessible.

The above scenario prompts an intriguing research question: can 3D NFs be directly processed using deep learning pipelines for solving downstream tasks, as it is commonly done with discrete representations such as point clouds or images? For instance, is it feasible to classify an object by directly processing the corresponding NeRF without rendering any image from it?

Since NFs are neural networks, there is no straightforward way to process them. A recent work in the field, Functa [15], fits the whole dataset with a shared network conditioned on a different embedding for each data. In this formulation, a solution could be to use such embeddings as the input for downstream tasks. Nevertheless, representing an entire dataset through a shared network poses a formidable learning challenge, as the network encounters difficulties in accurately fitting all the samples (see Section VII).

On the contrary, recent studies, including SIREN [16] and others [17], [18], [19], [20], [21], have demonstrated that it is possible to achieve high-quality reconstructions by tailoring an individual network to each input sample. This holds true even when dealing with complex 3D shapes or images. Furthermore, constructing an individual NF for each object is more adaptable to real-world deployment, as it does not require the availability of the entire dataset to fit each individual data. The increasing popularity of such methodologies suggests that adopting the practice of fitting an individual network is likely to become commonplace in learning NFs.

Therefore, in the former version of this paper [22], we explored conducting downstream tasks using deep learning pipelines on 3D data represented as individual NFs. Recently, several methods addressing this topic have been published, such as NFN [23], NFT [23], and DWSNet [24], and all of them process individual NFs, supporting this paradigm.

Using NFs as input or output data is intrinsically non-trivial, as the MLP of a single NF can encompass hundreds of thousands of parameters. However, deep models inherently present a significantly redundant parameterization of the underlying function, as shown in [25], [26]. As a result, we explore whether and how an answer to the research question mentioned earlier might be identified within a representation learning framework. We present an approach that encodes individual NFs into compact and meaningful embeddings, making them suitable for diverse downstream tasks. We name this framework nf2vec, shown in Fig. 1.

Fig. 1.

Overview of our framework. Left: NFs hold the potential to provide a unified representation of the 3D world. Center: Our framework, dubbed nf2vec, produces a compact representation for an input NF by looking only at its weights. Right: nf2vec embeddings can be used with standard deep-learning machinery to solve various downstream tasks.

Deep Learning on Object-Centric 3D Neural Fields

Alerts

Abstract:

Metadata

Abstract:

ISSN Information:

Introduction

Related Work

Learning to Represent NFs

Latent Space Properties

Deep Learning on 3D Shapes

Deep Learning on NeRFs

Comparison With Recent Approaches

Using the Same Initialization for NFs

Limitations

Concluding Remarks

References

IEEE Account

Purchase Details

Profile Information

Need Help?