Introduction
The acquisition of 3D models is a frequent problem in computer graphics and vision. Most existing methods, such as laser scanning and multi-view reconstruction, are based on observations of surface color. Consequently, the surface is assumed to be opaque and approximately Lambertian. These methods cannot be directly applied to transparent objects because the appearance of a transparent object is indirectly observed owing to the complex refraction and reflection light paths at the interface between air and transparent materials.
A core technical challenge in 3D transparent object reconstruction is handling the dramatic changes in appearance that occur when observing an object in a multi-view setting. Slight changes in an object's shape can lead to nonlocal changes in appearance owing to the complexity of light paths. To address this issue, we utilized ray-pixel correspondence (i.e., the correspondence between a camera ray and a pixel on a static background pattern displayed on a monitor) and ray-ray correspondence (i.e., the correspondence between a camera ray and the incident ray from the background pattern) to provide light path constraints that facilitate 3D transparent object reconstruction [1]–[3]. A differentiable refraction-tracing technique can be applied to reduce the complexity of the capture setting, and the 3D shape can be recovered through ray-pixel correspondences, as shown in Ref. [4]. However, in this method, the transparent object must be placed on a turntable under controlled lighting conditions. Li et al. [5] trained a physics-based neural network to handle complex light paths for 3D transparent objects. The network was trained on a synthetic dataset with a differentiable path-tracing rendering technique. This method optimizes surface normals in a latent space; thus, it can reconstruct 3D transparent objects under natural lighting conditions when given an environment map and a few images as input. However, it frequently produces overly smooth reconstruction results.
In this study, we consider how to combine the advantages of explicit meshes and multilayer perceptron (MLP) networks in a hybrid representation to address the problem of reconstructing transparent objects under natural lighting conditions using images captured with a handheld camera. This representation can be reconstructed through optimization using a differentiable path-tracing rendering technique. The key idea is to use an MLP to encode a vertex displacement field (VDF) defined on a base mesh to reconstruct surface details, wherein the base mesh is created using multi-view silhouette images. Our design is motivated by two observations. First, the representation of functions using MLPs has been demonstrated to be efficient in optimization and robust to noise [6]–[8]. The MLP network parameterizes the VDF globally with its weight parameters; hence, it implicitly provides global constraints on changes in the VDF. Second, defining the MLP-parameterized VDF on the base mesh reduces the search space during optimization [9]. This significantly accelerates the optimization process compared to MLP-based volumetric representations.
The advantage of our hybrid representation is that it allows for relaxation of the capture setting. Because the global smoothness constraints between vertex displacements are implied in the MLP weights, the ray-pixel correspondence required in the optimization can be significantly relaxed to a ray-cell correspondence in our pipeline. Consequently, we can simplify the background pattern design and develop a robust single-image environment matting (EnvMatt) algorithm for handling images captured under natural lighting conditions. Compared to the capture settings used in the work of Wu et al. [2] and Lyu et al. [4], our handheld capture setting is low-cost and simple. Moreover, we propose to represent the VDF using a small number of local MLPs, each responsible for encoding a local VDF. This strategy enables the design of small-scale MLPs to further accelerate the optimization process. A fusion module is designed to disperse the gradient information of the vertex displacement vectors to their neighboring local MLPs. This module helps maintain global constraints between local MLPs and produces high-quality reconstruction results.
The contributions of this study are summarized as follows.
We present a hybrid representation that employs an explicit mesh and local-MLP-based functions to represent the detailed surface of transparent objects. This approach enables us to design small-scale MLPs to accelerate our optimization algorithm's convergence and achieve high-quality 3D reconstruction results for transparent objects.
We propose a ray-cell correspondence as a relaxed representation of the light path constraint. The ray-cell correspondence is easier to capture, leading to a simplified capture setting under natural lighting conditions. Furthermore, it also eases the implementation of the EnvMatt algorithm.
The experimental results demonstrate that our method can produce 3D models with details for a variety of transparent objects, as illustrated in Fig. 1. With our simplified capture setting under natural light conditions, our reconstruction results were superior to those of state-of-the-art 3D reconstruction algorithms for transparent objects.
Our reconstruction results paired with the associated renderings of three transparent objects. The fine surface details can be reconstructed well via our method using images captured with a handheld camera under natural lighting conditions.
Related Work
Our algorithm is designed on the basis of considerable previous research. Here, we review the literature most related to the present work, including studies on transparent object reconstruction, differentiable rendering, and EnvMatt.
Transparent object reconstruction. Many transparent object reconstruction techniques utilize special hardware setups, including polarization [10]–[12], time-of-flight cameras [13], tomography [14], moving point light sources [15], [16], and light field probes [17]. The proposed algorithm is most closely related to shape-from-distortion and light-path triangulation. Kutulakos and Steger [1] formulated the reconstruction of a refractive or mirror-like surface as a light path triangulation problem. Given a function that maps each point in an image onto a 3D “reference point” that is indirectly projected onto it, the authors characterized a set of reconstructible cases that depend only on the number of points along a light path. The mapping function can be estimated using an EnvMatt algorithm with a calibrated acquisition setup, yielding ray-point correspondences. A ray-ray correspondence can be uniquely determined with two distinct reference points along the same ray.
In accordance with light path triangulation, one reconstructible case is that of single-refraction surfaces [18]–[20], particularly fluid surfaces [21]–[23]. Another tractable case is that of transparent objects when rays undergo refraction twice [24]–[26]. Wu et al. [2] recently reconstructed the full shape of a transparent object by first extracting ray-ray correspondences and then performing separate optimization and multi-view fusion. Lyu et al. [4] proposed the extraction of per-view ray-point correspondences using the EnvMatt algorithm in Ref. [27], and utilized differentiable rendering to progressively optimize an initial mesh.
In addition to optimization-based methods, deep learning techniques can also be incorporated to resolve depth-normal ambiguities [28], [29]. Li et al. [5] suggested performing optimization in the feature space to obtain surface normals. Subsequently, they performed multi-view feature mapping and 3D point-cloud reconstruction to obtain a 3D shape. Their method works on a simple acquisition setting with only one known environment map and approximately 10 captured images. However, their reconstructed transparent object may lose some details owing to the domain gap between the real-world images and synthetic training data.
Differentiable rendering. In accordance with the simulation level of light transport, differentiable rendering algorithms in computer graphics can be roughly divided into three categories: differentiable rasterization [30]–[34], differentiable volumetric rendering [7], [8], [35], [36], and differentiable ray tracing [37]–[43]. Differentiable rasterization can be used to optimize a mesh itself or its features, and the neural network parameters defined on the mesh. Differentiable volumetric rendering can be used to optimize implicit shape representations, such as the implicit occupancy function [44], [45], signed distance function (SDF) [7], [46], and unsigned distance function [47]. Differentiable rendering has also been used to optimize deep surface light fields [9], where per-vertex view-dependent reflections are represented using an MLP. While we also utilize surface-based MLPs, our focus is different: our method employs local MLPs to represent the VDF locally to reconstruct surface details and designs a fusion layer to avoid discontinuities at the overlapped surface areas.
Considering that a light path with refraction is determined by the front and back surfaces of a transparent object, the geometry can be optimized in an iterative manner with forward ray tracing and backward gradient propagation. To this end, our algorithm exploits differentiable ray tracing to handle the light paths of the reflected and refracted rays on the surface of transparent objects.
Environment matting. EnvMatt, which captures how an object refracts and reflects environment light, can be viewed as an extension of alpha matting [48], [49]. Image-based refraction and reflection are represented as pixel-texel (texture pixel) correspondences, in which environments are represented as texture maps. The seminal work of Zongker et al. [27] extracted EnvMatt from a series of 1D Gray codes, assuming that each pixel is only related to a rectangular background region. Chuang et al. [50] extended this work to recover a more accurate model at the expense of using more structured light backdrops. They also proposed a simplified EnvMatt algorithm that uses only a single backdrop. A pixel-texel correspondence search can also be performed in the wavelet [51] and frequency [52] domains. The number of required patterns can be reduced by combining them with a compressive sensing technique [53]. Chen et al. [54] recently presented a deep learning framework called TOM-Net to estimate EnvMatt as a refractive flow field. The aforementioned methods require images to be captured under controlled lighting conditions (e.g., in a dark room) to avoid the influence of ambient light. Wexler et al. [55] developed an EnvMatt algorithm for handling natural-scene backgrounds. However, their method required capturing a set of images using a fixed camera and a moving background.
Overview
Our transparent object reconstruction pipeline is shown in Fig. 2. It begins by reconstructing an object's rough shape (initial shape) from a collection of multi-view silhouettes. Instead of the space-carving method [56], we utilized the MLP-based signed distance function (SDF) in IDR [7] to obtain a smooth initial shape, as shown in Fig. 2. Subsequently, we employ MLP networks to represent the vertex displacement field (VDF) on the initial shape to reconstruct the surface details. This hybrid surface representation combines an explicit mesh and MLP-based neural networks. In the following section, we detail the hybrid representation and the optimization algorithm for reconstructing the representation from multi-view images.
Hybrid representation. We choose to encode the surface details with a VDF because it is defined on a 2D manifold instead of the entire 3D space, which reduces the search space of the optimization algorithm and produces high-quality reconstruction results. Moreover, we define the displacement field on the mesh vertices to simplify the optimization. Such a hybrid representation combines explicit vertex optimization, which accelerates convergence, with an MLP-based neural representation, as in IDR [7], which enforces global constraints among vertices and improves the robustness of the optimization.
Rather than encoding the VDF using a single MLP, we found that representing the field with a set of small local MLPs achieves better results. As shown in Fig. 2, each local MLP encodes the displacement vectors of the vertices within one cluster extracted from the mesh of the initial shape using the variational shape approximation (VSA) algorithm [57]. To avoid mesh discontinuities across local MLPs, we also add a fusion layer that blends the displacement vectors of neighboring vertices based on geodesic distances on the mesh [58].
Optimization for VDF. The VDF is optimized based on the multi-bounce (up to two bounces) light path constraints and the consistency between the rendering of our representation and the captured RGB images. The rendering procedure is performed using a recursive differentiable path-tracing algorithm [59].
The light path constraint due to multi-bounce refraction is approximated by a mapping function that maps each pixel in the input image onto a pixel of the background pattern image, which can be obtained using an EnvMatt algorithm. We store the background image as a texture. However, we found that traditional EnvMatt algorithms are either restricted to using multiple images with a fixed camera or are sensitive to natural light conditions. Consequently, we designed a grid-based background pattern to establish the correspondence between a foreground pixel and a grid cell of the background pattern, i.e., the ray-cell correspondence.
In the remainder of this paper, we first describe our data pre-processing steps (Section 4.1), including the image acquisition setup and grid-based single-image EnvMatt algorithm. Then, we present the details of the initial shape reconstruction (Section 4.2) and surface optimization steps (Section 4.3).
Method
4.1 Pre-Processing
Data acquisition. We captured images using a Canon EOS 60D digital single-lens reflex camera. The transparent object to be captured was placed on a desk with preprinted AprilTags [60] underneath; the AprilTags were used to facilitate image registration. To capture the ray-cell correspondences for EnvMatt, we placed an iPad as a monitor behind the transparent object to display a grid-based background pattern. The displayed pattern was switched after every four captured images, and the iPad was repositioned after every 60 images, to provide denser ray-cell correspondences.
Grid-based single-image EnvMatt. Similar to Ref. [50], we assume that a transparent object has no intrinsic color. Therefore, the correspondence between pixels that cover the object's surface and pixels on the background pattern can be calculated by searching in color space. However, we found that the color ramp pattern in Ref. [50] is sensitive to ambient light under natural light conditions because of the invertible and smooth properties of the pattern image in RGB space. Therefore, we designed grid-based background patterns in this study. The color of each grid cell is constant and is designed to create sharp boundaries between cells. As shown in Fig. 5, each pattern consists of a grid of cells, in which a sparse set of cells is assigned salient colors and the remaining cells are filled with checkerboard colors.
The proposed EnvMatt algorithm calculates ray-cell correspondences for each pixel. Given a captured image, for each pixel $\boldsymbol{p}$ with color $\boldsymbol{c}_{\boldsymbol{p}}$, its corresponding location $\boldsymbol{u}$ on the background pattern is determined as
\begin{equation*}\boldsymbol{u}=\begin{cases}\boldsymbol{c}_{r}\left(\underset{i}{\arg\!\min}\Vert \boldsymbol{c}_{s}^{i}-\boldsymbol{c}_{\boldsymbol{p}}\Vert\right), & \min\limits_{i}\Vert \boldsymbol{c}_{s}^{i}-\boldsymbol{c}_{\boldsymbol{p}}\Vert < \gamma_{1}\\ \text{inf}, & \min\limits_{i,j}\left(\Vert \boldsymbol{c}_{h}^{i}-\boldsymbol{c}_{\boldsymbol{p}}\Vert, \Vert \boldsymbol{c}_{s}^{j}-\boldsymbol{c}_{\boldsymbol{p}}\Vert\right) > \gamma_{2}\\ \text{none}, & \text{otherwise}\end{cases}\tag{1}\end{equation*}
where $\boldsymbol{c}_{s}^{i}$ and $\boldsymbol{c}_{h}^{i}$ denote the designed colors of the salient and checkerboard cells, respectively, $\boldsymbol{c}_{r}(\cdot)$ returns the center of the matched salient cell, and $\gamma_{1}$, $\gamma_{2}$ are matching thresholds. The label "inf" indicates that the traced ray terminates outside the pattern, and "none" indicates that no salient correspondence is found.
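To illustrate Eq. (1), the following minimal NumPy sketch classifies a single pixel; the color arrays, cell centers, and the thresholds $\gamma_1$ and $\gamma_2$ are placeholder assumptions rather than the exact values used in our experiments.

```python
import numpy as np

def envmatt_label(c_p, salient_colors, salient_centers, checker_colors,
                  gamma1=0.05, gamma2=0.3):
    """Per-pixel ray-cell labeling following Eq. (1).

    c_p             : (3,) observed RGB color of the pixel, in [0, 1]
    salient_colors  : (Ns, 3) designed colors of the salient cells
    salient_centers : (Ns, 2) 2D centers of the salient cells on the pattern
    checker_colors  : (Nc, 3) colors used for the checkerboard cells
    gamma1, gamma2  : matching / rejection thresholds (placeholder values)
    """
    d_salient = np.linalg.norm(salient_colors - c_p, axis=1)
    d_checker = np.linalg.norm(checker_colors - c_p, axis=1)

    if d_salient.min() < gamma1:
        # Close enough to one salient color: return that cell's center (u).
        return salient_centers[np.argmin(d_salient)]
    if min(d_salient.min(), d_checker.min()) > gamma2:
        # Far from every pattern color: the ray terminates outside the pattern.
        return "inf"
    # Otherwise the ray most likely ends in a checkerboard cell: no salient match.
    return "none"
```

Both the salient matches and the "no correspondence" labels are used later as light path constraints (Section 4.3.2).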
Image registration and 3D reconstruction. After capturing the images (see the example images in Fig. 3), we used the 3D reconstruction software RealityCapture [61] to register them, which enables us to trace rays in a unified coordinate frame during differentiable rendering. Because the iPad position is changed every 60 images to provide more ray-cell correspondences, we independently reconstructed a textured 3D scene (including the iPad) for every group of 60 images, resulting in several components in RealityCapture, where each component records one independently reconstructed 3D scene. All components are then registered based on the AprilTags [60] beneath the object, as shown in Fig. 4.
Considering that the background pattern displayed on the iPad changes after every four captured images, incorrect matching points are produced on the iPad's surface, causing the reconstruction of the 3D iPad plane to fail. To address this issue, we displayed additional AprilTags surrounding the background patterns to add extra matching points and guarantee a successful reconstruction of the iPad plane.
Each 3D pattern plane is thereby located in the unified coordinate frame, so that refracted rays can be intersected with the displayed pattern during differentiable rendering.
4.2 Initial Shape Reconstruction
We utilized IDR [7] with silhouette (mask) loss to obtain the initial shape of a transparent object. The object masks were manually annotated on several selected images; the number of masks used is listed in Table 1. Because we only used the silhouette loss, the “neural renderer” MLP in IDR was removed. As shown in Fig. 6, IDR with silhouette loss produces smoother reconstruction results than the space-carving algorithm. After reconstructing the initial mesh, we uniformly scale the mesh such that the diameter of its bounding ball equals one, extract a fine-grained mesh, and perform edge-collapse mesh simplification. During simplification, we fix the target edge length of each triangle at 0.005.
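As a small illustration of this post-processing, the sketch below (NumPy only) uniformly scales the vertices so that the bounding-ball diameter equals one; the subsequent refinement and edge-collapse simplification to the 0.005 target edge length would be delegated to a mesh-processing library and are only indicated by a comment.

```python
import numpy as np

def normalize_to_unit_ball(vertices):
    """Uniformly scale mesh vertices so that their bounding ball has diameter 1.

    vertices : (N, 3) array of mesh vertex positions.
    Returns the scaled vertices; faces are unchanged by a uniform scale.
    """
    center = 0.5 * (vertices.min(axis=0) + vertices.max(axis=0))  # AABB center
    centered = vertices - center
    radius = np.linalg.norm(centered, axis=1).max()   # enclosing-ball radius
    return centered / (2.0 * radius)                  # diameter becomes 1

# The normalized mesh is then refined and simplified by edge collapses until
# the target edge length of each triangle is roughly 0.005.
```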
Scene mesh and pattern planes. The iPad is moved twice. During the image registration step, all images are registered to the same global coordinate frame.
Grid-based single-image EnvMatt procedure. (a) Input image. (b) EnvMatt results: colored pixels indicate that their traced rays terminate inside cells with designed salient colors; black pixels indicate that the rays terminate inside cells with checkerboard colors; and gray pixels indicate that the rays terminate outside the pattern. (c) The designed pattern. The circles indicate the centers of salient cells. (d) Chosen colors for salient and checkerboard cells.
Space carving vs. IDR with silhouette loss. The space-carving method produces artifacts in the occluded areas (indicated by the red arrow).
4.3 Surface Optimization Through Differentiable Rendering
Given the initial shape as the base mesh, we first group the mesh triangles into several clusters and then assign each cluster to a surface-based MLP. Thus, the VDF can be computed as a fusion of the outputs of the surface-based local MLPs. In particular, for each vertex $\boldsymbol{x}_{i}$, the local MLP assigned to its cluster $C_{\boldsymbol{x}_{i}}$ predicts a raw displacement vector \begin{equation*}\hat{\delta}\boldsymbol{x}_{i}=\text{MLP}_{C_{\boldsymbol{x}_{i}}}(\boldsymbol{x}_{i})\tag{2}\end{equation*}
To avoid discontinuity at the cluster boundaries, we introduce a differentiable fusion layer that blends the raw displacements of neighboring vertices, weighted by $w(\boldsymbol{x}_{i},\boldsymbol{x}_{j})$ computed from geodesic distances on the mesh, to obtain the final displacement vector \begin{equation*}\delta \boldsymbol{x}_{i}=\sum\limits_{j}w(\boldsymbol{x}_{i},\boldsymbol{x}_{j})\cdot\hat{\delta}\boldsymbol{x}_{j}\tag{3}\end{equation*}
The architecture of each local MLP is illustrated in Fig. 7; it contains two fully connected (FC) layers. For each MLP, each input 3D vertex on the surface is first mapped to a 99-dimensional feature using positional encoding; we used 16 positional encoding frequencies, so the encoded feature has $3 + 2\times3\times16 = 99$ dimensions.
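The PyTorch sketch below shows the overall structure of this representation: a positional encoding that lifts a 3D vertex to 99 dimensions (16 frequencies), one small two-FC-layer MLP per cluster (Eq. (2)), and a fusion layer that blends the per-cluster outputs with precomputed geodesic-distance weights (Eq. (3)). The hidden width, the ReLU activation, and the dense weight matrix are illustrative assumptions rather than our exact implementation.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=16):
    """Map (N, 3) points to (N, 3 + 2*3*num_freqs) features (99-D for 16 freqs)."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * math.pi * x))
        feats.append(torch.cos((2.0 ** k) * math.pi * x))
    return torch.cat(feats, dim=-1)

class LocalVDF(nn.Module):
    """Per-cluster local MLPs plus a fusion layer (Eqs. (2) and (3))."""

    def __init__(self, num_clusters, hidden=128, num_freqs=16):
        super().__init__()
        in_dim = 3 + 2 * 3 * num_freqs                      # 99-D encoded vertex
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 3))
            for _ in range(num_clusters)])

    def forward(self, verts, cluster_id, fusion_w):
        """verts: (N, 3) base-mesh vertices; cluster_id: (N,) cluster index per
        vertex; fusion_w: (N, N) row-normalized geodesic weights w(x_i, x_j)."""
        enc = positional_encoding(verts)
        raw = torch.zeros_like(verts)                       # per-cluster outputs (Eq. 2)
        for c, mlp in enumerate(self.mlps):
            mask = cluster_id == c
            if mask.any():
                raw[mask] = mlp(enc[mask])
        return fusion_w @ raw                               # fused displacements (Eq. 3)
```

The displaced surface is then simply `verts + model(verts, cluster_id, fusion_w)`, which keeps the base-mesh connectivity fixed while only the MLP weights are optimized.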
Local MLP representation. Each local MLP is responsible for representing the vertex displacements inside a VSA cluster (shown as the colored patch on the object surface). A fusion layer is used to fuse the vertex displacements output by the local MLPs into a smooth VDF on the surface.
In the following, we first describe how to extract clusters from the base mesh and then describe the details of the designed loss terms and our optimization procedure.
4.3.1 Cluster Extraction
We utilized the Variational Shape Approximation (VSA) algorithm [57] to segment the initial shape into several clusters. The VSA algorithm tends to merge nearly co-planar triangles into the same cluster by minimizing a normal-based distortion error (the $\mathcal{L}^{2,1}$ metric) between the triangles in each cluster and the cluster's proxy plane.
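As a rough illustration of the distortion that VSA minimizes, the NumPy sketch below evaluates the area-weighted $\mathcal{L}^{2,1}$ error of one cluster against its proxy normal; the full VSA algorithm, which alternates distortion-minimizing region growing with proxy fitting, is not reproduced here.

```python
import numpy as np

def cluster_l21_error(verts, faces, proxy_normal):
    """Area-weighted L^{2,1} error of a triangle cluster w.r.t. its proxy normal.

    verts        : (N, 3) vertex positions
    faces        : (M, 3) vertex indices of the triangles in the cluster
    proxy_normal : (3,)   unit normal of the cluster's proxy plane
    """
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    cross = np.cross(v1 - v0, v2 - v0)
    area = 0.5 * np.linalg.norm(cross, axis=1)
    normals = cross / np.linalg.norm(cross, axis=1, keepdims=True)
    return np.sum(area * np.sum((normals - proxy_normal) ** 2, axis=1))
```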
4.3.2 Loss Terms
We search for the weight parameters of the local MLPs by minimizing the loss function given in Eq. (4):
\begin{align*}\mathcal{L}_{\text{total}}= & \lambda_{\text{rgb}}\mathcal{L}_{\text{rgb}}+ \lambda_{\text{corr}}\mathcal{L}_{\text{corr}}+ \lambda_{\text{ncorr}}\mathcal{L}_{\text{ncorr}}\\ & + \lambda_{\text{sil}}\mathcal{L}_{\text{sil}}+ \lambda_{\text{reg}}\mathcal{L}_{\text{reg}}\tag{4}\end{align*}
RGB loss. The RGB loss measures the difference between the pixel color in the captured image and the color rendered by recursively tracing the camera ray through our hybrid representation.
In particular, the camera ray through each pixel $\boldsymbol{p}$ is traced recursively: at each intersection with the object surface it spawns a reflection ray and a refraction ray, and the refraction ray is traced for up to two bounces before a color is fetched from the scene texture or the background pattern, as illustrated in Fig. 8.
Recursive ray-tracing procedure. During rendering, the refraction and reflection rays fetch colors from the scene texture and background pattern. If pixel
For each refraction ray above, the refraction color is attenuated along the light path according to the Fresnel term:
\begin{align*}\mathcal{F}^{\langle t1,t2\rangle}= & \frac{1}{2}\left(\frac{\eta^{\mathrm{i}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t1}\cdot \boldsymbol{n}_{2}-\eta^{\mathrm{o}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t2}\cdot \boldsymbol{n}_{2}}{\eta^{\mathrm{i}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t1}\cdot \boldsymbol{n}_{2}+\eta^{\mathrm{o}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t2}\cdot \boldsymbol{n}_{2}}\right)^{2} +\frac{1}{2}\left(\frac{\eta^{\mathrm{o}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t1}\cdot \boldsymbol{n}_{2}-\eta^{\mathrm{i}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t2}\cdot \boldsymbol{n}_{2}}{\eta^{\mathrm{o}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t1}\cdot \boldsymbol{n}_{2}+\eta^{\mathrm{i}}\, \boldsymbol{r}_{\boldsymbol{p}}^{t2}\cdot \boldsymbol{n}_{2}}\right)^{2}\tag{5}\\ \boldsymbol{c}_{\boldsymbol{p}}^{t1}= & \left(1- \mathcal{F}^{\langle t1,t2\rangle}\right)\left(\frac{\eta^{t1}}{\eta^{t2}}\right)^{2} \boldsymbol{c}_{\boldsymbol{p}}^{t2}\tag{6}\end{align*}
where $\eta^{\mathrm{i}}$ and $\eta^{\mathrm{o}}$ are the IORs on the incident and transmitted sides of the interface, $\boldsymbol{r}_{\boldsymbol{p}}^{t1}$ and $\boldsymbol{r}_{\boldsymbol{p}}^{t2}$ are the ray directions before and after the refraction event, $\boldsymbol{n}_{2}$ is the surface normal at the intersection, and $\boldsymbol{c}_{\boldsymbol{p}}^{t1}$, $\boldsymbol{c}_{\boldsymbol{p}}^{t2}$ are the colors carried along the corresponding ray segments.
In our experiments, we set the IOR of air to 1.0003 and the IOR of the object material to 1.52. As shown in Fig. 8, the reflection color is weighted by the Fresnel term $\mathcal{F}$ accordingly.
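A minimal NumPy sketch of Eqs. (5) and (6) is given below; the mapping of the superscripts to function arguments (incident versus transmitted IOR, incoming versus refracted direction) is our reading of the notation, and the directions and normal are assumed to be unit vectors.

```python
import numpy as np

ETA_AIR, ETA_GLASS = 1.0003, 1.52   # IORs used in our experiments

def fresnel_reflectance(r_in, r_out, n, eta_in, eta_out):
    """Unpolarized Fresnel reflectance at a refraction event (Eq. (5)).

    r_in, r_out : unit directions of the ray before / after refraction
    n           : unit surface normal at the intersection
    eta_in/out  : IORs on the incident / transmitted side
    """
    cos_i = abs(float(np.dot(r_in, n)))
    cos_t = abs(float(np.dot(r_out, n)))
    r_s = (eta_in * cos_i - eta_out * cos_t) / (eta_in * cos_i + eta_out * cos_t)
    r_p = (eta_out * cos_i - eta_in * cos_t) / (eta_out * cos_i + eta_in * cos_t)
    return 0.5 * (r_s ** 2 + r_p ** 2)

def attenuate_refraction_color(c_next, fresnel, eta_in, eta_out):
    """Carry the fetched color back along the refraction ray (Eq. (6))."""
    return (1.0 - fresnel) * (eta_in / eta_out) ** 2 * c_next
```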
The reflection and refraction colors are fetched from the textured scene mesh or the corresponding pattern. If a reflected ray does not intersect the scene mesh or the pattern, we only use the refraction color. The RGB loss over the valid foreground pixels $\boldsymbol{M}^{t}$ is then \begin{equation*}\mathcal{L}_{\text{rgb}}=\frac{1}{\vert \boldsymbol{M}^{t}\vert}\sum\limits_{\boldsymbol{p}}\boldsymbol{M}_{\boldsymbol{p}}^{t}\Vert \boldsymbol{c}_{\boldsymbol{p}}^{in}-\boldsymbol{c}_{\boldsymbol{p}}\Vert_{1}\tag{7}\end{equation*}
Correspondence loss and no-correspondence loss. The correspondence loss $\mathcal{L}_{\text{corr}}$ penalizes the deviation between $\boldsymbol{v}_{\boldsymbol{l}_{\boldsymbol{p}}^{t2}}$, the point where the refraction ray traced from pixel $\boldsymbol{p}$ hits the pattern plane, and the matched salient cell center $\boldsymbol{u}$ obtained by our EnvMatt algorithm: \begin{equation*}\mathcal{L}_{\text{corr}}=\frac{1}{\vert \boldsymbol{M}^{t}\vert}\sum\limits_{\boldsymbol{p}}\boldsymbol{M}_{\boldsymbol{p}}^{t}d\left(\boldsymbol{v}_{\boldsymbol{l}_{\boldsymbol{p}}^{t2}}, \boldsymbol{u}\right)\tag{8}\end{equation*}
\begin{equation*}d(\boldsymbol{q}_{1},\boldsymbol{q}_{2})=\begin{cases}\Vert \boldsymbol{q}_{1}-\boldsymbol{q}_{2}\Vert_{2}, & \text{if}\ \Vert \boldsymbol{q}_{1}-\boldsymbol{q}_{2}\Vert_{\infty} > l/2\\ 0, & \text{otherwise}\end{cases}\tag{9}\end{equation*}
where $l$ is the side length of a grid cell; the distance is truncated to zero once the traced point already falls inside the matched cell, which relaxes the ray-pixel constraint to a ray-cell constraint.
For pixels with no salient correspondence (labeled "none" in Eq. (1)), the refraction ray should not terminate inside any salient cell. Letting $\boldsymbol{g}$ denote the center of the nearest salient cell, we penalize such terminations with \begin{equation*}\mathcal{L}_{\text{ncorr}}=-\frac{1}{\vert \boldsymbol{M}^{t}\vert}\sum\limits_{\boldsymbol{p}}\boldsymbol{M}_{\boldsymbol{p}}^{t}\hat{d}\left(\boldsymbol{v}_{\boldsymbol{l}_{\boldsymbol{p}}^{t2}}, \boldsymbol{g}\right)\tag{10}\end{equation*}
\begin{equation*}\hat{d}(\boldsymbol{q}_{1},\boldsymbol{q}_{2})=\begin{cases}\Vert \boldsymbol{q}_{1}-\boldsymbol{q}_{2}\Vert_{2}, & \text{if}\ \Vert \boldsymbol{q}_{1}-\boldsymbol{q}_{2}\Vert_{\infty} < 0.5\\ 0, & \text{otherwise}\end{cases}\tag{11}\end{equation*}
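The two truncated distances of Eqs. (9) and (11) can be written as below (NumPy sketch); `cell_size` stands for the cell side length $l$, and the 0.5 threshold is taken directly from Eq. (11).

```python
import numpy as np

def corr_distance(q1, q2, cell_size):
    """Relaxed ray-cell distance of Eq. (9): zero once the traced point q1
    already lies inside the matched cell centered at q2."""
    if np.max(np.abs(q1 - q2)) > 0.5 * cell_size:
        return float(np.linalg.norm(q1 - q2))
    return 0.0

def ncorr_distance(q1, q2):
    """Truncated distance of Eq. (11); with the negative sign in Eq. (10) it
    pushes traced points away from salient cells they should not hit."""
    if np.max(np.abs(q1 - q2)) < 0.5:
        return float(np.linalg.norm(q1 - q2))
    return 0.0
```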
Silhouette loss and regularization loss. We also added silhouette loss [63] to the annotated object masks similar to the initial shape reconstruction step (Section 4.2). Moreover, to further constrain the optimization, we added a regularization loss as Eq. (12):
\begin{equation*}\mathcal{L}_{\text{reg}}=\lambda_{\text{ls}}\mathcal{L}_{\text{ls}}+\lambda_{\text{nc}}\mathcal{L}_{\text{nc}}+\lambda_{\text{pc}}\mathcal{L}_{\text{pc}}\tag{12}\end{equation*}
\begin{align*}& \mathcal{L}_{\text{ls}}= \sum\limits_{\boldsymbol{v}_{i}}\Big\Vert\frac{1}{\vert \mathcal{N}(\boldsymbol{v}_{i})\vert}\sum\limits_{\boldsymbol{v}_{j}\in \mathcal{N}(\boldsymbol{v}_{i})}(\boldsymbol{v}_{j}- \boldsymbol{v}_{i})\Big\Vert_{2}\tag{13}\\ & \mathcal{L}_{\text{nc}}= \sum\limits_{e\in \mathcal{E}}(1-\log(1+ \boldsymbol{n}_{1}^{e}\cdot \boldsymbol{n}_{2}^{e}))\tag{14}\\ & \mathcal{L}_{\text{pc}}=\frac{1}{\vert \mathcal{S}^{1}\vert} \sum\limits_{\boldsymbol{x}^{1}\in \mathcal{S}^{1}} \min\limits_{\boldsymbol{x}^{2}\in \mathcal{S}^{2}}\Vert \boldsymbol{x}^{1}-\boldsymbol{x}^{2} \Vert_{2}^{2} + \frac{1}{\vert \mathcal{S}^{2}\vert} \sum\limits_{\boldsymbol{x}^{2}\in \mathcal{S}^{2}} \min\limits_{\boldsymbol{x}^{1}\in \mathcal{S}^{1}}\Vert \boldsymbol{x}^{1}-\boldsymbol{x}^{2} \Vert_{2}^{2}\tag{15}\end{align*}
where $\mathcal{L}_{\text{ls}}$ is a uniform Laplacian smoothness term summed over all mesh vertices, $\mathcal{L}_{\text{nc}}$ encourages consistent normals $\boldsymbol{n}_{1}^{e}$ and $\boldsymbol{n}_{2}^{e}$ of the two faces adjacent to each edge $e\in\mathcal{E}$, and $\mathcal{L}_{\text{pc}}$ is the symmetric chamfer distance between the point sets $\mathcal{S}^{1}$ and $\mathcal{S}^{2}$.
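A PyTorch sketch of the three regularization terms is shown below; the padded one-ring indices, the per-edge face-normal pairs, and the two point sets are placeholder inputs assumed to be precomputed from the mesh.

```python
import torch

def laplacian_loss(verts, nbr_idx, nbr_mask):
    """Uniform Laplacian smoothness (Eq. (13)).
    nbr_idx : (N, K) padded one-ring vertex indices
    nbr_mask: (N, K) 1.0 for valid neighbors, 0.0 for padding
    """
    nbrs = verts[nbr_idx]                                        # (N, K, 3)
    mean_offset = ((nbrs - verts[:, None, :]) * nbr_mask[..., None]).sum(1) \
                  / nbr_mask.sum(1, keepdim=True).clamp(min=1.0)
    return mean_offset.norm(dim=1).sum()

def normal_consistency_loss(n1, n2):
    """Normal consistency across shared edges (Eq. (14));
    n1, n2: (E, 3) unit normals of the two faces adjacent to each edge."""
    return (1.0 - torch.log1p((n1 * n2).sum(dim=1))).sum()

def chamfer_loss(s1, s2):
    """Symmetric chamfer distance between two point sets (Eq. (15))."""
    d = torch.cdist(s1, s2)                                      # (N1, N2)
    return d.min(dim=1).values.pow(2).mean() + d.min(dim=0).values.pow(2).mean()
```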
Remark
To increase stability during optimization, we removed three types of camera rays: (1) nearly perpendicular to the surface normals at their intersection with the surface during recursive ray tracing
Because the light path inside a transparent object is complex, optimizing the surface shape based on the local RGB loss alone was not sufficient, even with our pyramid loss or a perceptual loss (VGG loss). Consequently, we utilized the correspondence-based losses to obtain gradients that move the traced intersection points toward their matched cells on the background pattern.
4.4 Implementation Details
Initial shape optimization. As described earlier, our initial shape reconstruction step is based on IDR [7]. We found that with only silhouette loss, an insufficient number of rays may cause holes on the surfaces or sometimes generate another surface beneath the surface of the object. Thus, we increased the number of rays sampled from an image to 20,800, and each batch contained rays sampled from three images. We set the learning rate as
Surface-based MLP optimization. In the surface-based MLP optimization step, we randomly cropped
View selection. We captured images and moved the iPad such that, for each high-curvature region not near the silhouettes, the salient cells in the background pattern were refracted by this region in more than two of the captured images. Because the object geometry was not available at capture time, the high-curvature regions were identified by visual inspection. The masks to be manually annotated were selected adaptively with a view-selection algorithm, which iteratively adds views whose viewing directions differ by at least 60 degrees from all previously selected views. In some cases, a few extra views were then added, as determined by checking the visual hull.
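The greedy rule described above can be sketched as follows (NumPy); `view_dirs` are assumed to be unit viewing directions of the registered cameras, and the 60-degree threshold comes from the description above.

```python
import numpy as np

def select_views(view_dirs, min_angle_deg=60.0):
    """Greedily pick views whose directions differ by at least `min_angle_deg`
    from every previously selected view.  view_dirs: (V, 3) unit vectors."""
    cos_thresh = np.cos(np.deg2rad(min_angle_deg))
    selected = [0]                                    # start from the first view
    for i in range(1, len(view_dirs)):
        cosines = view_dirs[selected] @ view_dirs[i]  # cos of angles to chosen views
        if np.all(cosines < cos_thresh):              # far enough from all of them
            selected.append(i)
    return selected
```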
Experiments
We applied our algorithm to reconstruct the 3D shapes of five transparent objects, as shown in Fig. 1 and Fig. 10, which were made from glass or crystals. The captured images of the five objects are shown in Fig. 9. The size of each object, the number of input images, the number of manually annotated masks, and the number of random patterns with moving frequency are listed in Table 1. In the following section, we demonstrate the advantages of our surface-based local MLP representation, perform ablation studies, and compare our method with state-of-the-art transparent object reconstruction methods.
Captured images for five transparent objects: the cat object, the cow object, the dog object, the trophy object, and the brick object with a bumpy front surface.
Reconstruction results and their corresponding rendering results for a trophy object and a brick object with a bumpy front surface.
To quantitatively evaluate reconstruction accuracy, we painted each object with DPT-5 developer as in Ref. [2] and scanned it with a 3D scanner to obtain a ground-truth mesh in metric units (m), as shown in Fig. 11. We compared the reconstructed results with the ground truth after aligning them using ICP [71], and evaluated the reconstruction by measuring the chamfer distance between the two point clouds.
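A sketch of this evaluation using Open3D is shown below; the file paths, the ICP correspondence threshold, and the specific chamfer convention (mean of nearest-neighbor distances in both directions) are assumptions for illustration.

```python
import numpy as np
import open3d as o3d

def evaluate(recon_path, gt_path, icp_threshold=0.005):
    """Align the reconstruction to the ground-truth scan with ICP, then report
    a symmetric chamfer distance between the two point clouds (in meters)."""
    recon = o3d.io.read_point_cloud(recon_path)   # points from the reconstructed mesh
    gt = o3d.io.read_point_cloud(gt_path)         # points from the scanned mesh
    reg = o3d.pipelines.registration.registration_icp(
        recon, gt, icp_threshold)                 # point-to-point ICP by default
    recon.transform(reg.transformation)

    d_rg = np.asarray(recon.compute_point_cloud_distance(gt))
    d_gr = np.asarray(gt.compute_point_cloud_distance(recon))
    return d_rg.mean() + d_gr.mean()
```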
5.1 Evaluations
5.1.1 Surface-Based Local MLP Representation
We performed a comparison to verify the importance of the surface-based local MLP representation. To this end, four other shape representations were included: (1) explicit mesh vertices, as in Ref. [4], denoted by Vert; (2) mesh vertices with the advanced optimizer in Ref. [66], denoted by Vert-LS (Large Step); (3) an SDF encoded by a single MLP, similar to IDR [7], denoted by Refract-IDR; and (4) an SDF encoded by a single MLP with explicit flows, as in Ref. [67], denoted by SDF-EF (explicit flows). All representations were optimized using the same loss functions as in Section 4.3. The implementation details of the representations used in the comparison are as follows.
Vert. This representation explicitly optimizes the position of each vertex using the same loss as ours and with an ADAM optimizer. The learning rate was set to
Vert-LS. This representation is similar to Vert but with the gradient calculation method and the new optimizer proposed in Ref. [66]. The gradient steps in Ref. [66] already involve the Laplacian energy. Therefore, we removed the explicit Laplacian smoothness loss. We set the diffusion weight
Refract-IDR. The Refract-IDR representation is a modified version of IDR [7]. We keep the mask loss and eikonal loss of IDR and extend its sphere tracing to handle two-bounce refraction and one-bounce reflection rays. The RGB loss in IDR is replaced by our RGB loss, and we also add our corr and ncorr losses, with the same weight for each loss term as in our method. During optimization, the gradient of the normal at each intersection point is back-propagated to the MLP through the autograd operation. The Refract-IDR network has eight hidden layers of 256 dimensions each. The activation function, level of positional encoding, optimizer, and learning rate are the same as in IDR.
SDF-EF. The SDF-EF representation is the same as in Mehta et al. [67]. It also uses an SDF encoded by a single MLP to represent the surface. Unlike Refract-IDR, it back-propagates the gradients to the MLP network through explicit mesh vertices extracted with the marching cubes algorithm [67], [72]; during optimization, a mesh is extracted by marching cubes at each iteration. The architecture of the SDF MLP, activation function, level of positional encoding, optimizer, and learning rate are the same as in Refract-IDR. We also add the Laplacian smoothness term, as in Ref. [67], to regularize the surface.
As shown in Fig. 11 and Table 2, our Surf-MLP representation outperformed the other representations. Explicit mesh optimization introduces high-frequency artifacts. Although the Vert-LS method reduces these artifacts, it introduces folds in some areas. Meanwhile, the Refract-IDR and SDF-EF representations converge slowly and produce overly smooth results that lose some details.
The explicit mesh optimization described above yields artifacts because explicit optimization is sensitive to noise. These artifacts can be reduced by optimizing the vertices in a coarse-to-fine manner, i.e., by progressively remeshing during explicit mesh optimization. We denote Vert and Vert-LS with remeshing as Vert R and Vert-LS R, and compared our method with them. Similar to Lyu et al. [4], we remesh after a fixed number of iterations (30 epochs in our experiment); at each remeshing stage, the target triangle size is fixed, and it decreases from stage to stage. As shown in Fig. 12, remeshing improves the quality of the reconstructed results compared with Vert and Vert-LS. However, our method still outperformed these two methods in terms of the chamfer distance to the ground truth, as shown in Fig. 12. We believe this is because the weights of the loss terms must be re-tuned after each remeshing stage; if the weights are not set adaptively, the Vert R and Vert-LS R results may suffer from over-smoothing or local high-frequency artifacts.
5.1.2 Comparison with Li et al. [5] and Lyu et al. [4]
The method proposed by Li et al. [5] needs to capture images in an environment large enough to meet the distant-illumination assumption for the environment map. Thus, we chose to evaluate this method using images obtained by rendering the ground-truth mesh illuminated by an environment map (no background geometry). Specifically, we used our ground-truth mesh with one environment map to render multi-view images, corresponding object masks, and mirror sphere images, as shown in Fig. 13. The generated environment map is shown in Fig. 13. A total of 36 images were rendered for visual hull reconstruction, and 10 of them were uniformly sampled as input images to the network. As illustrated in Fig. 14, this method produces smooth surfaces that lose detail near the object boundaries, whereas our reconstructed model preserves relatively high-precision details compared with the ground-truth mesh. In addition to the results using 10 views, we also tested 20 input views. For the method of Li et al. [5], adding more views did not improve the details of the reconstructed surface in this experiment, as shown in Fig. 14; here, 10 or 20 views were used as network inputs, and the initial visual hull was computed from all 36 images.
For the comparison with Lyu et al. [4], since their dataset only provides ray-pixel correspondences and masks, we replaced our RGB loss, corr loss, and ncorr loss with the refraction loss of Ref. [4]. As illustrated in Fig. 15, our reconstructed model preserves more surface detail. Qualitative and quantitative comparison results are shown in Fig. 15 and Table 3, where "mask diff" is the mean L1 distance between the rendered masks and the input masks. As shown in the fifth column of Table 3, the ground-truth mesh is not precisely consistent with the input masks, which may be due to the DPT-5 developer coating used during scanning. The chamfer distances of our results are slightly larger than those of this method; however, our mask difference is smaller, which means that our reconstructed model matches the input masks better. We added the mask-difference measurement in this experiment because the silhouettes provided by Lyu et al. [4] are much more accurate than segmentation results.
5.1.3 Number of Clusters
We first performed ablation studies to evaluate the influence of the number of clusters (i.e., the number of MLPs). Figure 16 shows that our surface-based local MLP representation obtains better results as the number of clusters increases from 1 to 50, 100, and 150. We also compared our local MLP representation to a global MLP with nine hidden layers of 256 dimensions. For the global MLP, we also tested two different positional encoding settings, using the number of positional encoding frequencies
5.2 Ablation Study
We remove each loss term individually to evaluate its impact on the reconstruction result of the cat object, as shown in Fig. 17. Overall, the correspondence loss
In the second row of Fig. 17, we demonstrate the purpose of the RGB loss
Influence of the number of MLPs (#MLP) and the MLP architecture, measured by the chamfer distance between the reconstructed mesh and the ground-truth mesh. PE_k/d_l: positional encoding with $k$ frequencies and hidden layers of dimension $l$.
Ablation study of loss terms on the cat object. The numbers indicate the chamfer distances between the reconstructed mesh and the ground truth mesh.
Limitations and Future Work
Our environment-matting algorithm can only find correspondences under the assumption that the transparent object has no intrinsic color; consequently, our method cannot reconstruct colored transparent objects. In addition, because the ray-cell correspondences are sparse, our method requires more views to reconstruct the surface and may miss some details, especially for surfaces with complex occlusions. Another limitation is that the object masks must be annotated manually. In the future, it would be interesting to investigate how to integrate variables for the color or other material properties of the transparent object to overcome the "no intrinsic color" limitation, and how to extract accurate transparent object masks based on single-image EnvMatt.
Conclusions
In this study, we developed a method to reconstruct 3D shapes of transparent objects from handheld captured images under natural light conditions. Our method comprises two components: a surface-based MLP representation that encodes the vertex displacement field based on the initial shape, and a surface optimization through differentiable rendering and EnvMatt. We used an iPad as a background to provide ray-cell correspondences, a simplified capture setting, to facilitate the optimization. Our method can produce high-quality reconstruction results with fine details under natural lighting conditions.
Declaration of Competing Interest
The authors have no competing interests to declare that are relevant to the content of this article.
ACKNOWLEDGEMENTS
We thank the anonymous reviewers for their constructive comments. Weiwei Xu is partially supported by “Pioneer” and “Leading Goose” R&D Program of Zhejiang (No. 2023C01181). Jiamin Xu is partially supported by National Natural Science Foundation of China (No. 62302134) and Zhejiang Provincial Natural Science Foundation (No. LQ24F020031). This paper is supported by Information Technology Center and State Key Lab of CAD&CG, Zhejiang University.