Abstract:
Reconstructing a 3D shape from a single-view image using deep learning has become increasingly popular recently. Most existing methods only focus on reconstructing the 3D...View moreMetadata
Abstract:
Reconstructing a 3D shape from a single-view image using deep learning has become increasingly popular recently. Most existing methods only focus on reconstructing the 3D shape geometry based on image constraints. The lack of explicit modeling of structure relations among shape parts yields low-quality reconstruction results for structure-rich man-made shapes. In addition, conventional 2D-3D joint embedding architecture for image-based 3D shape reconstruction often omits the specific view information from the given image, which may lead to degraded geometry and structure reconstruction. We address these problems by introducing VGSNet, an encoder-decoder architecture for view-aware joint geometry and structure learning. The key idea is to jointly learn a multimodal feature representation of 2D image, 3D shape geometry and structure so that both geometry and structure details can be reconstructed from a single-view image. To this end, we explicitly represent 3D shape structures as part relations and employ image supervision to guide the geometry and structure reconstruction. Trained with pairs of view-aligned images and 3D shapes, the VGSNet implicitly encodes the view-aware shape information in the latent feature space. Qualitative and quantitative comparisons with the state-of-the-art baseline methods as well as ablation studies demonstrate the effectiveness of the VGSNet for structure-aware single-view 3D shape reconstruction.
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 44, Issue: 10, 01 October 2022)