1. Introduction
Recently, 3D generative models [5], [6], [13], [18], [19], [22], [31], [40]–[42], [59], [60], [65], [69], [74], [75] have been developed to extend 2D generative models toward multi-view consistent and explicitly pose-controlled image synthesis. In particular, some of them [5], [18], [74] combined 2D CNN generators such as StyleGAN2 [28] with the 3D inductive bias of neural rendering [38], enabling efficient synthesis of high-resolution photorealistic images with remarkable view consistency and detailed 3D shapes. These 3D generative models can be trained on single-view images and then sample an unlimited number of 3D images in real time, whereas 3D scene representations based on neural implicit fields, such as NeRF [38] and its variants [3], [4], [8], [10], [14], [17], [20], [32]–[34], [36], [45], [47], [50], [53], [54], [64], [66], [70]–[73], require multi-view images and per-scene training.
Our DATID-3D succeeds in domain adaptation of 3D-aware generative models without additional data for the target domain, preserving the diversity inherent in the text prompt while enabling high-quality pose-controlled image synthesis with excellent text-image correspondence. In contrast, StyleGAN-NADA*, a 3D extension of the state-of-the-art StyleGAN-NADA [16] for 2D generative models, yields images of similar styles with poor text-image correspondence. See the supplementary videos at gwang-kim.github.io/datid_3d.