1. Introduction
Recently, 3D generative models [20], [37], [8], [11], [23], [24], [40], [38], [5], [25], [33], [10], [39], [4], [1] have advanced to enable multi-view consistent and explicitly pose-controlled image synthesis. However, training state-of-the-art 3D generative models is challenging because it requires a large number of images as well as knowledge of their camera pose distribution. This prerequisite has limited the application of these models to only a few domains.