1. Introduction
Portrait style transfer [30], [53] aims to transform real face images into artistic 2D portraits in desired visual styles while maintaining personal identity. However, given a sequence of portrait images captured from different viewpoints, existing portrait style transfer methods are typically effective only for limited forward-facing photos and fail to maintain view consistency in 3D space. Essentially, existing methods only learn a style transfer between 2D features and have no awareness of the 3D structure of real-world objects. What if we could construct and stylize the underlying 3D structure from captured 2D portrait images? See Figure 1 for an example. Once the 3D structure (i.e., geometry and texture) is stylized, we can easily render stylized portraits from arbitrary views with 3D consistency and robust artistic quality. This capability would greatly facilitate the 3D content creation process, which often requires large amounts of time and special expertise, and make it accessible to a variety of novice users. As shown in Figure 1, this paper aims to address the challenging task of generating a high-fidelity 3D avatar from a portrait video by following the style of a given exemplar image. We refer to this task as 3D portrait stylization - a marriage between portrait style transfer and 3D recovery from monocular images.
Given a set of RGB portrait images captured by a monocular camera, our method learns a photorealistic representation in neural implicit fields and transfers it to an artistic one, with the underlying 3D structure changed accordingly. Multiple stylized results can be rendered from arbitrary novel viewpoints with consistent geometry and texture.