1. Introduction
Reconstructing the deforming 3D geometry of an object from image data is a long-standing and important problem in computer vision, with many applications in the movie and game industries as well as in VR and AR. Of particular interest, and the subject of this work, is 4D reconstruction from a single RGB video, since this is the most intuitive and user-friendly capture setup. Over the last decade, many monocular 4D reconstruction approaches have been proposed. They can be broadly categorized into dense non-rigid structure-from-motion (NRSfM) methods, shape-from-template (SfT) approaches, and neural template-free approaches.