1. Introduction
Inferring properties of human bodies from images is a fundamental, ill-posed problem in computer vision. Historically, the primary focus has been on estimating keypoints, which locate the joints of the human figure in 2D or 3D, and on tracking these keypoints over time. There is now growing interest in moving beyond this stick-figure view of the human body toward richer representations of shape. For example, recent approaches estimate body segments [18], dense correspondences [20], or 3D volumetric descriptions of bodies from images, represented as voxels [17], [41], Gaussian density functions [35], or deformable pre-defined meshes [15]. Interest in the latter representation has coincided with the availability of standardized body shape models, such as SCAPE [3] and SMPL [29], and of 3D body datasets, such as CAESAR [36], from which these models were constructed.