1. Introduction
Estimating the 6D pose of 3D objects from monocular images is a core computer vision task with many real-world applications, such as robotic manipulation [53], [54], autonomous driving [9], [48], and augmented reality [10], [32]. Although RGB-D input can facilitate this task, depth sensors are not ubiquitous, and thus 6D object pose estimation from RGB images remains an active research area.