1. Introduction
Object pose estimation is a crucial problem in computer vision and robotics. Methods have been introduced for several variants of 6D object pose estimation, including instance-level estimation for known 3D objects [28], [38], category-level [18], [36], [43], few-shot [52], and zero-shot pose estimation [13], [47]. These techniques are useful for downstream applications that require online operation, such as robotic manipulation [6], [25], [48] and augmented reality [23], [24], [32]. Our paper focuses on the category-level object pose estimation problem because it is more broadly applicable than the instance-level problem.