1. Introduction
Object detection is one of the key problems in computer vision. While there has been significant effort and progress in detecting generic object classes (e.g. detect all the phones in an image), comparatively little attention has been devoted to detect specific object instances (e.g. detect this particular phone model). Recent approaches on this topic [30], [41], [44], [22] have achieved very good performance in detecting object instances, even under challenging occlusions. By relying on textured 3D models as a way to specify the object instances to be detected, these methods propose to train detectors tailored for these objects. Because they know the objects to be detected at training time, these approaches essentially overfit to the objects themselves: they become specialized at detecting them (and only them).