1. Introduction
The development of visual recognition algorithms has followed the evolution of recognition benchmarks. PASCAL VOC [13] standardizes the task of bounding box object detection and the associated IoU/Average Precision metrics. At the time, the approaches defining the state of the art, DPM [14] and later the R-CNN family [17], [48], address object detection by reasoning about densely enumerated box proposals, following the sliding window classification approach of earlier detectors [52], [51]. SDS [19] expands the scope of object detection to include instance mask segmentation and introduces early versions of the mAP^bbox and mAP^mask metrics, subsequently popularized by the COCO dataset [35]. Bounding boxes, however, remain the primary vehicle for object reasoning.