1. Introduction
Object detection is a fundamental task in computer vision. Most approaches formulate the problem as that of predicting a bounding box enclosing the object of interest; this is, for example, the evaluation criterion in the popular PASCAL Visual Object Classes (VOC) Challenge [3]. However, a bounding-box output is clearly limited. For many objects, particularly those with complex, articulated shapes, a bounding box provides a poor description of the support of the object in the image, since much of the box may cover background rather than the object itself. At the other extreme, one can attempt to produce an object class label for every pixel in the image. This is usually termed multi-class segmentation.