1. Introduction
Over the past few years, we have witnessed remarkable progress in semantic segmentation [44], [40], [68], [69], [10]–[12], [30], [15], [31] and instance segmentation [25], [36], [8], [60], [13], [65], [34], [4], [9], [43], [29] across diverse domains, such as general scenes [20], [41], [70], autonomous driving [17], [48], [21], aerial imagery [57], [16], and medical diagnosis [22], [56]. Successful segmentation models usually rely on large volumes of high-quality training data. However, creating the pixel-level annotations needed to train these models is often expensive, laborious, and time-consuming. Interactive segmentation, which allows human annotators to quickly extract an object of interest from a few user inputs such as bounding boxes [66], [52], [64] or clicks [67], [38], [45], [37], is therefore an attractive and efficient way to reduce the annotation effort.
[Figure: (a) User inputs of DEXTR [46]. (b) User inputs of the proposed IOG method. (c) An overview of our IOG framework.]