1. Introduction
Deep networks are dramatically driving the development of computer vision, leading to a series of state-of-the-art in tasks including classification [22], [16], [35], object detection [12], [17], [32], [27], [33], [34], semantic segmentation [28], [4], [37], [18] etc. From the development of deep learning in computer vision, we can observe that the ability of deep networks is gradually growing from making image-level prediction [22] to region/box-level prediction [12], pixel-level prediction [28] and instance/mask-level prediction [15]. The ability of making fine-grained predictions requires not only more detailed labels but also more delicate network designing.