1. Introduction
Image parsing, which aims to assign a semantic label to every pixel in an image, is a fundamental problem in computer vision. An effective image parsing system facilitates many high-level image understanding tasks, such as object detection [1] and region-based image retrieval [2]. Conventional image parsing models are trained with full supervision, meaning that every image in the training dataset is labeled at the pixel level. However, labeling every pixel of every training image is time-consuming, and this annotation cost limits the development of image parsing.