1. Introduction
The visual appearance of two images generally differs because of differences in lighting and scene conditions at capture time [54], [60]. Thus, compositing an image, i.e., extracting a foreground region from one image and pasting it onto the background of another image, inevitably suffers from the inharmony problem caused by the distinct appearance of the two images (see Figure 1 for an example), which significantly degrades the quality of the composite result [49], [12], [11]. Moreover, many computer vision tasks, especially in image and video synthesis, such as image editing [38], [2], [44], image inpainting [34], [55], [41], and image stitching [9], [56], [57], also encounter this inharmony problem because they involve a compositing process. At the same time, the human visual system is very sensitive to inharmony in appearance; e.g., human eyes can identify very subtle differences in color and contrast [30], [54]. Therefore, image harmonization, which aims to make the appearance of the foreground and background in a composite image compatible [47], [49], [11], [12], is a challenging task.