I. Introduction
Owing to its wider luminance range and more vivid color rendition, high dynamic range (HDR) technology has been adopted in a wide range of applications, such as cellphone photography [1], [2], video games [3], movies [4], [5] and medical imaging [6]. To obtain an HDR image from the prevalent low dynamic range (LDR) cameras, the industry has generally developed two types of HDR image creation approaches: multi-image fusion and single-image HDR prediction, the latter also known as inverse tone mapping (iTM). In the first approach, a series of LDR images of the scene is captured with varying exposure [7], ISO [8] or noise level [9], and then merged into an HDR image. The merging is performed either by deep neural networks [7], [10] or by weighted summation [11], [12]. Unfortunately, all of these methods tend to produce unpleasant ghosting artifacts when the captured images contain fast-moving objects. In the second approach, the HDR image is estimated from a single LDR image using various mathematical models, such as gamma expansion [13], [14], guided polynomial range expansion [15], [16] or deep neural networks [17], [18]. All of these methods assume that the input LDR image is captured under an appropriate lighting condition.
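To make the two families of approaches concrete, the following is a minimal sketch of both: a simplified weighted-summation merge of an exposure stack, and a naive gamma-expansion iTM. The gamma value, the hat-shaped weight, and the normalization are illustrative assumptions for this sketch, not the exact schemes of the cited works [11]-[14].

```python
import numpy as np

def gamma_expand(ldr_8bit, gamma=2.2):
    """Single-image iTM sketch: treat the 8-bit LDR image as gamma-encoded
    and expand it to a linear-domain HDR estimate (gamma=2.2 is an
    illustrative assumption, not the exact model of [13], [14])."""
    return (np.asarray(ldr_8bit, np.float64) / 255.0) ** gamma

def weighted_merge(ldr_stack, exposure_times, gamma=2.2):
    """Multi-image fusion sketch: linearize each exposure, divide by its
    exposure time to estimate scene radiance, and combine with a
    hat-shaped weight that favors well-exposed pixels (a simplified
    weighted summation, not the exact scheme of [11], [12])."""
    num = np.zeros(np.asarray(ldr_stack[0]).shape, np.float64)
    den = np.zeros_like(num)
    for ldr, t in zip(ldr_stack, exposure_times):
        z = np.asarray(ldr, np.float64) / 255.0
        w = 1.0 - np.abs(2.0 * z - 1.0)   # hat weight, peaks at mid-gray
        num += w * (z ** gamma) / t       # per-exposure radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)    # avoid division by zero

# Usage: merge a short and a long exposure of the same (static) scene.
stack = [np.full((2, 2), 64, np.uint8), np.full((2, 2), 192, np.uint8)]
radiance = weighted_merge(stack, exposure_times=[1.0, 0.25])
```

Note that the merge assumes a static scene; as stated above, object motion between the exposures is exactly what produces ghosting artifacts in this family of methods.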