Introduction
An HDR image contains rich detail and color information and largely preserves the luminance of the real scene, so it can provide a more realistic visual experience. Real-world scenes often contain extremely bright and dark areas at the same time, which cannot be recorded faithfully because of the limited dynamic range of common sensors and quantization error. To record such scenes more accurately, researchers have explored hardware, software, and hybrid solutions, from which HDR imaging technology was born. Today, a native HDR image can be captured with an expensive HDR camera, or multiple images with different exposures can be shot with an LDR camera and fused into a synthetic HDR image [1]. Although HDR technology and related hardware have developed rapidly in recent years, obtaining high-quality HDR images still comes at a considerable cost. Therefore, HDR image reconstruction from a single-exposure image is worth exploring.
In this article, we propose a practical two-branch network structure for reconstructing an HDR image from a single LDR image. The two branches process over-exposed and under-exposed areas respectively. For HDR images without a corresponding LDR version, the LDR counterpart is generated by degrading the HDR image, simulating the formation of over- and under-exposed pixels. The LDR image is normalized and its gray image is computed; from the gray image, the masks of the over- and under-exposed areas used in the integration step are calculated. The normalized LDR image is then linearized and fed into the network, where linearization removes the non-linearity introduced by the photographic pipeline of digital cameras. The two branches restore and enhance the over- and under-exposed areas respectively, and the color is corrected so that the generated HDR image and the ground truth have more consistent color saturation. Because over- and under-exposed areas of LDR images have distinct characteristics, the two branches enhance and reveal their details with different methods. Finally, the outputs of the two branches are weighted and integrated with the linearized LDR image to produce the reconstructed HDR image.
The main contributions of this work can be summarized as follows: (1) a novel end-to-end dual-branch network structure is designed to simultaneously reconstruct the information of over- and under-exposed regions, which can not only reveal the hidden details in over-exposed regions, but also suppress the hidden noise in under-exposed regions. (2) a hue loss based on the hue value in HSV (Hue, Saturation, Value) space is included in the loss function so that the network learns the pixel color distribution more accurately, which makes the HDR images output by the network more faithful and natural in color. (3) the under-exposed region information enhanced by the dark branch is taken into consideration when integrating the restored over-exposed areas with the linearized LDR image to generate the HDR image, which balances the enhancement of details in different regions. Comprehensive experiments show that, compared to state-of-the-art methods, the proposed method reconstructs dark and bright regions well and obtains HDR images close to the ground truth.
The remainder of this article is organized as follows. Section II introduces previous related works. Section III details the proposed model structure and the loss functions used. Section IV describes the dataset we used and summarizes the parameters and implementation details of the experiments. Section V presents the ablation analysis, as well as subjective and objective comparisons with existing iTMOs (inverse Tone-Mapping Operators) and neural-network-based methods. Finally, the conclusions of our work are presented in Section VI.
Related Works
A. Traditional Inverse Tone-Mapping
Inverse tone-mapping is a technology for converting LDR images into HDR images so that LDR resources can be displayed by HDR applications. To a certain extent, it "restores" HDR content and provides upward compatibility with existing LDR resources. Existing iTMOs can be divided into global and local methods. Global iTMOs expand all pixels of the image with the same conversion function, which may be a linear scaling [2] or a non-linear function such as a gamma curve [3]. Global iTMOs are more suitable for scenes whose dynamic range is close to the dynamic range supported by the display device. However, global transformations may excessively compress the tonal range, resulting in an unavoidable loss of contrast and visual detail.
Local iTMOs, by contrast, use different conversion functions in different regions of the image. In this case, regions with the same color before mapping may have different colors after mapping, depending on the pixel location and the surrounding pixel values. Local algorithms can increase local contrast, which improves the visibility of image detail. In general, local methods first expand the image to a moderate dynamic range, then handle the improperly exposed areas with a specially designed function. Banterle et al. [4] achieved HDR image reconstruction by inverting Reinhard et al.'s tone-mapping operator [5] combined with an expand-map. Meylan et al. [6] used a piece-wise linear function to expand the dynamic range of the image so that highly exposed areas have a more natural appearance. Wang et al. [7] selected appropriate under-exposed and over-exposed areas in the image and used interpolation to improve the brightness and texture of the selected areas. Rempel et al. [8] used an expand-map computed from a Gaussian filter and an edge-stopping function to process the over-exposed regions. Kovaleski and Oliveira [9] extended the work of Rempel et al. by replacing the Gaussian filter with a cross bilateral filter, and their method achieves good results over a wide range of exposures. Subsequently, inspired by the characteristics of the human visual system, Huo et al. [10] proposed a novel physiological approach that avoids the artifacts occurring in most existing algorithms. Kim and Kim [11] applied a guided filter to divide the original image into a base layer and a detail layer, extended the dynamic range of the base layer with non-linear functions, and that of the detail layer with a linear mapping function obtained by a learning-based algorithm.
B. HDR Reconstruction Based on Convolutional Neural Networks
Due to the excellent performance of deep learning in various analytical learning tasks, CNNs have been extensively used in computer vision problems, including image-to-image translation. CNNs have also been successfully applied to reconstructing HDR images from multiple images captured with different exposure times [12]. On this basis, in order to solve the artifact problem in dynamic scenes with large-scale foreground motion, Wu et al. [13] proposed the first non-flow-based deep framework for high dynamic range imaging. For reconstructing an HDR image from a single-exposure LDR image, Eilertsen et al. [14] used an encoder-decoder network structure and proposed a very general solution that considers any type of saturated region. This method reduces the generation of artifacts in similar structures because its modified U-net structure only predicts the values of saturated pixels. It is worth mentioning that Endo et al. [15] used an autoencoder architecture to predict a set of LDR images with different exposure levels from a single input image and synthesized them using Mertens et al.'s method [16] to obtain the HDR image. Marnerides et al. [17] proposed a multiscale architecture which avoids the use of upsampling layers to improve image quality, with branches for global, semi-local, and local feature extraction. Jang et al. [18] designed a two-stage cascade network to learn HDR image generation and HDR image color refinement. Kinoshita and Kiya [19] proposed a loss function based on tone-mapped images to address the inaccuracy caused by the non-linear relationship between LDR and HDR images. Wang et al. [20] decomposed the image into high-frequency and low-frequency components and designed two sub-networks to process them separately. Jang et al. [21] designed a network to learn the cumulative histogram of HDR images, used the result for histogram matching of LDR images, and finally cascaded a color learning network to refine the image color. Recently, Generative Adversarial Networks (GAN) [22] have made significant progress in tasks such as image generation and image restoration, and have already been used for HDR image reconstruction by Lee et al. [23] and Ning et al. [24]. However, images generated by GANs can contain reconstruction errors and unrealistic artifacts. Also combining with a GAN, Moriwaki et al. [25] observed that if the reconstruction error is the only loss function, the recovered image is easily blurred, so they further introduced a perceptual loss and a reconstruction loss optimized for HDR, which improve image quality.
Dual-Branch Network Based Single Exposure HDR Image Reconstruction
A. Problem Statement
When a normal camera is used in an inappropriate exposure environment, such as facing a dazzling sun or shooting a dimly lit room, an excessively large or small dynamic range forces the camera's photosensitive element to operate under abnormal conditions. In this case, scene details are easily blurred or even lost. It is worth noting, however, that the causes of information loss are quite different in these two situations.
In an environment with high light intensity, the camera sensor receives too much light, causing overexposure. In the over-exposed areas, one or all channels of a pixel saturate, which results in loss of image detail. To reconstruct the information of these regions in the HDR domain, a better strategy is to estimate the value of a saturated pixel from its unsaturated channels or the adjacent unsaturated pixels [14], as in equation (1).\begin{equation*} \hat {Y}^{L}_{i,c}=f_{L}(X^{L}_{i\_{}adj,c}, X^{L}_{i,c}) \tag{1}\end{equation*}
Correspondingly, in a low-light environment, the camera's photosensitive elements cannot capture very small changes, or the quantization levels are insufficient to record such subtle details, which results in texture blur and color drift. In addition, the noise caused by the longer shutter time in dim conditions can also significantly affect image quality. From this point of view, the reasons for the decline of image quality in under-exposed areas differ from those in over-exposed areas. The reconstruction of under-exposed areas in the HDR domain can be expressed in a form similar to equation (1), as equation (2).\begin{equation*} \hat {Y}^{D}_{i,c}=f_{D}(X^{D}_{i\_{}adj,c}, X^{D}_{i,c}) \tag{2}\end{equation*}
B. Model Structure
Fig. 1 exhibits the pipeline of our method. Firstly, the input image is mapped to the linear domain by the inverse function of equation (3).\begin{equation*} f(Y)=(1+\sigma)\frac {Y^{n}}{Y^{n}+\sigma } \tag{3}\end{equation*}
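As a concrete illustration (not the authors' code), equation (3) and its inverse can be written as the following minimal sketch; the values of sigma and n used here are placeholders, since they are not stated in this passage.

```python
import numpy as np

# Hypothetical parameter values for illustration only.
SIGMA, N = 0.6, 0.9

def camera_curve(y, sigma=SIGMA, n=N):
    """Forward non-linearity of equation (3): f(Y) = (1+sigma) * Y^n / (Y^n + sigma)."""
    yn = np.power(np.clip(y, 0.0, 1.0), n)
    return (1.0 + sigma) * yn / (yn + sigma)

def linearize(z, sigma=SIGMA, n=N):
    """Inverse of equation (3), mapping the normalized LDR image to the linear domain."""
    z = np.clip(z, 0.0, 1.0)
    yn = sigma * z / (1.0 + sigma - z)   # solve z = (1+sigma)*Y^n / (Y^n + sigma) for Y^n
    return np.power(yn, 1.0 / n)
```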
BranchNet: fully convolutional autoencoder network with two branches.
1) Light Branch
Limited by the narrow dynamic range, LDR images cannot simultaneously record objects with extremely high brightness (such as the sun and other light sources) and dark corners without direct light. In this case, the highlight areas of the image are prone to texture loss and color distortion; for example, object outlines are covered by dazzling flare, or the blue sky becomes brilliant white. If the dynamic range of the image is extended directly, the distortion caused by overexposure will seriously affect the visual quality. To avoid such problems and obtain a good visual experience in over-exposed areas, we build a light branch that specifically repairs the highlight areas. The light branch is an autoencoder network [28]: its encoder maps the input image to a non-linear space to obtain low-dimensional abstract feature maps. By processing and extracting features from these maps, the network can perform operations that are difficult to implement in the original feature space. The decoder is trained to reconstruct the full-resolution output from the feature maps produced by the encoder, thus converting the image from the LDR domain to the HDR domain.
The structure of the light branch is shown in the upper half of the network in Fig. 1. The encoder and decoder have the same number of convolutional blocks, five each. In the encoder, an instance normalization [29] (IN) layer is placed after the last convolutional layer of each convolution block. A batch normalization layer [30] computes the mean and variance over all samples in a batch; for HDR images, the wide dynamic range makes the statistics of different images vastly different, so batch-level mean and variance do not serve individual images well. The IN layer computes the mean and variance per sample and is therefore more suitable for HDR image reconstruction. Except for the fifth convolutional block, each block ends with a max-pooling layer for downsampling. In the decoder, each convolution block receives the feature map from the encoder through a skip-connection to compensate for the information loss caused by downsampling. Since these feature maps are twice the size of the block input, we use bilinear interpolation to enlarge the input, which also serves as the upsampling operation of the decoder. Except for the last convolutional block in the decoder, the last convolutional layer of each block is followed by an instance normalization layer. The convolution kernel used in the convolution layer of the entire light branch has a shape of
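For concreteness, a minimal PyTorch sketch of an encoder-decoder of this kind is given below. It is not the authors' implementation: the channel widths, the 3x3 kernel size, the activations, the output non-linearity, and the simplification to four decoder stages plus an output convolution are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncBlock(nn.Module):
    """Conv block ending with instance normalization, as described for the encoder."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)  # kernel size assumed
        self.norm = nn.InstanceNorm2d(c_out)
    def forward(self, x):
        return F.relu(self.norm(self.conv(x)))

class DecBlock(nn.Module):
    """Decoder block: bilinear upsampling, concatenation with the encoder skip, conv + IN."""
    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in + c_skip, c_out, kernel_size=3, padding=1)
        self.norm = nn.InstanceNorm2d(c_out)
    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        x = torch.cat([x, skip], dim=1)   # skip-connection from the encoder
        return F.relu(self.norm(self.conv(x)))

class LightBranch(nn.Module):
    def __init__(self, widths=(32, 64, 128, 256, 512)):   # hypothetical channel widths
        super().__init__()
        self.enc = nn.ModuleList()
        c = 3
        for w in widths:
            self.enc.append(EncBlock(c, w))
            c = w
        self.pool = nn.MaxPool2d(2)
        self.dec = nn.ModuleList()
        for w in reversed(widths[:-1]):
            self.dec.append(DecBlock(c, w, w))
            c = w
        self.out = nn.Conv2d(c, 3, kernel_size=3, padding=1)
    def forward(self, x):
        skips = []
        for i, blk in enumerate(self.enc):
            x = blk(x)
            if i < len(self.enc) - 1:      # no pooling after the fifth (bottleneck) block
                skips.append(x)
                x = self.pool(x)
        for blk, s in zip(self.dec, reversed(skips)):
            x = blk(x, s)
        return torch.sigmoid(self.out(x))  # output range assumed to be [0, 1]
```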
2) Dark Branch
As Reibel et al. [32] concluded, in low-light situations, in addition to photo-response non-uniformity, CCD and CMOS sensors are subject to interference from a variety of noise sources, such as read noise, photon shot noise, dark current, and fixed-pattern noise. The noise depends not only on the exposure setting and camera model, but also on the scene captured. For digital cameras, darker areas appear to contain more noise than bright areas, as shown in Fig. 2. Note that noise becomes less noticeable as the image becomes brighter: brighter areas receive more light and therefore have stronger signals and a higher signal-to-noise ratio. This means that under-exposed areas will exhibit more noticeable noise when their brightness is raised to a natural level; if not suppressed, this noise appears even more abrupt in HDR images. Traditional inverse tone-mapping methods mainly classify image regions according to brightness level and then expand them dynamically with different expansion operators [33], which can to some extent reduce the noise amplification and artifacts caused by improper expansion of dark areas. In existing deep-learning-based methods, the under-exposed area is rarely treated specially during HDR reconstruction, which degrades the visual quality of these areas in the HDR image.
The performance of different brightness areas under the same intensity of additive Gaussian noise. The brighter the area, the harder it is to notice the reduction in visual effects due to noise.
In addition to the noise caused by the camera hardware itself, noise is also introduced during image compression. Pixel values in under-exposed areas are often low, and changes of texture and color are not obvious. During JPEG compression, the pixels in under-exposed areas are treated as unimportant, so most of the useful information there is discarded. Furthermore, because pixels in under-exposed areas have small gradients, information is easily lost during quantization, causing unsmooth transitions and obvious banding artifacts. When the hardware noise and the quantization loss are added together, they seriously affect HDR reconstruction in the under-exposed area. Fig. 3 shows an HDR image reconstructed without noise suppression in the under-exposed area: the left is the input LDR image, and the right is the HDR image reconstructed without any processing of the under-exposed area. Although no obvious noise is visible in the LDR image, in the HDR image the color noise and banding artifacts in the sky become very conspicuous and must be dealt with during HDR imaging. Here we use a network branch to perform high dynamic range reconstruction on the under-exposed area of the image and to make dark pixels darker, thus improving the viewing quality of the entire image. We define the network that implements this function as the dark branch, and its model structure is the lower half of the network in Fig. 1. It can be seen that the
The display effect in the HDR image when the noise of the under-exposed area is not suppressed. (a) LDR image; (b) HDR image generated by the network without processing to under-exposed area.
C. Loss Function
In a variety of image generation tasks, researchers usually design an appropriate loss function according to the actual demand to ensure that the network converges in the desired direction. For deep-learning-based HDR reconstruction methods, in addition to directly computing the mean squared error or the mean absolute error between the output HDR image and the ground truth, loss functions include: computing the mean squared error of the tone-mapped image to ensure the quality of the tone-mapped result [12]; computing the difference of image gradient information to facilitate the repair of texture information [24]; and computing the cosine similarity between the output image and the ground truth to make the colors more accurate [17].
The goal of our method is to reconstruct the high-brightness and low-luminance regions of the image, so we extract the regions of interest as masks so that the network can focus on predicting the pixel values inside the mask regions. The light branch extracts the regions with high pixel values from the gray image by equation (4) [14], using the threshold $t_{l}$:\begin{equation*} M_{i}^{L}=\frac {max(I_{i}-t_{l},0)}{1-t_{l}} \tag{4}\end{equation*}
The light mask acts on the light branch's loss function as in equation (5), guiding the light branch to focus on repairing over-exposed areas.\begin{align*} L_{light}(\hat {Y},Y)=&\frac {\alpha ^{L}}{wh}\sum _{i}{\left |{M^{L}_{i}(\hat {H}_{i}\!-\!H_{i})}\right |^{2}} \\&+\frac {1}{3wh}\sum _{i,c}{\left |{ M^{L}_{i}(log(\hat {Y}_{i,c}\!+\!\epsilon)\!-\!log(Y_{i,c}\!+\!\epsilon))}\right |}^{2} \\ \tag{5}\end{align*}Here $\hat {H}_{i}$ and $H_{i}$ are the hue values of the predicted image and the ground truth at pixel $i$, computed in HSV space as in equation (6), $w$ and $h$ are the image width and height, and $\alpha ^{L}$ weights the hue term.
\begin{align*} H_{i}= \begin{cases} \displaystyle \frac {Y_{i,g}-Y_{i,b}}{6\Delta _{i}}, Y_{i,r}=max_{c}(Y_{i,c}) \& Y_{i,g} \ge Y_{i,b}\\[5pt] \displaystyle \frac {Y_{i,g}-Y_{i,b}}{6\Delta _{i}}+1, Y_{i,r}=max_{c}(Y_{i,c}) \& Y_{i,g} < Y_{i,b}\\[5pt] \displaystyle \frac {Y_{i,b}-Y_{i,r}}{6\Delta _{i}}+\frac {1}{3}, Y_{i,g}=max_{c}(Y_{i,c}) \\[5pt] \displaystyle \frac {Y_{i,r}-Y_{i,g}}{6\Delta _{i}}+\frac {2}{3}, Y_{i,b}=max_{c}(Y_{i,c}) \end{cases} \tag{6}\end{align*}
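For concreteness, a minimal PyTorch sketch of the hue term in equation (6) is given below (not the authors' implementation). Here $\Delta _{i}$ is assumed to be $max_{c}(Y_{i,c})-min_{c}(Y_{i,c})$, as in the standard RGB-to-HSV conversion, and a small epsilon avoids division by zero on gray pixels.

```python
import torch

def hue(img, eps=1e-8):
    """Hue of equation (6) for tensors of shape (B, 3, H, W), scaled to [0, 1]."""
    r, g, b = img[:, 0], img[:, 1], img[:, 2]
    mx, _ = img.max(dim=1)
    mn, _ = img.min(dim=1)
    delta = mx - mn + eps                     # Delta_i assumed to be max - min over channels
    h = torch.zeros_like(mx)
    is_r = mx == r
    is_g = (mx == g) & ~is_r
    is_b = ~(is_r | is_g)
    h = torch.where(is_r, (g - b) / (6.0 * delta), h)
    h = torch.where(is_r & (g < b), h + 1.0, h)                        # the "+1" case of eq. (6)
    h = torch.where(is_g, (b - r) / (6.0 * delta) + 1.0 / 3.0, h)
    h = torch.where(is_b, (r - g) / (6.0 * delta) + 2.0 / 3.0, h)
    return h
```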
The loss function used by dark branch is slightly different from that in the light branch. Since the area of interest in the dark branch is mainly the under-exposed portion of the image, the pixel value here is very small and is not suitable for operation in the logarithmic domain. So we use the L2 distance directly to calculate the loss of pixels in the dark region as equation (7).\begin{align*}&\hspace {-.5pc} L_{dark}(\hat {Y},Y)=\frac {\alpha ^{D}}{wh}\sum _{i}{\left |{M^{D}_{i}(\hat {H}_{i}-H_{i})}\right |^{2}} \\&+\frac {1}{3wh}\sum _{i,c}{\left |{ M^{D}_{i}(\hat {Y}_{i,c}-Y_{i,c})}\right |}^{2} \tag{7}\end{align*}
The dark mask $M^{D}_{i}$ is computed from the gray image with the threshold $t_{d}$ and smoothed by the guided filter $G(\cdot)$, as in equation (8).\begin{equation*} M_{i}^{D}=G\left({\frac {max(t_{d}-I_{i},0)}{t_{d}}}\right) \tag{8}\end{equation*}
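As an illustration only, the two masks of equations (4) and (8) might be computed as in the following sketch. The thresholds t_l and t_d are hypothetical values, and guided_filter stands for the smoothing G(.) of equation (8); a simple guided-filter sketch appears in the ablation subsection below.

```python
import numpy as np

def light_mask(gray, t_l=0.95):
    """Equation (4): linear ramp above the (hypothetical) highlight threshold t_l."""
    return np.maximum(gray - t_l, 0.0) / (1.0 - t_l)

def dark_mask(gray, guide, guided_filter, t_d=0.15):
    """Equation (8): ramp below the (hypothetical) dark threshold t_d, smoothed by G(.)."""
    m = np.maximum(t_d - gray, 0.0) / t_d
    return guided_filter(guide, m)
```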
We can train both the light branch and the dark branch at the same time, so the final loss function is shown in equation (9):\begin{equation*} L_{final}(\hat {Y},Y)=L_{light}(\hat {Y},Y)+L_{dark}(\hat {Y},Y) \tag{9}\end{equation*}
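A minimal PyTorch sketch of equations (5), (7) and (9) is shown below, assuming the hue() function sketched earlier and hypothetical weights alpha_l and alpha_d for the hue terms; it is not the authors' implementation.

```python
import torch

# y_hat_* and y have shape (B, 3, H, W); masks m_l and m_d have shape (B, H, W).
def light_loss(y_hat, y, m_l, alpha_l=0.5, eps=1e-6):
    hue_term = (m_l * (hue(y_hat) - hue(y))) ** 2                                # first term of eq. (5)
    log_term = (m_l.unsqueeze(1) * (torch.log(y_hat + eps) - torch.log(y + eps))) ** 2
    return alpha_l * hue_term.mean() + log_term.mean()

def dark_loss(y_hat, y, m_d, alpha_d=0.5):
    hue_term = (m_d * (hue(y_hat) - hue(y))) ** 2                                # first term of eq. (7)
    l2_term = (m_d.unsqueeze(1) * (y_hat - y)) ** 2                              # plain L2 in the dark region
    return alpha_d * hue_term.mean() + l2_term.mean()

def final_loss(y_hat_light, y_hat_dark, y, m_l, m_d):                           # equation (9)
    return light_loss(y_hat_light, y, m_l) + dark_loss(y_hat_dark, y, m_d)
```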
The reconstructed HDR image $\hat {Y}$ is obtained by blending the outputs of the two branches with the linearized LDR image, weighted by the masks, as in equation (10).\begin{align*}&\hspace {-.5pc} \hat {Y}=\sum _{i,c}(M^{L}_{i}*\hat {Y}^{L}_{i,c}+M^{D}_{i}*\hat {Y}^{D}_{i,c} \\&+\,(1-M^{L}_{i}-M^{D}_{i})*f^{-1}(X_{i,c})) \tag{10}\end{align*}
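Read per pixel, the blending of equation (10) is a mask-weighted combination of the two branch outputs and the linearized input; a small NumPy sketch (not the authors' code) is:

```python
import numpy as np

def blend(y_light, y_dark, x_lin, m_l, m_d):
    """Combine the branch outputs with the linearized LDR image f^{-1}(X), per pixel."""
    m_l = m_l[..., None]   # broadcast the single-channel masks over the color axis
    m_d = m_d[..., None]
    return m_l * y_light + m_d * y_dark + (1.0 - m_l - m_d) * x_lin
```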
Experiments
A. Dataset
The dataset largely determines the upper limit of a network, and a large set of well-structured training data is needed so that the network can learn abundant useful information. For the task of HDR image reconstruction, native HDR images taken directly with a professional HDR camera are most suitable as ground truth. However, because such cameras are expensive, existing resources are scarce. Therefore, HDR images synthesized from multi-frame images with different exposure times can also be used as training labels. We have collected a total of 1,304 HDR images of the above categories with resolutions ranging from
With the HDR image as the ground truth, we also need the corresponding LDR image to form a data pair for training. Generally, HDR images have extremely high resolution. We crop random positions of the original HDR image, randomly flip the extracted areas, and resample them to the size of
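As a rough sketch of this crop/flip/resize step (not the authors' code), with a placeholder patch size since the exact resolutions are not reproduced here:

```python
import cv2
import numpy as np

PATCH = 256   # placeholder output size

def random_patch(hdr, rng=np.random.default_rng()):
    """Random crop, random horizontal flip, and resize of an HDR image (assumes min(h, w) >= PATCH)."""
    h, w, _ = hdr.shape
    crop = int(rng.integers(PATCH, min(h, w) + 1))           # random crop size
    y = int(rng.integers(0, h - crop + 1))                   # random crop position
    x = int(rng.integers(0, w - crop + 1))
    patch = hdr[y:y + crop, x:x + crop]
    if rng.random() < 0.5:
        patch = np.ascontiguousarray(patch[:, ::-1])         # random horizontal flip
    return cv2.resize(patch, (PATCH, PATCH), interpolation=cv2.INTER_AREA)
```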
B. Training
We use dedicated loss functions to guide the two branches of the network to learn different mappings. The two branches are relatively independent, so they can be trained separately or simultaneously. In fact, these two training schemes require different training parameters. To avoid the difficulties of parameter tuning, we first train the two branches simultaneously, then freeze the parameters of the light branch, reduce the learning rate, and fine-tune the dark branch. We therefore use the loss function in equation (9) directly to optimize the parameters of every layer in the network; equation (9) consists of the light branch's loss and the dark branch's loss. The hyper-parameter
The loss minimization is performed with the ADAM optimizer [36] and the learning rate of ADAM is
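The two-stage schedule described above could be set up as in the following hedged sketch; the learning-rate values are placeholders rather than the paper's settings, and light_branch / dark_branch stand for the two sub-modules of the network (the names are hypothetical).

```python
import torch

def make_stage1_optimizer(light_branch, dark_branch, lr=1e-4):
    """Stage 1: jointly optimize both branches with ADAM under the loss of equation (9)."""
    params = list(light_branch.parameters()) + list(dark_branch.parameters())
    return torch.optim.Adam(params, lr=lr)

def make_stage2_optimizer(light_branch, dark_branch, lr=1e-5):
    """Stage 2: freeze the light branch and fine-tune the dark branch at a reduced learning rate."""
    for p in light_branch.parameters():
        p.requires_grad = False
    return torch.optim.Adam(dark_branch.parameters(), lr=lr)
```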
Results
In this section, we compare our method with existing methods mainly through subjective perception and objective indicators, and perform ablation studies to prove the effectiveness of the modules used.
Image quality metrics are generally categorized into three classes: full-reference (FR), reduced-reference (RR), and no-reference (NR) metrics. Because HDR images have a wider dynamic range than LDR images, we use several FR metrics based on the Visible Differences Predictor (VDP) [37], Peak Signal to Noise Ratio (PSNR) [38], and Structural Similarity Index Measure (SSIM) [39] to evaluate the difference between predicted and reference HDR images. Researchers have adapted PSNR and SSIM for HDR: before computing the scores, Perceptual Uniformity (PU) coding [40] is applied to the predicted and reference images to make them suitable for HDR comparison. Because distortions in darker image areas are less visible, metrics with PU coding are generally more accurate for these areas than luminance-independent metrics. HDR-VDP-2.2 [41] is a calibrated visual metric for visibility and quality prediction under all luminance conditions. It provides a PMAP (Probability MAP) to visualize the per-pixel probability of detection and a VDP-Q quality score to measure the overall quality of the predicted image.
A. Comparisons With Existing Methods
1) Qualitative Comparison
We compare our method with three conventional iTMOs, i.e. Akyüz et al. [2], Masia et al. [42] and Huo et al. [43], and three deep-learning-based HDR reconstruction methods, i.e. the multiscale reconstruction model ExpandNet [17], the multiple-exposure reconstruction model [15], and the high-exposure repair model [14]. The implementations of the three iTMOs are integrated in the toolkit supplied by Banterle et al. [44].
Fig. 4 to Fig. 8 show the results of the qualitative comparison. The upper image in Fig. 4 is a close-up view of Horseshoe Lake. The sun reflected in the lake causes significant overexposure, resulting in the loss of plant stem and leaf details in this region (red box). At the same time, there is a noticeable shadow area in the lower right corner of the image (blue box). In Fig. 4, the methods of Akyüz et al. [2], Masia et al. [42] and Endo et al. [15] ((c),(d),(g) and (l),(m),(p)) raised the overall brightness of the image but failed to suppress the overexposure in the bright region (red box), and the contrast and visibility of their output images are poor. Huo et al.'s method [43] ((e),(n)) suppressed the overexposure too much, which distorts the information in the bright area. Eilertsen et al.'s approach [14] ((f),(o)) has a certain effect on highlight suppression, and the output image looks more natural. ExpandNet [17] ((h),(q)) and the proposed method ((i),(r)) performed well, restored more details in the over-exposed region, and produced results closer to the ground truth. Furthermore, the proposed algorithm also revealed more texture details and retained more color information in the dark region (blue box). The lower image of the small town mainly shows the effectiveness of our method for highlight suppression in over-exposed scenes. The instance normalization structure in the network regularizes each image individually, which helps the network learn a personalized way of repairing over-exposed information. As a result, our method can effectively address the detail weakening and color whitening caused by overexposure without special brightness adjustment.
Result images of all compared methods. (a),(j) input LDR images corresponding to test images Horseshoe Lake and Small Town; (b),(k) Ground Truth; (c)-(h),(l)-(q) outputs of the methods for comparison; (i),(r) our results.
Result images of all compared methods. (a),(j) input LDR images corresponding to test images Seashore and Sea Surface; (b),(k) Ground Truth; (c)-(h),(l)-(q) outputs of the methods for comparison; (i),(r) our results. In this set of images we use zoom-in windows to highlight some regions.
Result images of all compared methods. (a),(b),(c),(d) input LDR images corresponding to test images HDR008_1800, HDR007_1800, HDR006_1800, and HDR_110_Tunnel; (a1)-(d1) Ground Truth; (a2)-(d7) outputs of the methods for comparison; (a8)-(d8) our results.
The performance compared images of all methods for overexposure areas. (d),(l) input LDR images. (h),(p) our results; others are outputs of the methods for comparison.
The performance compared images of all methods for overexposure areas. (d),(l) input LDR images. (h),(p) our results; others are outputs of the methods for comparison.
The images shown in Fig. 5 are Seashore and Sea Surface; they show the performance of the compared algorithms on over-exposed and under-exposed areas respectively. In these two sets of images, we use zoom-in windows to highlight some regions so that the differences between the methods can be distinguished more clearly. Fig. 5 shows results similar to Fig. 4. The methods of Akyüz et al. [2], Masia et al. [42] and Endo et al. [15] ((c),(d),(g) and (l),(m),(p)) enhanced the brightness of over-exposed and under-exposed regions, but did not suppress the overexposure and quantization noise, which causes obvious artifacts. Huo et al.'s algorithm [43] ((e),(n)) lowered the brightness of the entire image and caused artifacts in the over-exposed area. Eilertsen et al.'s approach [14] ((f),(o)) suppressed the highlights but introduced color distortion and banding artifacts. ExpandNet [17] ((h),(q)) enhanced the brightness and suppressed the quantization noise, achieving natural visual effects in the over-exposed area, but introduced color shift in the under-exposed area. The proposed method ((i),(r)) produced output that is natural, pleasing, and closer to the ground truth. This is due to the hue loss used in training, which more precisely guides the network to learn the color distribution of the image.
It is worth noting that, in Fig. 4, our network makes a seemingly opposite decision, i.e., it enhances the brightness of strongly textured regions. This is because our network adapts its behavior to different image areas: it does not increase the brightness of smooth areas, and it does not produce artifacts when brightening strongly textured areas. In strongly textured regions, even though some pixels have low values, raising their brightness is unlikely to produce artifacts; in this case, our network chooses to brighten the dark pixels, which leads to better visual effects.
Fig. 6 shows the performance of the compared algorithms on images taken from a dark room on a sunny day. The methods of Akyüz et al. [2], Masia et al. [42] and Endo et al. [15] ((a2)-(d2), (a3)-(d3) and (a6)-(d6)) raised the brightness of the whole image too much, resulting in loss of contrast and blurred images. Huo et al.'s algorithm [43] ((a4)-(d4)) lowered the brightness and contrast of the entire image. Eilertsen et al.'s approach [14] ((a5)-(d5)) performs better than the above four algorithms, but the contrast in bright areas is lower than that of ExpandNet [17] ((a7)-(d7)) and the proposed method ((a8)-(d8)). Furthermore, the proposed network restored more detail in bright areas.
Fig. 7 and Fig. 8 mainly show the performance of each method on reconstruction of over-exposed areas. The test images are at a low exposure level overall, but each of them contains extremely over-exposed areas. To better display the details of the over-exposed areas, we use zoom-in windows in the figures to highlight the difference between the over-exposed areas before and after HDR reconstruction. Compared with the existing methods, the proposed network structure shows excellent performance and effectively reconstructs the detail information in over-exposed areas. In general, our model has learned a more personalized way of processing different scene images.
2) Quantitative Comparison
In addition to qualitative comparisons, we also made quantitative comparisons with the six methods mentioned above to verify the effectiveness of our method. We randomly selected 210 original HDR images from the dataset as the test set. In order to ensure that each method can obtain results in a reasonable time and to guarantee data quality, we randomly crop the high-resolution images and adjust the size to a shape of
The range of pixel values of the HDR images output by different reconstruction methods differs: traditional HDR generation methods tend to output absolute luminance values, while CNN-based methods tend to output data directly in the range [0, 1]. Although the perceptual-uniformity-encoding-based HDR metrics dependent on absolute luminance values in
Our method achieves the best scores on six of the seven objective indicators in Table 2: VDP-Q, PMAP95, PMAP75, PU2-SSIM, PU2-MS-SSIM, and LOE. The VDP-Q score represents the quality of the generated image; the higher the score, the closer the predicted image is to the ground truth under the observation of the human visual system. For a target image and a reference image, PMAP95 is the percentage of pixels at which a difference can be detected with probability greater than 95%, and PMAP75 the percentage with probability greater than 75%; the closer the target image looks to the reference HDR image, the fewer perceptibly different pixels it has. The PU2-SSIM and PU2-MS-SSIM scores indicate the structural similarity of two images; the higher the score, the more similar the predicted image is to the ground truth. It is worth mentioning that VDP-Q does not evaluate well the structural features that strongly affect human visual perception, so the PU2-SSIM and PU2-MS-SSIM scores are a good complement. LOE measures the naturalness of the output image compared to the ground truth; the smaller the LOE value, the better the lightness order is preserved. In addition, our PU-PSNR score ranks second among the seven algorithms, only slightly behind the first. Table 2 indicates that our method has an obvious advantage over all the methods considered in the comparison. Fig. 9 and Fig. 10 show the HDR-VDP-2.2 PMAPs calculated from the predictions of all the methods. The HDR-VDP-2.2 visibility PMAPs describe the probability that an observer perceives a difference between two images at each pixel: red pixels indicate high probability, and blue pixels indicate low probability. Benefiting from the separate processing of over-exposed and under-exposed areas by the light and dark branches, it can be clearly seen from the figures that, for the over- and under-exposed areas that are prone to visual differences, our results have a lower probability of a perceptible difference. This means that the HDR images generated by our algorithm are closer to the ground truth than those generated by the compared methods, so it can be inferred from Fig. 9 and Fig. 10 that our method performs better than the other methods.
Visibility PMAPs of HDR-VDP-2.2. Blue indicates a difference that is imperceptible to the human visual system, and red indicates a difference that is perceptible to the human visual system.
Visibility PMAPs of HDR-VDP-2.2. Blue indicates a difference that is imperceptible to the human visual system, and red indicates a difference that is perceptible to the human visual system.
B. Ablation Studies
Different network structures and loss functions often lead to different results. To verify the validity of the structure and loss function described above, we conducted experiments with different modules in the network and analyzed the statistical results.
1) Branches
The proposed network is mainly composed of the light and dark branches. As mentioned in the previous section, the light branch focuses on repairing over-exposed areas and restoring the contrast and blurred texture that are weakened in the LDR image due to compression. However, in low-light conditions, sensor noise becomes more prominent due to the decrease in incident photons and the increase in sensor sensitivity; the image may show blurred details, unclear texture, and color shift [46], [47]. Therefore, we introduce a dark branch to suppress the image quality degradation caused by stretching the dynamic range in low-illumination areas. Similar to the light branch, we compute a mask so that the network pays attention to the areas that need to be repaired. However, if the mask is calculated directly on the gray image and expanded linearly, the impact of image compression on the under-exposed area makes the mask prone to obvious banding artifacts, as shown in Fig. 11. After guided-filter processing, the uneven weights caused by noise in the input image are smoothed out while ensuring that the mask area is not incorrectly extended, as shown in Fig. 11 (d).
Smoothing results of the guided filtering on the extracted mask. (a) input LDR image; (b) HDR image after the pixel values in (a) are squared; (c) the mask calculated from (a); (d) the mask after guided filtering.
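For reference, a simple box-filter implementation of a guided filter for smoothing the dark mask might look as follows (a sketch under assumed radius and regularization settings, not the authors' code); the gray image serves as the guide.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=8, eps=1e-3):
    """Single-channel guided filter (box-filter form); radius and eps are hypothetical settings."""
    size = 2 * radius + 1
    mean_i = uniform_filter(guide, size)
    mean_p = uniform_filter(src, size)
    corr_i = uniform_filter(guide * guide, size)
    corr_ip = uniform_filter(guide * src, size)
    var_i = corr_i - mean_i * mean_i
    cov_ip = corr_ip - mean_i * mean_p
    a = cov_ip / (var_i + eps)                 # local linear coefficients
    b = mean_p - a * mean_i
    return uniform_filter(a, size) * guide + uniform_filter(b, size)
```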
2) Color Correction Loss
The loss function is very important for reconstruction tasks. Because the usual L2 distance cannot accurately measure the color difference between vectors, we constrain the color of the network output by adding a new loss term to bring it closer to the ground truth. The cosine similarity loss is small whenever the channel ratios are similar, regardless of the pixel magnitudes, so the search space has many minima, which is not conducive to network convergence. Here, based on the fact that the hue loss reflects color differences more accurately, we use the hue loss instead of the cosine similarity loss to guide the network toward the correct color distribution. Table 3 shows the results, where LB and DB denote the loss functions of the light branch and the dark branch respectively, and "proposed" denotes the final loss function of the proposed method. The results of our method with the hue loss are better than those using the cosine similarity loss (CS-Loss) or the L2 loss only. Furthermore, from the results of Fig. 4 and Fig. 5, we can also see that our method reconstructs HDR images with sufficient color accuracy.
Conclusion
In this article, a deep-learning-based single-frame HDR image generation method is proposed. In the process of HDR image reconstruction, two key issues need to be solved: highlight suppression in over-exposed areas and noise elimination in under-exposed areas. Most existing algorithms focus on highlight suppression in over-exposed areas. We consider that the LDR images to be processed generally do not have excessively severe over-exposure problems; therefore, in the design of our experiments, we prefer to let the network learn to repair pixel information at the edges of saturated values. These repaired pixels are neither generated out of thin air by the network nor obtained through over-fitting, and they can be traced. In practice, texture information that we cannot see clearly in the LDR image may merely be "hidden" in the image, which does not mean it does not exist; thus, the main goal of our algorithm is to find and enhance these "hidden" details according to the existing information in the image. Similarly, the under-exposed areas may also hide a lot of noise that is difficult to perceive in the LDR image. When the image is converted to the HDR domain, this hidden noise becomes apparent, and if not suppressed, it seriously degrades the quality of the generated HDR image. We therefore propose a novel dual-branch network structure, which can not only reveal the hidden information in the over-exposed areas, but also suppress the noise hidden in the under-exposed areas, so as to ensure that the reconstructed HDR image has excellent quality. In addition, we introduce a hue loss to enable the network to more accurately learn the pixel color distribution of the image, making the HDR image generated by the network more accurate and natural in color. The experimental results show that our model has obvious advantages in both subjective analysis and objective score comparison.
HDR image generation contains many different sub-tasks, and it is difficult to solve all of them directly with a single network. If the entire pipeline can be divided into different sub-tasks, with dedicated modules designed to guide the model to learn the corresponding functions, better results may be obtained. In future work, we will attempt to further divide the over-exposed area into light-source and reflection areas, or the under-exposed area into strong-noise and under-brightness areas, etc., and design networks to perform targeted processing.