Introduction
Haze is an atmospheric phenomenon in which dust, smoke, or other dry particles obscure the clarity of a scene. Haze usually degrades the quality of photos by reducing overall color contrast and occluding objects. Reducing the haze effect is therefore necessary to make such photos visually pleasing. Moreover, many practical vision tasks depend on clean images, such as face detection and automatic license plate recognition from monitoring images, and scene analysis from satellite images. However, real captured images, even on sunny days, often suffer from low visual quality caused by haze. It is therefore essential for a vision system to first remove haze from the captured images and then conduct detection or recognition.
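In hazy images, only a portion of the reflected light reaches the observer because of absorption in the atmosphere. Based on this observation, the captured image can be modeled as \begin{equation*} I(x) = J(x)T(x) + A(1-T(x)),\tag{1}\end{equation*} where $I(x)$ is the observed hazy image, $J(x)$ is the haze-free image, $T(x)$ is the transmission map, and $A$ is the atmospheric light.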
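To make the haze imaging model in Eqn. (1) concrete, the following minimal sketch synthesizes hazy intensities from clean ones and then inverts the model when the transmission and atmospheric light are known. The function names and the lower bound `t_min` (a guard against division by very small transmissions) are assumptions introduced here for illustration.

```python
def apply_haze(J, T, A):
    """Synthesize hazy intensities element-wise: I = J*T + A*(1 - T)."""
    return [j * t + A * (1.0 - t) for j, t in zip(J, T)]


def remove_haze(I, T, A, t_min=0.1):
    """Invert Eqn. (1) for known T and A: J = (I - A) / max(T, t_min) + A."""
    return [(i - A) / max(t, t_min) + A for i, t in zip(I, T)]


J = [0.2, 0.5, 0.9]   # clean intensities in [0, 1]
T = [0.8, 0.5, 0.3]   # transmission per pixel
A = 1.0               # scalar atmospheric light

I = apply_haze(J, T, A)
J_rec = remove_haze(I, T, A)
```

In single image dehazing neither `T` nor `A` is given, which is why the inversion is ill-posed and priors are needed.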
The traditional image dehazing methods [1]–[12] have investigated various haze-related image priors. Tan [2] assumes that the contrast of hazy images is lower than that of haze-free images and proposes to maximize the contrast of hazy images under the MRF framework. Fattal [3] uses independent component analysis to estimate the transmission in hazy scenes, assuming that the transmission and surface shading are locally uncorrelated. He et al. [4] propose the dark channel prior to estimate the transmission map, based on the observation that the local minima of the color channels of haze-free images are close to zero. Liu et al. [12] propose the rank-one prior, based on the observation that in most regions except the light source area, the imaging scene is covered by spatially homogeneous light. Polarization-based methods [13], [14] are also effective for haze removal; they rely on the fact that the airlight scattered by atmospheric particles is partially polarized. These methods are effective in image dehazing because they fully exploit image prior knowledge and an understanding of the physical mechanism of haze. However, these priors are mainly based on human observations that do not always hold for diverse real-world images. For example, the dark channel prior [4] is effective for most outdoor images but usually fails for those containing large areas of white scenery such as white walls or clouds in the sky, as shown in Figure 1 (b). Despite its effectiveness, polarization-based dehazing requires professional optical equipment and the fusion of multiple images, which makes it unsuitable for single image dehazing.
Single image dehazing results. (a) Input hazy image. (b) Recovered haze-free image using DCP. (c) Dark channel of the input image. (d) Transmission map by DCP. (e) Comparisons on image blocks. (f)-(h) Recovered haze-free image, dark channel and transmission map by our network. Our method can better deal with sky regions.
A comparison of the extended PDN model and the original PDN model. We can see that the extended PDN model achieves dramatic improvement in PSNR (17.77 to 28.78) and removes haze more effectively.
Dehazing results on a real-world hazy image of our energy-based method as well as four energy minimization based methods. We also show the results of the original dark channel prior and our proximal dehaze-net as comparisons. Ours-EM can achieve satisfactory result while our learning-based model further improves the visual effect and processing speed.
The architecture of proximal dehaze-net. We first estimate the atmospheric light
Sub-network structure. We adopt residual encoder-decoder (RED) as the base architecture of sub-networks in our model. We use 2-stride convolution for down-sampling and 2-stride transposed convolution for up-sampling. For A-Net and T-Net, the last layer is Sigmoid. For U-Net and Q-Net, the last layer is Tanh. The bottleneck of RED is stacked residual blocks.
Computation graph of GIF block. Based on [56], we construct its differentiable computation graph. GIF block takes image
Intermediate results of transmission maps produced by our proposed data-preprocessing.
In recent years, deep learning methods [15]–[29] have come to dominate single image dehazing. The early learning-based methods usually learn transmission maps or other haze-related variables. For instance, Ren et al. [16] and Cai et al. [17] design CNNs to learn transmission maps and then recover clean images, and Li et al. [18] propose to learn a
Our main motivation is to combine the haze imaging mechanism and deep learning in a novel model-driven deep learning framework [33], which takes advantage of both the prior-based methods and the deep learning-based methods. Compared with the traditional prior-based methods, our approach is free of parameter tuning after training and runs at a higher speed. Most importantly, by incorporating data-driven learning, the dehazing performance is significantly improved. Compared with the deep learning-based methods, our approach integrates physical constraints into the deep learning framework and can produce stable dehazing results (see Figure 10). Moreover, our method learns to explicitly predict atmospheric light and transmission maps, which may be helpful for model explanation and for downstream tasks such as semantic segmentation on hazy images.
We build our model-driven learning approach by the following steps. First, based on the haze imaging model, we formulate the inverse problem of single image dehazing as an energy model with physical constraints in image space and haze-related feature space (dark channel space in this paper), regularized by haze-related image priors. Second, we design an iterative optimization algorithm for minimizing the dehazing energy function using the half-quadratic splitting algorithm, with proximal operators for modeling the regularization terms. Third, we propose a deep neural network based on the iterative algorithm, dubbed as proximal dehaze-net, to implicitly learn these image priors by learning their corresponding proximal operators using convolutional neural networks.
In summary, our work makes three main contributions. First, we propose a novel energy model for single image dehazing, which investigates the haze imaging model constraints in both image space and haze-related feature space. Second, based on the iterative algorithm for minimizing the energy model, we design a multi-stage deep neural network, by discriminatively learning haze-related image priors, including dark channel prior, transmission prior and clean image prior, saving the effort of manually designing them. We also learn to predict the atmospheric light instead of estimating it with traditional methods. Third, extensive experiments show the effectiveness of learning haze-related image priors, and the proposed proximal dehaze-net achieves promising results on both synthetic and real-world hazy images.
This paper is an extension of our previous work [22]. In this paper, we extend the contents of our work in the following three aspects. First, we reformulated our dehazing model by introducing an additional clean image prior learning module. We also learned to predict the atmospheric light instead of estimating it with the traditional methods. Second, we improved the performance of our model by replacing lightweight sub-networks with more powerful CNN backbones. Moreover, we treated hyper-parameters that need manual adjusting as learnable variables. Third, for fair comparisons, we evaluated our method on multiple public datasets, including SOTS and NTIRE 2018 datasets, and our method achieves promising results on both synthetic images and real-world images. As a comparison between the original PDN model [22] (denoted as PDN-ECCV) and the current extended PDN model, we show in Figure 2 two examples of synthetic and real hazy image dehazing. We can see that the extended PDN model improves the dehazing ability both quantitatively and qualitatively.
Related Work
A. Haze-Related Image Priors
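Most traditional dehazing methods assume an image prior on haze-free images or latent transmission maps based on human experience. Researchers have proposed various effective priors for single image dehazing. The most related work to ours is the dark channel prior (DCP) [4]. The dark channel of a color image is defined as the minimum of local image patches:\begin{equation*} I^{d}(x)=\min _{c\in \{r,g,b\}}\Big (\min _{y\in \rm {\Omega }(x)}\big (I^{c}(y)\big)\Big),\tag{2}\end{equation*} where $\Omega(x)$ is a local patch centered at $x$ and $I^{c}$ is a color channel of $I$. Based on the observation that the dark channels of haze-free images are close to zero, DCP estimates the transmission map as
\begin{equation*} T(x)=1-\omega \min _{c\in \{r,g,b\}}\left({\min _{y\in \rm {\Omega }(x)}\left({\frac {I^{c}(y)}{A^{c}}}\right)}\right),\tag{3}\end{equation*} where $A^{c}$ is the atmospheric light of channel $c$ and $\omega$ is a constant that retains a small amount of haze for distant objects.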
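As an illustration of Eqns. (2) and (3), the sketch below computes a dark channel and the corresponding DCP transmission estimate on a tiny image stored as nested Python lists. The patch `radius`, the default `omega=0.95`, and the toy image are assumptions chosen for the example, not settings taken from this paper.

```python
def dark_channel(img, radius=1):
    """Eqn. (2): img is an H x W list of (r, g, b) tuples; returns an H x W list."""
    h, w = len(img), len(img[0])
    # Minimum over the three color channels at every pixel.
    min_rgb = [[min(px) for px in row] for row in img]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Minimum over the local patch, clipped at the image border.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = min(min_rgb[j][i] for j in ys for i in xs)
    return out


def dcp_transmission(img, A, omega=0.95, radius=1):
    """Eqn. (3): T = 1 - omega * dark_channel(I / A), with per-channel A."""
    normalized = [[tuple(c / a for c, a in zip(px, A)) for px in row] for row in img]
    dark = dark_channel(normalized, radius)
    return [[1.0 - omega * d for d in row] for row in dark]


img = [[(0.9, 0.8, 0.7), (0.6, 0.7, 0.8)],
       [(0.5, 0.9, 0.9), (0.95, 0.95, 0.9)]]
dark = dark_channel(img)
T = dcp_transmission(img, A=(1.0, 1.0, 1.0))
```

On a bright, sky-like image every channel is close to the atmospheric light, so the dark channel is large and the estimated transmission becomes small. This is the failure mode of DCP discussed in this section.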
Since image priors usually rely on human observations and experience, the traditional dehazing methods are not always applicable to diverse scenes. For instance, DCP is effective for dehazing but may fail when the scene color is close to the atmospheric light, e.g., sky regions in the wild and light-colored walls in cityscapes. Instead of constraining the dark channel to be close to zero as in DCP, we learn the dark channel prior by learning its corresponding proximal mapping from training data using a convolutional neural network, which can potentially approximate the dark channels of haze-free images well, as shown in Figure 1.
B. Dehazing by Energy Minimization
It is a common practice to build energy functions for various image restoration and reconstruction problems. For the problem of single image dehazing, there are also several methods proposed based on image priors and energy minimization [2], [7], [10], [34]–[38]. The main idea of these methods is to firstly find effective image priors for describing transmission maps or haze-free images, and then build energy functions with these image priors as regularization terms. The energy function is then minimized for optimal transmission maps or haze-free images in an iterative manner.
However, finding or designing effective dehazing priors relies heavily on expert experience. Most energy-minimization methods are also time-consuming when dehazing a single image because they require a large number of optimization steps. In addition, the energy models contain hyper-parameters that usually have to be carefully adjusted to get the best visual effect for each individual image. In comparison, our proposed model can adaptively learn haze-related image priors. Through discriminative learning, we can reduce the number of iterations of the optimization algorithm to only 2 or 3, which greatly reduces the time cost. Meanwhile, the hyper-parameters in the energy model are also learned during training, saving the effort of manual tuning.
C. Deep Unfolding Networks
Recently, there have been several works to solve image inverse problems under the iterative deep learning framework [22], [24], [33], [39]–[45]. Zhang et al. [40] train a set of effective denoisers and plug them in the scheme of the half-quadratic splitting algorithm as modules. Meinhardt et al. [41] solve the inverse problem in image processing using the primal-dual hybrid gradient method, and replace the proximal operator with a denoising neural network. In [39], [42], [43], the linear inverse problems are solved by learning proximal operators in the scheme of iterative optimization algorithms. These methods can well solve linear inverse problems such as denoising, super-resolution, non-blind deconvolution, compressive sensing MRI, etc.
Compared with these works, we focus on single image dehazing, which is a challenging inverse problem with more unknown variables in the imaging model. Instead of using the common linear inverse models of these works, we formulate single image dehazing as a non-linear inverse problem with regularization terms for haze-related features. We propose to discriminatively learn effective image priors by learning proximal mappings for the regularization terms using CNNs. The most related works to ours are [24], [45]. In [24], the authors learned deep priors for single image dehazing, but did not investigate image priors on haze-related features. In [45], the authors learned image priors for the deraining problem based on the model-driven approach. To the best of our knowledge, our work [22] is the first to learn haze-related priors for the image dehazing task.
Dehazing as an Inverse Problem
In this section, we first build an energy function with physical model constraints in both image space and feature space, and then design an iterative algorithm for energy minimization based on the half-quadratic splitting (HQS) algorithm.
A. Dehazing Energy Model
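Considering the haze imaging model in Eqn. (1), given a hazy image $I$ and atmospheric light $A$, each color channel satisfies \begin{equation*} I^{c}(x) - A^{c} = (J^{c}(x) - A^{c}) T(x),~c \in \{r,g,b\},\tag{4}\end{equation*}
Letting $P^{c} = I^{c} - A^{c}$ and $Q^{c} = J^{c} - A^{c}$, Eqn. (4) can be written compactly as \begin{equation*} P^{c} = Q^{c} \circ T,~c \in \{r,g,b\},\tag{5}\end{equation*} where $\circ$ denotes the element-wise product.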
Now we consider physical constraint in haze-related feature space. Let \begin{equation*} \Phi _{h}(P) = \Phi _{h}(Q\circ T),\tag{6}\end{equation*}
\begin{equation*} P^{d}= Q^{d} \circ T,\tag{7}\end{equation*}
By enforcing Eqns. (5) and (7) as data fidelity terms, we design a dehazing energy function:\begin{align*}&\hspace {-0.5pc}E(Q,T) = \frac {\alpha }{2} \sum _{c}\|P^{c} - Q^{c} \circ T\|_{F}^{2} + \frac {\beta }{2} \|P^{d} - Q^{d} \circ T\|_{F}^{2} \\& \qquad\qquad\qquad\qquad\qquad\qquad + f(Q^{d}) + g(T) + h(Q),\tag{8}\end{align*}
\begin{equation*} \left \{{Q^{*},T^{*}}\right \} = \mathop {\mathrm {arg\,min}} _{Q,T}~E(Q,T).\tag{9}\end{equation*}
Regularization Terms: We have three regularization terms
B. Model Optimization
It is non-trivial to directly solve the optimization problem in Eqn. (9), so we turn to the half-quadratic splitting (HQS) algorithm to break it into easier sub-problems. The HQS algorithm has been widely used to solve image inverse problems [50]–[54]. By introducing an auxiliary variable $U$ to substitute the dark channel $Q^{d}$, we obtain the augmented energy function \begin{align*}&\hspace {-0.5pc}E(Q,T,U) = \frac {\alpha }{2} \sum _{c} \|Q^{c} \circ T - P^{c}\|_{F}^{2} + \frac {\beta }{2} \|U \circ T - P^{d}\|_{F}^{2} \\&+\, \frac {\gamma }{2} \|U - Q^{d}\|_{F}^{2} + f(U) + g(T) + h(Q),\tag{10}\end{align*}
Update $U$: \begin{align*}&\hspace {-0.5pc}U_{n} = \mathop {\mathrm {arg\,min}} _{U}~\frac {\beta }{2}\|U\circ T_{n-1}-P^{d}\|_{F}^{2} \\&+\, \frac {\gamma }{2}\|U-Q^{d}_{n-1}\|_{F}^{2} + f(U),\tag{11}\end{align*}
whose solution can be written as \begin{equation*} U_{n} = \mathrm {prox}_{\frac {1}{u_{n}}f}(\hat {U}_{n}),\tag{12}\end{equation*}
where \begin{equation*} \hat {U}_{n} = \frac {\beta T_{n-1} \circ P^{d} + \gamma Q_{n-1}^{d}}{\beta T_{n-1}\circ T_{n-1} + \gamma },\tag{13}\end{equation*}
and the proximal operator is defined as \begin{equation*} \mathrm {prox}_{\lambda f}(V) = \mathop {\mathrm {arg\,min}} _{X}\frac {1}{2}\|X-V\|_{F}^{2} + \lambda f(X),\tag{14}\end{equation*}
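As a brief check on how Eqn. (13) follows from Eqn. (11), the two quadratic terms can be combined element-wise by completing the square. The sketch below writes $u$, $t$, $p$ and $q$ for single entries of $U$, $T_{n-1}$, $P^{d}$ and $Q^{d}_{n-1}$; these lower-case names are introduced here only for illustration.
\begin{align*} \frac{\beta}{2}(ut-p)^{2} + \frac{\gamma}{2}(u-q)^{2} &= \frac{\beta t^{2}+\gamma}{2}\,u^{2} - (\beta t p + \gamma q)\,u + \mathrm{const} \\ &= \frac{\beta t^{2}+\gamma}{2}\Big(u - \frac{\beta t p + \gamma q}{\beta t^{2}+\gamma}\Big)^{2} + \mathrm{const}. \end{align*}
The minimizer of the quadratic part is exactly the corresponding entry of $\hat{U}_{n}$ in Eqn. (13), so Eqn. (11) reduces to a proximal problem of the form in Eqn. (14) centered at $\hat{U}_{n}$. Under this reading, the weight $u_{n}$ in Eqn. (12) would play the role of $\beta t^{2}+\gamma$; this identification is an assumption of the sketch rather than a stated definition.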
Update $T$: \begin{align*}&\hspace {-0.5pc}T_{n} = \mathop {\mathrm {arg\,min}} _{T}~\frac {\alpha }{2}\sum _{c}\|Q^{c}_{n-1}\circ T - P^{c}\|_{F}^{2} \\&+\, \frac {\beta }{2}\| U_{n}\circ T - P^{d} \|_{F}^{2} + g(T).\tag{15}\end{align*}
Then we can derive \begin{equation*} T_{n} = \mathrm {prox}_{\frac {1}{t_{n}}g}\left ({\hat {T}_{n} }\right),\tag{16}\end{equation*}
where \begin{equation*} \hat {T}_{n} = \frac {\alpha \sum _{c} Q_{n-1}^{c}\circ P^{c} + \beta U_{n}\circ P^{d}}{\alpha \sum _{c} Q_{n-1}^{c}\circ Q_{n-1}^{c} + \beta U_{n}\circ U_{n}},\tag{17}\end{equation*}
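The closed-form intermediate estimates in Eqns. (13) and (17) are simple element-wise expressions. The following sketch evaluates them on flattened lists of pixel values; the function names, the data layout and the small constant `eps` (added to avoid division by zero) are assumptions made for this example.

```python
def u_hat(T, P_d, Q_d, beta, gamma):
    """Eqn. (13): element-wise least-squares estimate of the dark channel U."""
    return [(beta * t * p + gamma * q) / (beta * t * t + gamma)
            for t, p, q in zip(T, P_d, Q_d)]


def t_hat(Q, P, U, P_d, alpha, beta, eps=1e-8):
    """Eqn. (17): Q and P hold three channel lists; U and P_d are flat lists."""
    out = []
    for k in range(len(U)):
        num = alpha * sum(Q[c][k] * P[c][k] for c in range(3)) + beta * U[k] * P_d[k]
        den = alpha * sum(Q[c][k] ** 2 for c in range(3)) + beta * U[k] ** 2
        out.append(num / (den + eps))
    return out


# One pixel that satisfies P = Q * T and P_d = Q_d * T exactly, with T = 0.5.
alpha, beta, gamma = 1.0, 1.0, 1.0
T_true = [0.5]
Q = [[-0.4], [-0.6], [-0.2]]
P = [[q * T_true[0] for q in ch] for ch in Q]
Q_d = [-0.6]
P_d = [Q_d[0] * T_true[0]]

U_hat = u_hat(T_true, P_d, Q_d, beta, gamma)
T_hat = t_hat(Q, P, Q_d, P_d, alpha, beta)
```

When the data are consistent with Eqns. (5) and (7), both estimates reproduce the underlying values, so the learned proximal mappings only need to correct deviations from the model.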
Update $Q$: \begin{align*}&\hspace {-0.5pc}Q_{n} = \mathop {\mathrm {arg\,min}} _{Q} \frac {\alpha }{2}\sum _{c}\|Q^{c} \circ T_{n} -P^{c}\|_{F}^{2} \\& \qquad\qquad\qquad\qquad\quad +\, \frac {\gamma }{2}\|Q^{d} - U_{n}\|_{F}^{2} + h(Q).\tag{18}\end{align*}
Since computing the dark channel of an image is to extract the smallest value from the local color patch around each pixel, the second term of Eqn (18) only constrains on limited pixels in the original image \begin{equation*} Q_{n} = \mathrm {prox}_{\frac {1}{q_{n}}h}(\hat {Q}_{n}),\tag{19}\end{equation*}
After
Algorithm 1 Energy Minimization for Single Image Dehazing With Half-Quadratic Splitting
Hazy image
Transmission
Estimate atmospheric light
Let
Let
for
Update
Update
Update
Update
end for
return
For an instance of our energy model, we concretize these regularization terms in Eqn (10). Specifically, for dark channel \begin{align*} f(U)=&\|U\|_{1}, \\ g(T)=&\text {ATGV}(T), \\ h(Q)=&0.\end{align*}
ATGV is the anisotropic total generalized variation:\begin{equation*} \min _{T,V}~\alpha _{1} \|D^{\frac {1}{2}}(\nabla T - V)\|_{1} + \alpha _{0} \|\nabla V\|_{1} + \mathbf {w}\|T-\hat {T}\|_{1},\end{equation*}
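For the choice $f(U)=\|U\|_{1}$, the proximal operator in Eqn. (14) has the well-known closed form of element-wise soft-thresholding. The sketch below shows this operator on a flat list of values; the function name and the example threshold are illustrative assumptions.

```python
import math


def soft_threshold(V, lam):
    """prox of lam * ||.||_1: shrink each entry toward zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in V]


shrunk = soft_threshold([0.5, -0.3, 0.05], lam=0.1)
```

Entries with magnitude below the threshold are set to zero, which matches the intuition of the dark channel prior that dark channel values of haze-free images should be sparse.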
Proximal Dehaze-Net
Although our energy-based method can remove haze effectively, we can further improve its visual performance and processing speed by learning haze-related image priors. Instead of designing image priors by hand according to human experience, we model haze-related priors with convolutional neural networks by learning the proximal mappings that appear in Section III-B. Note that the above introduced optimization process requires the atmospheric light
The core part of this framework is an architecture with
As mentioned above, instead of designing image priors by hand, we model them by using CNNs to learn their corresponding proximal operators \begin{align*} {U}_{n}=&\mathrm {prox}_{\frac {1}{u_{n}}f}(\hat {U}_{n}) \triangleq \mathcal {F}_{n}(\hat {U}_{n}), \\[-2pt] {T}_{n}=&\mathrm {prox}_{\frac {1}{t_{n}}g}(\hat {T}_{n}) \triangleq \mathcal {G}_{n}(\hat {T}_{n}), \\[-2pt] {Q}_{n}=&\mathrm {prox}_{\frac {1}{q_{n}}h}(\hat {Q}_{n}) \triangleq \mathcal {H}_{n}(\hat {Q}_{n}),\tag{20}\end{align*}
At the \begin{equation*} U_{n} = \mathcal {F}_{n}(\hat {U}_{n}) \triangleq \text {U-Net}_{n}([\hat {U}_{n}, P]),\tag{21}\end{equation*}
Similarly, \begin{align*} \hat {T}_{n'}=&\text {T-Net}_{n}\big ([\hat {T}_{n}, P] \big), \\ T_{n}=&\text {GIF}\big (\hat {T}_{n'}, P \big),\tag{22}\end{align*}
\begin{equation*} T_{n} = \mathcal {G}_{n}(\hat {T}_{n}) \triangleq \text {GIF} \big ({\text {T-Net}_{n}}([\hat {T}_{n}, P]), P \big).\tag{23}\end{equation*}
Finally, for \begin{equation*} Q_{n} = \mathcal {H}_{n}(\hat {Q}_{n}) \triangleq {\text {Q-Net}_{n}}([\hat {Q}_{n}, P]),\tag{24}\end{equation*}
After
Sub-Networks: Our proximal dehaze-net includes four kinds of sub-networks, i.e., A-Net, U-Net, T-Net and Q-Net, which share similar structures. As shown in Figure 5, we adopt the commonly used residual encoder-decoder (RED) as the base architecture for these sub-networks. Accordingly, these sub-networks consist of several stacked down-sampling convolution blocks for the encoder part and up-sampling convolution blocks for the decoder part. The bottleneck is made up of stacked residual blocks [57], [58]. Skip connections between the encoder and the decoder are used to prevent the loss of spatial information. For the last layer of an RED, we use Sigmoid for A-Net and T-Net, since the outputs are within the range of [0, 1], and we use Tanh for U-Net and Q-Net since their outputs are within the range of
GIF Block: GIF block stands for guided filtering [56] computation block within our proximal dehaze-net. GIF block enforces the transmission map learned by T-Net to be well aligned with the image in edges. It takes the hazy image
Loss Functions: To train our proximal dehaze-net, we introduce commonly used loss functions, including \begin{equation*} L_{A} = \|A^{*} - A^{gt}\|_{1} + \varepsilon _{1}~TV(A^{*}) + \varepsilon _{2}~L_{s}(A^{*}, A^{gt}),\end{equation*}
\begin{equation*} L_{z} = \sum _{n=1}^{N} \|z_{n} - z^{gt}\|_{1} + \varepsilon _{1}~TV(z_{n}) + \varepsilon _{2}~L_{s}(z_{n}, z^{gt}),\end{equation*}
\begin{equation*} L = \lambda _{A} L_{A} + \lambda _{U} L_{U} + \lambda _{T} L_{T} + \lambda _{Q} L_{Q},\end{equation*}
Experiments
To verify the effectiveness of the proposed proximal dehaze-net, we evaluate our method on different datasets and compare it with other single image dehazing methods.
A. Datasets
We evaluate our proximal dehaze-net on multiple benchmark datasets for single image dehazing, including RESIDE dataset [60] and NTIRE 2018 single image dehazing challenge [61]. Both RESIDE and NTIRE 2018 datasets consist of indoor and outdoor subsets.
1) RESIDE Dataset
RESIDE dataset consists of indoor and outdoor datasets. The indoor dataset contains 13990 generated hazy/clean training images and 500 test images. The outdoor dataset contains about 8400 haze-free images with depth maps that can be used to synthesize training pairs and 500 hazy images as the test set. Note that for the outdoor dataset, we first remove redundant images from the training set that are overlapped with the test set. For each dataset, we randomly crop 64000 patches of
2) NTIRE 2018 Dataset
NTIRE 2018 single image dehazing challenge also consists of two subsets, I-Hazy (indoor hazy images) and O-Hazy (outdoor hazy images). I-Hazy contains 25 training images and 5 test images while O-Hazy contains 35 training images and 5 test images. Similar to RESIDE, we train two models separately for indoor and outdoor images.
3) Data Pre-Processing
To effectively train our PDN model, we need ground truths \begin{align*} E(A,T) = \frac {1}{2}\sum _{c}\| J^{c} \circ T + A^{c} (1 - T) - I^{c} \|_{F}^{2} + \lambda \text {TV}(T), \\\tag{25}\end{align*}
B. Implementation Details
We implement and train our PDN model with PyTorch [62] framework. Note that hyper-parameters
C. Results on Synthetic Datasets
We first evaluate our proximal dehaze-net on synthetic datasets and compare it with other single image dehazing methods. We select some representative single image dehazing methods, including DCP [4], CAP [8], NLD [9], GRM [10], IDE [11], MSCNN [16], DehazeNet [17], AODNet [18], DcGAN [21], GDN [23], DADN [28] and FDGAN [27]. Among these methods, DCP, CAP, NLD, GRM and IDE are traditional image processing methods based on image priors. MSCNN, DehazeNet, AODNet, DcGAN, GDN, DADN and FDGAN are deep learning methods that predict either transmission maps or clean images. For fair comparisons, we retrained these learning-based methods on the corresponding datasets if the training codes are provided by authors. We evaluate these methods on SOTS and NTIRE 2018 datasets. Both datasets consist of indoor and outdoor subsets. We report the peak-signal-to-noise ratio (PSNR) and structural similarity (SSIM) as performance metrics.
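For reference, PSNR between a dehazed image and its ground truth follows the standard definition $10\log_{10}(\mathrm{MAX}^{2}/\mathrm{MSE})$. A minimal sketch on flattened pixel lists is given below; the function name and the assumption that intensities lie in [0, 1] (`max_val=1.0`) are choices made for this example.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


score = psnr([0.5, 0.5], [0.4, 0.6])
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of about 20 dB.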
As shown in Table 1, the learning-based methods are usually quantitatively superior to the traditional prior-based methods. Our proposed PDN model surpasses most methods and achieves competitive PSNR and SSIM values on multiple benchmark datasets. Specifically, on the SOTS indoor and SOTS outdoor datasets, we achieve the highest PSNRs and SSIMs comparable to those of the state-of-the-art method GDN [23]. Moreover, on the realistic dehazing datasets of NTIRE 2018 (I-Hazy and O-Hazy in Table 1), we achieve the best PSNRs and SSIMs among the compared methods in both indoor and outdoor cases. The dehazing results on the NTIRE datasets also verify the effectiveness and rationality of the proposed data pre-processing algorithm, which generates atmospheric lights and transmission maps from given hazy/clean image pairs.
As a visualization, we show some dehazing examples from these benchmark datasets in Figure 8. The 1st and 2nd examples are from the SOTS indoor dataset, the 3rd and 4th examples are from the SOTS outdoor dataset, and the last two examples are from the I-Hazy and O-Hazy datasets respectively. From Figure 8, we can see that learning-based methods usually perform better than the traditional prior-based methods in visual effect. The traditional methods (DCP, CAP) sometimes over-dehaze the images, causing darkened images and color distortion. On the other hand, the learning-based methods can produce dehazed images that are closer to the ground truths. However, some learning-based methods, such as DehazeNet and AODNet, sometimes fail to remove all the haze in an image. Recent methods, such as GDN, are powerful in removing haze from synthetic images but perform less well on realistic images, e.g., the last two examples in Figure 8. In comparison, our method can remove haze effectively while keeping the dehazed images visually pleasing.
D. Results on Real Datasets
In Figures 9 and 10, we show the dehazing results of our method on real-world hazy images in comparison with the traditional methods and the learning-based methods, respectively. On one hand, as we can see from Figure 9, the traditional methods are usually effective in removing haze because they exploit useful haze-related image priors. However, some of them, such as BCCR [7], CAP [8], NLD [9] and IDE [11], tend to overly enhance the hazy images, which causes over-saturation or color distortion. DCP [4] is likely to produce undesirable artifacts, especially in sky regions. GRM [10] suppresses artifacts well but loses detailed textures due to its smooth regularization term. On the other hand, the learning-based methods are usually able to produce more visually pleasing results, as shown in Figure 10. However, MSCNN [16] and DehazeNet [17] are not always as effective as the traditional methods in haze removal. AODNet [18] often produces darkened images. DCPDN [20] predicts inexact transmission maps in areas with high intensity. GFN [19] and DcGAN [21] directly predict haze-free images and sometimes cause unexpected dark artifacts due to the lack of a haze imaging model constraint. FDGAN [27] and DADN [28] can produce more visually pleasing results, but they both rely on photo-realistic synthetic training data. By learning haze-related image priors, our method combines the advantages of the traditional methods and the learning-based methods, effectively removing haze while keeping the dehazed images natural.
Discussion and Analysis
We now discuss the parameter selection, effectiveness of learning haze-related image priors as well as the necessity of GIF block. We then analyze the effect of stage numbers on the performance of our model. We also compare the running speed with the recent dehazing methods.
A. Effects of Parameters for Loss Terms
As we mentioned in Section IV, we utilize a combined loss function for training our network, introducing multiple hyper-parameters for different loss terms, i.e.,
Performance of our PDN model on validation dataset with different parameter settings.
B. Learning Image Priors
Our network learns multiple haze-related image priors. In this section, we discuss the effect of learning each image prior. To do so, we respectively remove dark channel prior
Effectiveness of learning haze-related priors. The full PDN model with learning all haze related priors achieves the best performance.
As a visualization, we show an example of the dehazing results of these variants of our model in Figure 13. We can see that our full model achieves the highest PSNR, and learning all these image priors will help to remove hazes more effectively. Without learning dark channel prior or transmission prior, our model can also effectively dehaze images, but some slight hazes are still observable. Without learning clean image prior or the atmospheric light, the remaining hazes in the image are still quite obvious.
Dehazing results of the variants of our model measured in PSNR. Omitting these prior learning modules leaves haze that is not effectively removed, and the full model achieves the highest PSNR.
To illustrate what are learned for the proximal mappings
The haze-related image priors learned by our proximal dehaze-net.
C. Necessity of GIF Block
GIF block is a part of the learned proximal mapping
Effectiveness of using GIF block. (a) Input hazy image. (b) Without using GIF block, there are obvious halo artifacts along image edges. (c) Using GIF block, our proximal dehaze-net produces haze-free image without artifacts. Please zoom in for better visualization.
D. Effect of Multi-Stage Network
Our proximal dehaze-net model is a multi-stage learning architecture based on the dehazing optimization algorithm, and more network stages would be expected to achieve higher performance on the benchmarks. However, since we use the residual encoder-decoder (RED) backbone for the sub-networks, which is powerful in general image restoration tasks, we are able to achieve satisfactory results with only one stage in both quantitative performance and visual effect. As shown in Table 1 and Figure 12, we use a two-stage PDN to achieve the best PSNR on the SOTS dataset. Compared with the one-stage PDN, the two-stage PDN improves the PSNR values on the indoor and outdoor datasets by 1.02 dB and 0.42 dB respectively. The improvements in SSIM values are less than 0.01. On the NTIRE I-Hazy and O-Hazy datasets, we achieve the best performance with a one-stage PDN. Adding more stages does not continue to improve the quantitative results significantly. Furthermore, the visual effects are similar among PDN models with different numbers of stages. For simplicity, we use a two-stage PDN (PDN-S2) for SOTS and a one-stage PDN (PDN-S1) for NTIRE in our paper.
E. Running Time
The most complex part of our model to implement is the dark channel computation and its backward pass. In the conference version of our work [22], we implemented this in low-level CUDA for high-speed computation. In [32], the dark channel and its backward pass are realized with a look-up table technique. In [63], the dark channel is calculated by extracting local patches and finding the minimum. In this paper, we find that the dark channel can be computed simply as a negated one-stride max-pooling operation, which densely extracts the local maximum of the negated image, i.e.,
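The identity behind this trick is that a local minimum equals the negated local maximum of the negated input. The pure-Python sketch below illustrates it with a hand-written stride-1 max-pooling helper; in a deep learning framework, one would presumably use the built-in max-pooling layer with stride 1 and suitable padding instead. The helper names, the window size `k=3`, and the border handling (windows clipped at the image border, which is equivalent to padding with negative infinity) are assumptions made for this example.

```python
def max_pool_stride1(x, k):
    """k x k max-pooling with stride 1; windows are clipped at the border."""
    h, w, r = len(x), len(x[0]), k // 2
    return [[max(x[j][i]
                 for j in range(max(0, y - r), min(h, y + r + 1))
                 for i in range(max(0, c - r), min(w, c + r + 1)))
             for c in range(w)]
            for y in range(h)]


def dark_channel_via_maxpool(img, k=3):
    """Dark channel as the negated max-pooling of the negated channel minimum."""
    neg_min = [[-min(px) for px in row] for row in img]
    pooled = max_pool_stride1(neg_min, k)
    return [[-v for v in row] for row in pooled]


img = [[(0.9, 0.8, 0.7), (0.2, 0.5, 0.6), (0.4, 0.9, 0.3)]]
dark = dark_channel_via_maxpool(img)
```

Because max-pooling is a standard differentiable layer, this formulation obtains the backward pass from the framework's automatic differentiation without custom kernels.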
Other Applications and Limitations
A. Extensions to More Applications
Though our network is trained for image dehazing, we can apply it to other similar tasks without retraining on their corresponding datasets. Figure 16 (a)-(c) show an example of anti-halation enhancement using our method. Although halation has a different imaging model, it brings haze-like effects to images [17], [18]. Our proximal dehaze-net can be directly applied to anti-halation image enhancement. In Figure 16 (d)-(f), we show an example of underwater image enhancement. Ignoring the forward scattering component, the simplified underwater optical model has a formulation similar to the haze imaging model [64], [65]. Compared with methods that are specifically designed for this task, such as [64], our network can also effectively remove haze-like effects in this underwater image.
Extension to similar tasks. The first row shows an example of anti-halation enhancement. The second row shows an example of underwater image enhancement.
B. Failed Cases
While our method performs well on most natural images, it has limitations in certain situations, such as photos taken at night or under heavy haze. For night-time hazy images, there are usually multiple light and color sources, and the image quality is degraded by low light conditions. The commonly used haze imaging model is not sufficient to describe the night-time haze phenomenon, and our PDN model has no access to this kind of training data. As shown in Figure 17 (a)-(c), our method fails to remove all the haze in the image. On the other hand, the method [66] designed for night-time dehazing is capable of removing haze more thoroughly, but the visual quality is also lowered. As for images with very thick fog, too many details are overwhelmed by haze, and it becomes quite difficult for current methods to achieve satisfactory results. As shown in Figure 17 (d)-(f), visible haze still remains in the image dehazed by our method. In comparison, iPal-DH [67], trained on the Dense Haze dataset [68], is able to dehaze images with dense haze more effectively, but it also fails to recover the details hidden by haze.
Failure cases of our method. The first row shows an example of night-time image dehazing. The second row shows an example of heavy fog image dehazing.
Conclusion
In this paper, we propose a model-driven deep learning approach, proximal dehaze-net, for single image dehazing. We first build an energy function based on haze imaging model constraints in both image space and haze-related feature space, and then design an iterative algorithm for solving the energy model. We unfold the iterative algorithm into a multi-stage network by learning proximal operators using CNNs. The proposed proximal dehaze-net achieves promising results for single image dehazing.
Although the proposed PDN model is effective in most cases, our method shares some inherent drawbacks with other learning-based methods. First, the learning scheme is built upon the haze imaging model, which may limit its application to more complex real-world situations, such as non-uniform haze or night-time haze. This problem could be addressed by considering imaging models that better describe real situations and better techniques for simulating realistic datasets. Second, our method has difficulty in handling cross-domain image dehazing problems. As shown in Table 1, we have to train separate models to achieve the best performance on different benchmarks, and a model trained on one dataset may perform poorly on another dataset. This problem can be alleviated by a simple early stopping strategy, which prevents over-fitting to a specific dataset but decreases the performance on that dataset. There is thus a trade-off between the performance on one dataset and the generalization ability of the trained model. In the future, we will consider domain adaptation to better handle this problem.