
A Model-Driven Deep Dehazing Approach by Learning Deep Priors




Abstract:

Photos taken in hazy weather are usually covered with a whitish veil and lose important details. Haze removal is a fundamental task and a prerequisite to many other vision tasks. Single image dehazing is an ill-posed inverse problem that has attracted much attention in recent years. Generally, current single image dehazing methods can be categorized into traditional prior-based methods and data-driven deep learning methods, which respectively investigate haze-related image priors and deep architectures. In this paper, we propose a novel model-driven deep learning approach that combines the advantages of both kinds of methods. First, we build an energy model for single image dehazing with physical constraints in both color image space and a haze-related feature space (implemented as dark channel space in this work), regularized by haze-related image priors. Then, we design an iterative optimization algorithm for solving the proposed dehazing energy model based on the half-quadratic splitting algorithm, in which the priors are transformed into their corresponding proximal operators. Finally, inspired by the optimization algorithm, we design a deep dehazing neural network, dubbed as proximal dehaze-net, by learning the proximal operators for haze-related image priors using CNNs. Our network incorporates the physical model constraints of haze and haze-related prior learning into a novel deep architecture. Extensive experiments show that our method achieves promising performance for single image dehazing.
We construct an end-to-end model-driven framework for single image dehazing. The haze-related priors are discriminatively learned by learning the corresponding proximal operators.
Published in: IEEE Access ( Volume: 9)
Page(s): 108542 - 108556
Date of Publication: 30 July 2021
Electronic ISSN: 2169-3536


License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.
SECTION I.

Introduction

Haze is an atmospheric phenomenon in which dust, smoke, or dry particles obscure the clarity of a scene. Haze usually degrades the quality of photos by reducing color contrast and occluding objects. It is necessary to reduce the haze effect on these photos to make them visually pleasing and appealing. Moreover, many practical vision tasks depend on clean images, such as face detection and automatic license plate recognition from surveillance images, scene analysis from satellite images, etc. However, images captured in the real world, even on sunny days, often suffer from low visual quality due to haze. It is therefore essential for a vision system to first remove haze from captured images and then conduct detection or recognition.

In hazy images, only a portion of the reflected light reaches the observer because of absorption in the atmosphere. Based on this observation, the captured image I of a hazy scene can be modeled as a linear combination of the direct attenuation and the airlight [1]–[4]:
\begin{equation*} I(x) = J(x)T(x) + A(1-T(x)),\tag{1}\end{equation*}
where x is the spatial coordinate, I is the image degraded by haze, J is the scene radiance or haze-free image, A is the global atmospheric light, and T(x)=\exp(-\eta d(x)) is the medium transmission along the line of sight, which depends on the scattering coefficient \eta and the scene depth d(x). Depending on the number of given hazy images, the image dehazing task can be divided into single image dehazing and multi-image dehazing. In this paper, we focus on single image dehazing, which requires recovering the unknown haze-free image J, atmospheric light A, and transmission T from a single input hazy image I. Single image dehazing is more challenging than multi-image dehazing, and it is essential to investigate effective haze-related priors to regularize this inverse problem. Previous works can be roughly categorized into traditional methods that investigate various haze-related priors and learning-based methods that build learning systems for dehazing.

The traditional image dehazing methods [1]–[12] have investigated various haze-related image priors. Tan [2] assumes that the contrast of hazy images is lower than that of haze-free images and proposes to maximize the contrast of hazy images under the MRF framework. Fattal [3] uses independent component analysis to estimate the transmission in hazy scenes, assuming that the transmission and surface shading are locally uncorrelated. He et al. [4] propose the dark channel prior to estimate the transmission map, based on the observation that the local minima of the color channels of haze-free images are close to zero. Liu et al. [12] propose the rank-one prior based on the observation that, in most regions except light source areas, the imaged scene is covered by spatially homogeneous light. Polarization-based methods [13], [14] are also effective for haze removal; they exploit the fact that the airlight scattered by atmospheric particles is partially polarized. These methods are effective for image dehazing due to their full investigation of image prior knowledge and their understanding of the physical mechanism of haze. However, these priors are mainly based on human observations that do not always hold for diverse real-world images. For example, the dark channel prior [4] is effective for most outdoor images but usually fails for those containing large areas of white scenery, such as white walls or clouds in the sky, as shown in Figure 1 (b). Despite its effectiveness, polarization-based dehazing requires professional optical equipment and multi-image fusion, which is not suitable for single image dehazing.

FIGURE 1. Single image dehazing results. (a) Input hazy image. (b) Recovered haze-free image using DCP. (c) Dark channel of the input image. (d) Transmission map by DCP. (f)–(h) Recovered haze-free image, dark channel, and transmission map by our network. (e) Comparisons on image blocks. Our method can better deal with sky regions.

FIGURE 2. A comparison of the extended PDN model and the original PDN model. The extended PDN model achieves a dramatic improvement in PSNR (from 17.77 to 28.78 dB) and removes haze more effectively.

FIGURE 3. Dehazing results on a real-world hazy image of our energy-based method as well as four energy-minimization-based methods. We also show the results of the original dark channel prior and our proximal dehaze-net as comparisons. Ours-EM achieves a satisfactory result, while our learning-based model further improves the visual effect and processing speed.

FIGURE 4. The architecture of proximal dehaze-net. We first estimate the atmospheric light A from the input hazy image I by A-Net, then subtract A from I in each color channel to get the scaled hazy image P (P^{c} = I^{c} - A^{c}), which is sent into a multi-stage learning framework to predict the haze-free image. At each stage of the learning framework, we first calculate the auxiliary variables \hat{U}_{n}, \hat{T}_{n} and \hat{Q}_{n} following Algorithm 1 and then learn the corresponding proximal mappings with CNNs. \mathcal{F}_{n}, \mathcal{G}_{n}, and \mathcal{H}_{n} in the gray dashed box are sub-modules to be learned at the n-th stage. After N stages, we get the final predictions U_{N}, T_{N} and Q_{N}. We add A back to Q_{N} to reconstruct the clean image J. The whole framework can be trained end-to-end.

FIGURE 5. Sub-network structure. We adopt the residual encoder-decoder (RED) as the base architecture of the sub-networks in our model. We use 2-stride convolution for down-sampling and 2-stride transposed convolution for up-sampling. For A-Net and T-Net, the last layer is Sigmoid. For U-Net and Q-Net, the last layer is Tanh. The bottleneck of RED is stacked residual blocks.

FIGURE 6. Computation graph of the GIF block. Based on [56], we construct its differentiable computation graph. The GIF block takes an image p and a guide image I as inputs and outputs the filtered image q. In our paper, we set p := \text{T-Net}(\hat{T}), I := P and q := T.

FIGURE 7. Intermediate results of transmission maps produced by our proposed data pre-processing.

FIGURE 8. Dehazing results on synthetic images from multiple benchmark datasets.

FIGURE 9. Dehazing results of the prior-based methods on real-world images.

In recent years, deep learning methods [15]–[29] have come to dominate the single image dehazing field. Early learning-based methods usually learn transmission maps or other haze-related variables. For instance, Ren et al. [16] and Cai et al. [17] design CNNs to learn transmission maps and then recover clean images, and Li et al. [18] propose to learn a K-estimation module that jointly encodes the transmission T and atmospheric light A. More recent works [19], [23], [25], [26], [29]–[31] propose to learn a direct mapping between hazy images and clean images by developing new CNN architectures or investigating novel, effective loss functions. Besides, to achieve more realistic dehazing results, generative adversarial networks [20], [21], [25], [28], [32] are also widely used for the image dehazing task. The learning-based methods have shown promising results for single image dehazing. However, these methods usually use CNNs to learn a mapping from input hazy images to transmissions or haze-free images without, unlike the traditional methods, considering haze-related image priors to constrain the mapping space.

Our main motivation is to combine the haze imaging mechanism and deep learning in a novel model-driven deep learning framework [33], which takes advantage of both the prior-based methods and the deep learning-based methods. Compared with the traditional prior-based methods, our approach is free of parameter tuning after training and runs at a higher speed. Most importantly, by combining with the data-driven approach, the dehazing performance is significantly improved. Compared with the deep learning-based methods, our approach integrates the physical mechanism constraint into the deep learning framework and can produce stable dehazing results (see Figure 10). Moreover, our method learns to explicitly predict the atmospheric light and transmission maps, which may be helpful for model explanation and downstream tasks such as semantic segmentation on hazy images.

FIGURE 10. Dehazing results of the learning-based methods on real-world images.

We build our model-driven learning approach by the following steps. First, based on the haze imaging model, we formulate the inverse problem of single image dehazing as an energy model with physical constraints in image space and haze-related feature space (dark channel space in this paper), regularized by haze-related image priors. Second, we design an iterative optimization algorithm for minimizing the dehazing energy function using the half-quadratic splitting algorithm, with proximal operators for modeling the regularization terms. Third, we propose a deep neural network based on the iterative algorithm, dubbed as proximal dehaze-net, to implicitly learn these image priors by learning their corresponding proximal operators using convolutional neural networks.

In summary, our work makes three main contributions. First, we propose a novel energy model for single image dehazing, which investigates the haze imaging model constraints in both image space and haze-related feature space. Second, based on the iterative algorithm for minimizing the energy model, we design a multi-stage deep neural network, by discriminatively learning haze-related image priors, including dark channel prior, transmission prior and clean image prior, saving the effort of manually designing them. We also learn to predict the atmospheric light instead of estimating it with traditional methods. Third, extensive experiments show the effectiveness of learning haze-related image priors, and the proposed proximal dehaze-net achieves promising results on both synthetic and real-world hazy images.

This paper extends our previous work [22] in the following three aspects. First, we reformulate our dehazing model by introducing an additional clean image prior learning module, and we learn to predict the atmospheric light instead of estimating it with traditional methods. Second, we improve the performance of our model by replacing lightweight sub-networks with more powerful CNN backbones; moreover, we treat hyper-parameters that previously needed manual adjustment as learnable variables. Third, for fair comparison, we evaluate our method on multiple public datasets, including the SOTS and NTIRE 2018 datasets, and our method achieves promising results on both synthetic and real-world images. As a comparison between the original PDN model [22] (denoted PDN-ECCV) and the extended PDN model, Figure 2 shows two examples of synthetic and real hazy image dehazing. The extended PDN model improves the dehazing ability both quantitatively and qualitatively.

SECTION II.

Related Work

A. Haze-Related Image Priors

Most traditional dehazing methods assume an image prior on haze-free images or latent transmission maps based on human experience. Researchers have proposed various effective priors for single image dehazing. The work most related to ours is the dark channel prior (DCP) [4]. The dark channel of a color image is defined as the minimum over color channels and local image patches:
\begin{equation*} I^{d}(x)=\min_{c\in \{r,g,b\}}\Big(\min_{y\in \Omega(x)}\big(I^{c}(y)\big)\Big),\tag{2}\end{equation*}
where I^{c} is a color channel of I, and \Omega(x) is a local patch centered at x. The dark channel prior assumes that, in most non-sky patches, at least one color channel of a haze-free outdoor image has very low intensities at some pixels. According to the dark channel prior, the transmission can be estimated by:
\begin{equation*} T(x)=1-\omega \min_{c\in \{r,g,b\}}\left({\min_{y\in \Omega(x)}\left({\frac{I^{c}(y)}{A^{c}}}\right)}\right),\tag{3}\end{equation*}
where \omega is a constant for keeping aerial perspective.
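As a concrete illustration, the following is a minimal NumPy/SciPy sketch of Eqns. (2) and (3); the patch size of 15 and \omega = 0.95 are common DCP settings [4] assumed here, not values stated in this section.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Eqn. (2): per-pixel minimum over color channels, then over a local patch."""
    min_rgb = img.min(axis=2)                   # inner min over c in {r, g, b}
    return minimum_filter(min_rgb, size=patch)  # outer min over the patch Omega(x)

def dcp_transmission(img, A, patch=15, omega=0.95):
    """Eqn. (3): transmission estimated from the dark channel of I / A."""
    return 1.0 - omega * dark_channel(img / A.reshape(1, 1, 3), patch)
```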

Since image priors usually rely on human observations and experience, the traditional dehazing methods are not always applicable to diverse scenes. For instance, DCP is effective for dehazing but may fail when the scene color is close to the atmospheric light, e.g., sky regions in the wild and light-colored walls in cityscapes. Instead of constraining the dark channel to be close to zero as in DCP, we learn the dark channel prior by learning its corresponding proximal mapping from training data using a convolutional neural network, which can potentially well approximate the dark channels of haze-free images, as shown in Figure 1.

B. Dehazing by Energy Minimization

It is a common practice to build energy functions for various image restoration and reconstruction problems. For single image dehazing, several methods have also been proposed based on image priors and energy minimization [2], [7], [10], [34]–[38]. The main idea of these methods is to first find effective image priors for describing transmission maps or haze-free images, and then build energy functions with these image priors as regularization terms. The energy function is then minimized iteratively for the optimal transmission maps or haze-free images.

However, finding or designing effective dehazing priors relies heavily on expert experience. Moreover, most energy-minimization methods are time-consuming when dehazing a single image, due to the large number of optimization steps. There also exist hyper-parameters in the energy models that usually have to be carefully adjusted to get the best visual effect for each individual image. In comparison, our proposed model can adaptively learn haze-related image priors. Through discriminative learning, we can reduce the number of iterations of the optimization algorithm to only 2 or 3, which greatly reduces the time cost. Meanwhile, the hyper-parameters in the energy model are also learned during training, saving the effort of manual tuning.

C. Deep Unfolding Networks

Recently, several works have solved image inverse problems under the iterative deep learning framework [22], [24], [33], [39]–[45]. Zhang et al. [40] train a set of effective denoisers and plug them into the scheme of the half-quadratic splitting algorithm as modules. Meinhardt et al. [41] solve inverse problems in image processing using the primal-dual hybrid gradient method and replace the proximal operator with a denoising neural network. In [39], [42], [43], linear inverse problems are solved by learning proximal operators within the scheme of iterative optimization algorithms. These methods can solve linear inverse problems such as denoising, super-resolution, non-blind deconvolution, compressive sensing MRI, etc.

Compared with these works, we focus on single image dehazing, a challenging inverse problem with more unknown variables in the imaging model. Instead of the common linear inverse models used in these works, we formulate single image dehazing as a non-linear inverse problem with regularization terms on haze-related features. We propose to discriminatively learn effective image priors by learning proximal mappings for the regularization terms using CNNs. The works most related to ours are [24], [45]. In [24], the authors learned deep priors for single image dehazing but did not investigate image priors on haze-related features. In [45], the authors learned image priors for the deraining problem based on a model-driven approach. To the best of our knowledge, our work [22] is the first to learn haze-related priors for the image dehazing task.

SECTION III.

Dehazing as an Inverse Problem

In this section, we first build an energy function with physical model constraints in both image space and feature space, and then design an iterative algorithm for energy minimization based on the half-quadratic splitting (HQS) algorithm.

A. Dehazing Energy Model

Considering the haze imaging model in Eqn. (1), given a hazy image I\in\mathbb{R}^{M\times N\times 3}, we assume a known global atmospheric light A\in\mathbb{R}^{3} and subtract A from both sides of Eqn. (1) in each color channel:
\begin{equation*} I^{c}(x) - A^{c} = (J^{c}(x) - A^{c}) T(x),~c \in \{r,g,b\},\tag{4}\end{equation*}
where c is the color channel. For simplicity, let P^{c} = I^{c} - A^{c} and Q^{c} = J^{c} - A^{c}. Then P and Q represent the normalized hazy image and clean image respectively. Thus, Eqn. (4) can be rewritten in the concise form:
\begin{equation*} P^{c} = Q^{c} \circ T,~c \in \{r,g,b\},\tag{5}\end{equation*}
where \circ is the Hadamard product for matrices. This is the physical constraint in image color space.

Now we consider the physical constraint in a haze-related feature space. Let \Phi_{h} be a transformation from image space to any haze-related feature space, and apply \Phi_{h} to both sides of Eqn. (5):
\begin{equation*} \Phi_{h}(P) = \Phi_{h}(Q\circ T),\tag{6}\end{equation*}
in which T multiplies Q element-wise, channel by channel. There are various choices for the transformation \Phi_{h}, such as dark channel, local max contrast, hue disparity, and local max saturation, as mentioned by Tang et al. [15]. However, since the dark channel has been shown to be the most effective [15], we take \Phi_{h} as the dark channel of an image. Thus, we physically constrain our model in the dark channel feature space. Following He et al. [4], we can further assume that T is locally constant, and then we have
\begin{equation*} P^{d}= Q^{d} \circ T,\tag{7}\end{equation*}
where P^{d}, Q^{d} are the dark channels of P, Q.

By enforcing Eqns. (5) and (7) as data fidelity terms, we design a dehazing energy function:
\begin{align*} E(Q,T) =&\frac{\alpha}{2} \sum_{c}\|P^{c} - Q^{c} \circ T\|_{F}^{2} + \frac{\beta}{2} \|P^{d} - Q^{d} \circ T\|_{F}^{2} \\&+ f(Q^{d}) + g(T) + h(Q),\tag{8}\end{align*}
where \alpha and \beta are coefficients for the data terms, \|\cdot\|_{F} is the Frobenius norm, and f(Q^{d}), g(T) and h(Q) are regularization terms modeling the priors on the dark channel Q^{d}, transmission map T and clean image Q. The optimal haze-free image Q^{*} and transmission map T^{*} can be obtained by solving the following optimization problem:
\begin{equation*} \{Q^{*},T^{*}\} = \mathop{\mathrm{arg\,min}}_{Q,T}~E(Q,T).\tag{9}\end{equation*}

Regularization Terms: We have three regularization terms f, g and h that respectively model the dark channel prior, transmission prior and clean image prior. Multiple image priors can be chosen for them. For example, f for the dark channel can be an \ell_{0} or \ell_{1} regularizer, enforcing the dark channel to be sparse and close to zero. The transmission map is closely related to the latent scene depth, which is piecewise-smooth with edges aligned to depth discontinuities, so its regularizer g can be modeled by an MRF [46], [47] or TGV [10], [48]. For the clean image prior h, we can use common image priors such as TV [49].

B. Model Optimization

It is non-trivial to directly solve the optimization problem in Eqn. (9), so we turn to the half-quadratic splitting (HQS) algorithm to break it into easier sub-problems. The HQS algorithm has been widely used to solve image inverse problems [50]–[54]. By introducing an auxiliary variable U to substitute for Q^{d}, i.e., the dark channel of the latent haze-free image, we derive the augmented energy function:
\begin{align*} E(Q,T,U) =&\frac{\alpha}{2} \sum_{c} \|Q^{c} \circ T - P^{c}\|_{F}^{2} + \frac{\beta}{2} \|U \circ T - P^{d}\|_{F}^{2} \\&+ \frac{\gamma}{2} \|U - Q^{d}\|_{F}^{2} + f(U) + g(T) + h(Q),\tag{10}\end{align*}
in which \gamma is a penalty weight; as \gamma \rightarrow \infty, the solution of minimizing Eqn. (10) converges to that of minimizing Eqn. (8). We minimize Eqn. (10) by alternately updating U, T and Q while fixing the other two variables. We initialize Q_{0}=P and all elements of T_{0} to one; then, for the n-th iteration of the HQS algorithm, we successively solve the following sub-problems.

Update U: Given the estimated haze-free image Q_{n-1} and transmission map T_{n-1} at iteration n-1, the auxiliary variable U_{n} for the n-th iteration is updated as:
\begin{align*} U_{n} = \mathop{\mathrm{arg\,min}}_{U}~&\frac{\beta}{2}\|U\circ T_{n-1}-P^{d}\|_{F}^{2} \\&+ \frac{\gamma}{2}\|U-Q^{d}_{n-1}\|_{F}^{2} + f(U),\tag{11}\end{align*}
from which we can derive
\begin{equation*} U_{n} = \mathrm{prox}_{\frac{1}{u_{n}}f}(\hat{U}_{n}),\tag{12}\end{equation*}
where \hat{U}_{n} is an intermediate variable defined as:
\begin{equation*} \hat{U}_{n} = \frac{\beta T_{n-1} \circ P^{d} + \gamma Q_{n-1}^{d}}{\beta T_{n-1}\circ T_{n-1} + \gamma},\tag{13}\end{equation*}
and u_{n} = \beta T_{n-1}\circ T_{n-1} + \gamma. The proximal operator [55] is defined as:
\begin{equation*} \mathrm{prox}_{\lambda f}(V) = \mathop{\mathrm{arg\,min}}_{X}\frac{1}{2}\|X-V\|_{F}^{2} + \lambda f(X),\tag{14}\end{equation*}
assuming that f(X) is separable over the elements of the matrix X, i.e., f(X) = \sum_{i} f(x_{i}). This assumption is reasonable for many common regularizers such as \ell_{1} or \ell_{2}. In practice, we relax this constraint and extend it to general regularizers. In addition, we also extend \lambda to be a matrix of the same size as X.

Update T: We next update the transmission map T_{n}. Given Q_{n-1} and U_{n}, T_{n} is computed as:
\begin{align*} T_{n} = \mathop{\mathrm{arg\,min}}_{T}~&\frac{\alpha}{2}\sum_{c}\|Q^{c}_{n-1}\circ T - P^{c}\|_{F}^{2} \\&+ \frac{\beta}{2}\|U_{n}\circ T - P^{d}\|_{F}^{2} + g(T).\tag{15}\end{align*}
Then we can derive
\begin{equation*} T_{n} = \mathrm{prox}_{\frac{1}{t_{n}}g}(\hat{T}_{n}),\tag{16}\end{equation*}
where \hat{T}_{n} is an intermediate variable defined as:
\begin{equation*} \hat{T}_{n} = \frac{\alpha \sum_{c} Q_{n-1}^{c}\circ P^{c} + \beta U_{n}\circ P^{d}}{\alpha \sum_{c} Q_{n-1}^{c}\circ Q_{n-1}^{c} + \beta U_{n}\circ U_{n}},\tag{17}\end{equation*}
and t_{n} = \alpha \sum_{c} Q_{n-1}^{c}\circ Q_{n-1}^{c} + \beta U_{n}\circ U_{n}.

Update Q: Given T_{n} and U_{n}, the haze-free image Q_{n} is updated as:
\begin{align*} Q_{n} = \mathop{\mathrm{arg\,min}}_{Q}~&\frac{\alpha}{2}\sum_{c}\|Q^{c} \circ T_{n} - P^{c}\|_{F}^{2} \\&+ \frac{\gamma}{2}\|Q^{d} - U_{n}\|_{F}^{2} + h(Q).\tag{18}\end{align*}
Since computing the dark channel extracts only the smallest value from the local color patch around each pixel, the second term of Eqn. (18) constrains only a limited set of pixels in the original image Q, which may cause unstable results. We therefore drop this term, and the clean image Q_{n} is updated as:
\begin{equation*} Q_{n} = \mathrm{prox}_{\frac{1}{q_{n}}h}(\hat{Q}_{n}),\tag{19}\end{equation*}
where \hat{Q}_{n} = \frac{P}{\max(T_{n}, \epsilon)}, q_{n} = T_{n}, and \epsilon is a constant that prevents extremely low transmission values.

After N iterations, the final haze-free image J can be obtained by adding A back to Q_{N} in each color channel: J^{c}=Q_{N}^{c} + A^{c}. In summary, the iterative procedure for optimizing the proposed energy model with the HQS algorithm is shown as Algorithm 1.

Algorithm 1 Energy Minimization for Single Image Dehazing With Half-Quadratic Splitting

Input: Hazy image I, iteration number N
Output: Transmission T, haze-free image J

1: Estimate atmospheric light A from I.
2: Let P^{c}=I^{c} - A^{c}, c\in \{r,g,b\}.
3: Let Q^{c}_{0}(x):=P^{c}(x), T_{0}(x):=1, \gamma:=1.
4: for n=1:N do
5:   Update U_{n} based on Eqn. (12).
6:   Update T_{n} based on Eqn. (16).
7:   Update Q_{n} based on Eqn. (19).
8:   Update \gamma:=\delta\gamma, \delta > 1.
9: end for
10: return A, T=T_{N}, J^{c}=Q^{c}_{N} + A^{c}.
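For readers who prefer code, below is a minimal NumPy sketch of one iteration of Algorithm 1 (steps 5-7), reusing the dark_channel helper sketched in Section II-A. The prox_f, prox_g and prox_h callables stand for the (here unspecified) proximal operators of f, g and h, and eps is an assumed floor on the transmission as in Eqn. (19).

```python
import numpy as np

def hqs_step(P, Pd, Q, T, alpha, beta, gamma, prox_f, prox_g, prox_h, eps=0.1):
    # P: H x W x 3 scaled hazy image; Pd: H x W dark channel; Q: H x W x 3; T: H x W
    # Eqn. (13) then Eqn. (12): intermediate dark channel and its prox update
    den_u = beta * T * T + gamma
    U = prox_f((beta * T * Pd + gamma * dark_channel(Q)) / den_u, 1.0 / den_u)
    # Eqn. (17) then Eqn. (16): intermediate transmission and its prox update
    num_t = alpha * (Q * P).sum(axis=2) + beta * U * Pd
    den_t = alpha * (Q * Q).sum(axis=2) + beta * U * U
    T = prox_g(num_t / den_t, 1.0 / den_t)
    # Eqn. (19): intermediate clean image and its prox update
    Q = prox_h(P / np.maximum(T, eps)[..., None], 1.0 / T)
    return U, T, Q
```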

For a concrete instance of our energy model, we specify the regularization terms in Eqn. (10) as follows. For the dark channel U, we use \ell_{1} regularization to encourage sparse, low pixel values. For the transmission T, we use ATGV [10], [48] regularization for smooth, edge-aligned maps. For the clean image Q, we do not add any constraint, i.e.,
\begin{align*} f(U)=&\|U\|_{1}, \\ g(T)=&\text{ATGV}(T), \\ h(Q)=&0.\end{align*}

ATGV is the anisotropic total generalized variation:
\begin{equation*} \min_{T,V}~\alpha_{1}\|D^{\frac{1}{2}}(\nabla T - V)\|_{1} + \alpha_{0}\|\nabla V\|_{1} + \mathbf{w}\|T-\hat{T}\|_{1},\end{equation*}
where \hat{T} is the initial transmission map and D^{\frac{1}{2}} is the anisotropic diffusion tensor determined by the guide image P. The dehazing result is shown as Ours-EM (energy minimization) in Figure 3. We also show several other energy-based methods for comparison; our energy model is clearly effective in removing haze.
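For the \ell_{1} choice f(U)=\|U\|_{1}, the proximal operator of Eqn. (14) has the well-known closed form of soft-thresholding, so the prox_f placeholder in the sketch above admits a one-line implementation (lam may be a matrix, matching the extension of \lambda noted in Section III-B):

```python
import numpy as np

def prox_l1(V, lam):
    """prox_{lam * ||.||_1}(V) = sign(V) * max(|V| - lam, 0), elementwise."""
    return np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)
```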

SECTION IV.

Proximal Dehaze-Net

Although our energy-based method can remove haze effectively, we can further improve its visual performance and processing speed by learning haze-related image priors. Instead of designing image priors by hand according to human experience, we model the haze-related priors with convolutional neural networks by learning the proximal mappings that appear in Section III-B. Note that the optimization process introduced above requires the atmospheric light A to be known in advance. In the conference version of our work [22], we followed [4] to pre-estimate the atmospheric light A. In this paper, to estimate A more accurately, we introduce an extra convolutional neural network to predict the atmospheric light. Combining the learning of A with the proposed iterative optimization algorithm, we build a deep learning framework for single image dehazing, denoted proximal dehaze-net, as illustrated in Figure 4.

The core part of this framework is an architecture with N stages implementing N iterations of Algorithm 1 for solving Eqn. (10). As shown in Figure 4, the atmospheric light A is first estimated by A-Net; we then subtract A from I in each color channel to obtain P, a normalized hazy image within the range [-1,1]. P is sent into an N-stage learning framework. The n-th stage takes the outputs of the previous stage, U_{n-1}, T_{n-1} and Q_{n-1}, as inputs and learns to predict U_{n}, T_{n} and Q_{n} for the current stage.

As mentioned above, instead of designing image priors by hand, we model them by using CNNs to learn the corresponding proximal operators \mathrm{prox}_{\frac{1}{u_{n}}f}, \mathrm{prox}_{\frac{1}{t_{n}}g} and \mathrm{prox}_{\frac{1}{q_{n}}h} for updating U_{n}, T_{n} and Q_{n} in each stage, i.e.,
\begin{align*} U_{n}=&\mathrm{prox}_{\frac{1}{u_{n}}f}(\hat{U}_{n}) \triangleq \mathcal{F}_{n}(\hat{U}_{n}), \\ T_{n}=&\mathrm{prox}_{\frac{1}{t_{n}}g}(\hat{T}_{n}) \triangleq \mathcal{G}_{n}(\hat{T}_{n}), \\ Q_{n}=&\mathrm{prox}_{\frac{1}{q_{n}}h}(\hat{Q}_{n}) \triangleq \mathcal{H}_{n}(\hat{Q}_{n}),\tag{20}\end{align*}
where \mathcal{F}_{n}, \mathcal{G}_{n} and \mathcal{H}_{n} are sub-modules to be learned, representing the corresponding proximal operators at the n-th stage. In this way, we design an end-to-end training architecture, dubbed as proximal dehaze-net (PDN). As shown in Figure 4, each stage of the proximal dehaze-net implements one iteration of the model optimization discussed in Section III-B, with the proximal operators substituted by convolutional neural networks as in Eqn. (20). Note that in Eqn. (14) we assume that the regularization functions are separable, while in this section we relax this constraint to make full use of the representation ability of convolutional neural networks. We now introduce the structures of these sub-modules for each stage.

At the n-th stage, \hat{U}_{n} is first computed by Eqn. (13) and then sent into a convolutional neural network, \text{U-Net}_{n}, to learn the sub-module \mathcal{F}_{n}. The updated dark channel is:
\begin{equation*} U_{n} = \mathcal{F}_{n}(\hat{U}_{n}) \triangleq \text{U-Net}_{n}([\hat{U}_{n}, P]),\tag{21}\end{equation*}
in which we concatenate \hat{U}_{n} with the hazy image P as the input of \text{U-Net}_{n} to prevent information loss.

Similarly, \hat{T}_{n} is first computed by Eqn. (17), concatenated with P, and then sent into a convolutional neural network, \text{T-Net}_{n}, to learn the sub-module \mathcal{G}_{n}. Note that \text{T-Net}_{n} is followed by a GIF block (performing guided image filtering) to ensure edge alignment between the transmission map and the original image. The updated transmission map is computed as:
\begin{align*} \hat{T}_{n'}=&\text{T-Net}_{n}\big([\hat{T}_{n}, P]\big), \\ T_{n}=&\text{GIF}\big(\hat{T}_{n'}, P\big),\tag{22}\end{align*}
where \hat{T}_{n'} is the output of \text{T-Net}_{n}, and \text{GIF}(\hat{T}_{n'},P) performs guided image filtering [56] on \hat{T}_{n'} with P as the guidance image. Thus, the sub-module \mathcal{G}_{n} can be represented as the composition of \text{T-Net}_{n} and the GIF block:
\begin{equation*} T_{n} = \mathcal{G}_{n}(\hat{T}_{n}) \triangleq \text{GIF}\big(\text{T-Net}_{n}([\hat{T}_{n}, P]), P\big).\tag{23}\end{equation*}

Finally, for Q_{n}, the intermediate variable \hat{Q}_{n} is first computed using Eqn. (19), concatenated with P, and then sent into a convolutional neural network \text{Q-Net}_{n} to perform the proximal mapping \mathrm{prox}_{\frac{1}{q_{n}}h}. The updated clean image is:
\begin{equation*} Q_{n} = \mathcal{H}_{n}(\hat{Q}_{n}) \triangleq \text{Q-Net}_{n}([\hat{Q}_{n}, P]),\tag{24}\end{equation*}
where Q_{n} is the estimate of the haze-free image with A subtracted at the n-th stage. U_{n}, T_{n} and Q_{n} then serve as the inputs of the (n+1)-th stage of our proximal dehaze-net.

After N stages, we obtain the final outputs U_{N} , T_{N} and Q_{N} , and we use Q_{N} and the predicted atmospheric light A to reconstruct the haze-free image as shown in Figure 4.
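A hedged PyTorch sketch of one PDN stage follows, tying Eqns. (13), (17) and (19) to the learned modules of Eqns. (21)-(24). The unet, tnet, qnet and gif callables are assumed (e.g., the RED of Figure 5 and the GIF block of Figure 6), dark_channel_torch is the max-pooling formulation given in Section VI-E, and reducing P to a gray guide for the GIF block is our simplification, not the paper's stated choice.

```python
import torch

def pdn_stage(P, Pd, Q, T, alpha, beta, gamma, unet, tnet, qnet, gif, eps=0.1):
    # P: Bx3xHxW scaled hazy image; Pd, T: Bx1xHxW; Q: Bx3xHxW
    # Eqn. (13) + Eqn. (21): intermediate dark channel, refined by U-Net
    den_u = beta * T * T + gamma
    U_hat = (beta * T * Pd + gamma * dark_channel_torch(Q)) / den_u
    U = unet(torch.cat([U_hat, P], dim=1))
    # Eqn. (17) + Eqns. (22)-(23): intermediate transmission, T-Net, then GIF block
    num_t = alpha * (Q * P).sum(dim=1, keepdim=True) + beta * U * Pd
    den_t = alpha * (Q * Q).sum(dim=1, keepdim=True) + beta * U * U
    T = gif(tnet(torch.cat([num_t / den_t, P], dim=1)), P.mean(dim=1, keepdim=True))
    # Eqn. (19) + Eqn. (24): intermediate clean image, refined by Q-Net
    Q = qnet(torch.cat([P / T.clamp(min=eps), P], dim=1))
    return U, T, Q
```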

Sub-Networks: Our proximal dehaze-net includes four kinds of sub-networks, i.e., A-Net, U-Net, T-Net and Q-Net, and they share similar structures. As shown in Figure 5, we adopt the commonly used residual encoder-decoder (RED) as the base architecture of these sub-networks. Accordingly, each sub-network consists of stacked down-sampling convolution blocks for the encoder and up-sampling convolution blocks for the decoder. The bottleneck is made up of stacked residual blocks [57], [58]. Skip connections between the encoder and the decoder are used to prevent loss of spatial information. For the last layer of a RED, we use Sigmoid for A-Net and T-Net, since their outputs lie within [0, 1], and Tanh for U-Net and Q-Net, since their outputs lie within [-1, 1].
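The sketch below is one plausible PyTorch reading of Figure 5 with a single down/up level; the channel widths, block count and depth are illustrative assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)            # residual block of the bottleneck

class RED(nn.Module):
    def __init__(self, in_ch, out_ch, ch=64, n_res=4, last=nn.Tanh):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(True))
        self.down = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, 2, 1), nn.ReLU(True))         # 2-stride conv
        self.mid = nn.Sequential(*[ResBlock(2 * ch) for _ in range(n_res)])
        self.up = nn.Sequential(nn.ConvTranspose2d(2 * ch, ch, 4, 2, 1), nn.ReLU(True))  # 2-stride deconv
        self.out = nn.Sequential(nn.Conv2d(ch, out_ch, 3, padding=1), last())

    def forward(self, x):
        e = self.enc(x)
        return self.out(self.up(self.mid(self.down(e))) + e)  # encoder-decoder skip connection
```

For instance, a T-Net taking [\hat{T}_{n}, P] (one transmission channel plus three color channels) would be RED(4, 1, last=nn.Sigmoid), matching the Sigmoid/Tanh choices above.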

GIF Block: The GIF block is the guided filtering [56] computation block within our proximal dehaze-net. It enforces the transmission map learned by T-Net to be well aligned with the image edges: it takes the hazy image P as guidance and performs guided image filtering on the output of T-Net. As stated in [56], the GIF process consists of a series of average filtering operations and simple element-wise operations. The computation graph of the GIF block is shown in Figure 6, and more details of the GIF algorithm can be found in [56]. Many previous works [16], [17], [56] use GIF as a post-processing step on the estimated transmission map; in contrast, we include the guided image filtering process within our end-to-end trainable system. For the GIF block implementation, we adopt the work of Wu et al. [59]; we discuss the necessity of the GIF block later.
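As a hedged sketch, the guided filter of [56] can be written differentiably with box (mean) filters, which is the spirit of the GIF block. We assume a single-channel guide here (the color-guide variant in [56] additionally needs per-pixel 3x3 matrix inverses), and the radius and eps values are illustrative.

```python
import torch.nn.functional as F

def box(x, r):
    """Mean filter over a (2r+1)x(2r+1) window, border-corrected."""
    return F.avg_pool2d(x, 2 * r + 1, stride=1, padding=r, count_include_pad=False)

def gif(p, I, r=20, eps=1e-3):
    # local linear model q = a * I + b fitted in each window [56]
    mean_I, mean_p = box(I, r), box(p, r)
    cov_Ip = box(I * p, r) - mean_I * mean_p
    var_I = box(I * I, r) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box(a, r) * I + box(b, r)   # averaged coefficients give the filtered output
```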

Loss Functions: To train our proximal dehaze-net, we use commonly adopted loss functions, including the \ell_{1} reconstruction loss, total variation (TV) loss and structural similarity (SSIM) loss, on the network outputs and the corresponding ground truths. Specifically, the atmospheric light loss is:
\begin{equation*} L_{A} = \|A^{*} - A^{gt}\|_{1} + \varepsilon_{1}\,TV(A^{*}) + \varepsilon_{2}\,L_{s}(A^{*}, A^{gt}),\end{equation*}
where A^{*} is the atmospheric light map estimated by A-Net and L_{s}(\cdot,\cdot) = 1 - \text{SSIM}(\cdot,\cdot) is the structural similarity loss. For U, T and Q, the loss function is defined as:
\begin{equation*} L_{z} = \sum_{n=1}^{N} \|z_{n} - z^{gt}\|_{1} + \varepsilon_{1}\,TV(z_{n}) + \varepsilon_{2}\,L_{s}(z_{n}, z^{gt}),\end{equation*}
where z \in \{U, T, Q\}, Q^{gt} = J^{gt} - A^{gt}, N is the number of PDN stages, and \varepsilon_{1} and \varepsilon_{2} are the coefficients for the TV and SSIM losses. The total loss function is:
\begin{equation*} L = \lambda_{A} L_{A} + \lambda_{U} L_{U} + \lambda_{T} L_{T} + \lambda_{Q} L_{Q},\end{equation*}
where \lambda_{A}, \lambda_{U}, \lambda_{T}, \lambda_{Q} are weights for the loss terms. Empirically, for all experiments, the TV coefficient \varepsilon_{1} is set to 0.1 for L_{A}, 0.01 for L_{U} and L_{T}, and 0.001 for L_{Q}. The SSIM coefficient \varepsilon_{2} is set to 0.01 for all loss terms. Finally, we set \lambda_{A}=\lambda_{U}=\lambda_{T}=\lambda_{Q}=1.
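A hedged PyTorch sketch of the combined objective is given below; ssim() is an assumed helper (e.g., from the pytorch-msssim package), tv() is a standard anisotropic total variation, and the coefficients follow the values stated above.

```python
def tv(x):
    """Anisotropic total variation of a BxCxHxW tensor."""
    return (x[..., :, 1:] - x[..., :, :-1]).abs().mean() + \
           (x[..., 1:, :] - x[..., :-1, :]).abs().mean()

def term_loss(pred, gt, eps1, eps2):
    """l1 + eps1 * TV + eps2 * (1 - SSIM): the per-variable loss used for A, U, T, Q."""
    return (pred - gt).abs().mean() + eps1 * tv(pred) + eps2 * (1.0 - ssim(pred, gt))

def total_loss(A, A_gt, Us, Ts, Qs, U_gt, T_gt, Q_gt):
    L = term_loss(A, A_gt, 0.1, 0.01)               # L_A
    for U, T, Q in zip(Us, Ts, Qs):                 # sum over the N stages
        L = L + term_loss(U, U_gt, 0.01, 0.01)      # L_U
        L = L + term_loss(T, T_gt, 0.01, 0.01)      # L_T
        L = L + term_loss(Q, Q_gt, 0.001, 0.01)     # L_Q
    return L                                        # lambda_A = ... = lambda_Q = 1
```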

SECTION V.

Experiments

To verify the effectiveness of the proposed proximal dehaze-net, we evaluate our method on different datasets and compare it with other single image dehazing methods.

A. Datasets

We evaluate our proximal dehaze-net on multiple benchmark datasets for single image dehazing, including the RESIDE dataset [60] and the NTIRE 2018 single image dehazing challenge [61]. Both the RESIDE and NTIRE 2018 datasets consist of indoor and outdoor subsets.

1) RESIDE Dataset

The RESIDE dataset consists of indoor and outdoor subsets. The indoor subset contains 13990 generated hazy/clean training images and 500 test images. The outdoor subset contains about 8400 haze-free images with depth maps, which can be used to synthesize training pairs, and 500 hazy images as the test set. Note that for the outdoor subset, we first remove redundant training images that overlap with the test set. For each subset, we randomly crop 64000 patches of 256\times 256 as the training set and apply horizontal/vertical flipping and random rotation as data augmentation. Considering the natural gap between indoor and outdoor images, we train two separate models for the indoor and outdoor settings.

2) NTIRE 2018 Dataset

NTIRE 2018 single image dehazing challenge also consists of two subsets, I-Hazy (indoor hazy images) and O-Hazy (outdoor hazy images). I-Hazy contains 25 training images and 5 test images while O-Hazy contains 35 training images and 5 test images. Similar to RESIDE, we train two models separately for indoor and outdoor images.

3) Data Pre-Processing

To effectively train our PDN model, we need the ground truths \{A^{gt}, U^{gt}, T^{gt}, Q^{gt}\}. For the RESIDE dataset, A^{gt}, T^{gt} and J^{gt} are directly provided, Q^{gt} = J^{gt} - A^{gt}, and we compute the ground truth U^{gt} as the dark channel of J^{gt}-A^{gt}. However, for I-Hazy and O-Hazy, only J^{gt} is available. To handle this, we treat A and T as unknowns and minimize the following energy function to recover them:
\begin{equation*} E(A,T) = \frac{1}{2}\sum_{c}\|J^{c} \circ T + A^{c}(1 - T) - I^{c}\|_{F}^{2} + \lambda\,\text{TV}(T),\tag{25}\end{equation*}
where A\in\mathbb{R}^{3} and T\in\mathbb{R}^{p\times p}, i.e., we compute A and T within a p\times p local patch (p=512 in our experiments). To show the effectiveness of the proposed pre-processing, Figure 7 illustrates two examples of the recovered transmission maps. Although the accuracy of the recovered transmission maps is limited, since real-world hazy images do not strictly obey Eqn. (1), the results are reasonable and sufficient to train our dehazing model. We use a simple gradient descent algorithm to solve the above problem; once A and T are obtained, we prepare the training data with the same settings as for RESIDE.
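A minimal autograd sketch of this pre-processing follows, assuming Eqn. (25) is solved per 512x512 patch; the step count, learning rate, initialization and \lambda below are illustrative, not the paper's settings.

```python
import torch

def recover_A_T(I, J, lam=0.01, steps=500, lr=0.05):
    # I, J: 3 x p x p hazy/clean patches in [0, 1]
    A = torch.full((3, 1, 1), 0.8, requires_grad=True)           # per-channel scalar light
    T = torch.full((1,) + tuple(I.shape[1:]), 0.5, requires_grad=True)
    opt = torch.optim.SGD([A, T], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fidelity = 0.5 * ((J * T + A * (1 - T) - I) ** 2).sum()  # data term of Eqn. (25)
        smooth = (T[..., :, 1:] - T[..., :, :-1]).abs().sum() + \
                 (T[..., 1:, :] - T[..., :-1, :]).abs().sum()    # TV(T)
        (fidelity + lam * smooth).backward()
        opt.step()
    return A.detach(), T.detach().clamp(0, 1)
```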

B. Implementation Details

We implement and train our PDN model with the PyTorch [62] framework. Note that the hyper-parameters [\alpha,\beta,\gamma] are trained simultaneously with the network and are initialized to 1. Since all hyper-parameters must be positive, we learn their exponents (i.e., we parameterize each as e^{x} and learn x) rather than the values themselves. We use the Adam optimizer with default parameters; the initial learning rate is set to 10^{-4} and decayed by a factor of 0.75 every 10 epochs. We use a batch size of 16, and it takes about 3 days to train a single-stage network for 100 epochs on a Titan X GPU. For multi-stage networks, we first initialize the weights with the pre-trained one-stage network and then continue training.
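The positivity trick amounts to learning log-domain parameters, e.g. the following sketch (the variable names are ours):

```python
import torch
import torch.nn as nn

# zeros initialize exp(0) = 1, matching the stated initialization of [alpha, beta, gamma]
log_params = nn.Parameter(torch.zeros(3))
alpha, beta, gamma = log_params.exp().unbind()  # always positive; gradients flow to log_params
```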

C. Results on Synthetic Datasets

We first evaluate our proximal dehaze-net on synthetic datasets and compare it with other single image dehazing methods. We select representative single image dehazing methods, including DCP [4], CAP [8], NLD [9], GRM [10], IDE [11], MSCNN [16], DehazeNet [17], AODNet [18], DcGAN [21], GDN [23], DADN [28] and FDGAN [27]. Among these, DCP, CAP, NLD, GRM and IDE are traditional image processing methods based on image priors; MSCNN, DehazeNet, AODNet, DcGAN, GDN, DADN and FDGAN are deep learning methods that predict either transmission maps or clean images. For fair comparison, we retrained the learning-based methods on the corresponding datasets whenever training code was provided by the authors. We evaluate these methods on the SOTS and NTIRE 2018 datasets, both of which consist of indoor and outdoor subsets. We report the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as performance metrics.

As shown in Table 1, the learning-based methods are usually quantitatively superior to the traditional prior-based methods. Our proposed PDN model surpasses most methods and achieves competitive PSNR and SSIM values on multiple benchmark datasets. Specifically, on the SOTS indoor and outdoor datasets, we achieve the highest PSNRs and SSIMs comparable to the state-of-the-art method GDN [23]. On the more realistic NTIRE 2018 datasets (I-Hazy and O-Hazy in Table 1), we achieve the best PSNRs and SSIMs among the compared methods in both the indoor and outdoor cases. The dehazing results on the NTIRE datasets also verify the effectiveness and rationality of the proposed data pre-processing algorithm for generating atmospheric lights and transmission maps from given hazy/clean image pairs.

TABLE 1 Dehazing Results on SOTS (Indoor and Outdoor) and NTIRE 2018 (I-Hazy and O-Hazy) Datasets. We Show the Average Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) of the Recovered Images by the Compared Methods. Best Results Are Shown in Red and the Second Best in Blue

As a visualization, we show dehazing examples from these benchmark datasets in Figure 8. The 1st and 2nd examples are from the SOTS indoor dataset, the 3rd and 4th examples are from the SOTS outdoor dataset, and the last two examples are from the I-Hazy and O-Hazy datasets respectively. From Figure 8, we can see that the learning-based methods usually produce better visual results than the traditional prior-based methods. The traditional methods (DCP, CAP) sometimes over-dehaze the images, causing darkening and color distortion. The learning-based methods can produce dehazed images that are closer to the ground truths; however, they sometimes fail to remove all haze from an image, as with DehazeNet and AODNet. Recent methods such as GDN are powerful at removing haze from synthetic images but perform less well on realistic images, e.g., the last two examples in Figure 8. In comparison, our method removes haze effectively while keeping the dehazed images visually pleasing.

D. Results on Real Datasets

In Figures 9 and 10, we respectively show the dehazing results of our method on real-world hazy images, compared with the traditional methods and the learning-based methods. On one hand, as we can see from Figure 9, the traditional methods are usually effective in removing haze because they exploit useful haze-related image priors. However, they tend to overly enhance the hazy images, causing over-saturation or color distortion, as with BCCR [7], CAP [8], NLD [9] and IDE [11]. DCP [4] is likely to produce undesirable artifacts, especially in sky regions. GRM [10] can suppress artifacts well but loses detailed textures due to its smoothness regularization term. On the other hand, the learning-based methods usually produce more visually pleasing results, as shown in Figure 10. However, MSCNN [16] and DehazeNet [17] are not always as effective as the traditional methods at haze removal. AODNet [18] often produces darkened images. DCPDN [20] predicts inexact transmission maps in areas of high intensity. GFN [19] and DcGAN [21] directly predict haze-free images and sometimes cause unexpected dark artifacts due to the lack of a haze imaging model constraint. FDGAN [27] and DADN [28] can produce more visually pleasing results, but both rely on photo-realistic synthetic training data. By learning haze-related image priors, our method combines the advantages of the traditional and learning-based methods, effectively removing haze while keeping the dehazed images natural.

SECTION VI.

Discussion and Analysis

We now discuss parameter selection, the effectiveness of learning haze-related image priors, and the necessity of the GIF block. We then analyze the effect of the number of stages on the performance of our model. We also compare the running speed with recent dehazing methods.

A. Effects of Parameters for Loss Terms

As mentioned in Section IV, we train our network with a combined loss function that introduces multiple hyper-parameters for the different loss terms, i.e., \varepsilon_{1} for the TV loss, \varepsilon_{2} for the SSIM loss, and \lambda_{z}, z\in \{A,U,T,Q\}. First, we observe in experiments that the TV and SSIM losses only slightly affect the performance of our model: removing them decreases the PSNR on the SOTS indoor dataset from 32.53 to 32.52 for the 1-stage PDN and from 33.55 to 33.41 for the 2-stage PDN. Thus, we set the values of \varepsilon_{1} and \varepsilon_{2} empirically. Second, to evaluate the effect of \lambda_{z}, we adjust one of them from 0 to 1.8 in steps of 0.2 while fixing the other three to 1, and train for 20 epochs to investigate the performance on a validation dataset (with no overlap with the test set). As we can see from Figure 11, when \lambda \geq 1, our PDN model reaches a relatively steady state. Thus, for simplicity, we set \lambda_{A}=\lambda_{U}=\lambda_{T}=\lambda_{Q}=1 in our experiments.

FIGURE 11. Performance of our PDN model on the validation dataset with different parameter settings.

B. Learning Image Priors

Our network learns multiple haze-related image priors. In this section, we discuss the effect of learning each image prior. To do so, we respectively remove the dark channel prior f, transmission prior g and clean image prior h from the dehazing energy function in Eqn. (8). We then derive new proximal dehaze-nets without learning these priors, dubbed PDN-ND (without learning the dark channel prior), PDN-NT (without learning the transmission prior) and PDN-NC (without learning the clean image prior). We also examine the effect of learning the atmospheric light A by dropping A-Net and pre-computing A with [4], denoted PDN-NA. We train PDN-NA, PDN-ND, PDN-NT and PDN-NC with one stage from scratch on the RESIDE indoor training set and evaluate them on the SOTS indoor test set. The quantitative results are shown in Figure 12. We can see that the PDN model is improved by learning these image priors with CNNs. Notably, learning the clean image prior and the atmospheric light contribute the most, improving PSNR by about 7 dB and 5 dB respectively. Learning the dark channel prior and the transmission prior also brings considerable performance gains of about 1 dB and 1.5 dB respectively. This demonstrates the effectiveness of learning haze-related image priors for the single image dehazing task.

FIGURE 12. Effectiveness of learning haze-related priors. The full PDN model, which learns all haze-related priors, achieves the best performance.

As a visualization, we show an example of the dehazing results of these model variants in Figure 13. Our full model achieves the highest PSNR, and learning all of these image priors helps remove haze more effectively. Without learning the dark channel prior or the transmission prior, our model can still dehaze images effectively, but some slight haze remains observable. Without learning the clean image prior or the atmospheric light, the remaining haze in the image is still quite obvious.

FIGURE 13. Dehazing results of the variants of our model, measured in PSNR. Omitting the prior learning modules causes haze to be removed less effectively; the full model achieves the highest PSNR.

To illustrate what is learned by the proximal mappings \mathcal{F}, \mathcal{G} and \mathcal{H}, we show in Figure 14 an example of the learned proximal mappings for the dark channel, transmission map and clean image prior of our proximal dehaze-net. In Figure 14, the two images in each of the three red boxes denote the input and output of the corresponding learned proximal mapping. We can observe that the learned proximal mapping \mathcal{F} produces a reasonable dark channel U with lower values than the input dark channel \hat{U}. The learned proximal mapping \mathcal{G} produces a piecewise-smooth transmission map T that is consistent with the underlying scene depth and more accurate than the input transmission estimate \hat{T}. The final proximal mapping \mathcal{H} further removes haze residuals in the estimated image \hat{Q} and produces a clearer image Q.

FIGURE 14. The haze-related image priors learned by our proximal dehaze-net. \hat{U}, \hat{T}, \hat{Q} are the dark channel, transmission and haze-free image before the prior learning modules; U, T, Q are the corresponding results after prior learning. We can see that the learned dark channel is darker, the learned transmission map is more accurate, and the learned haze-free image is visually better. Note that P and Q are obtained by subtracting A from I and J, and we add A back for better visualization.

C. Necessity of GIF Block

The GIF block is a part of the learned proximal mapping \mathcal{G}. As stated in Section IV, the GIF block performs guided image filtering; its effect is to enforce edge alignment between the estimated transmission map and the original image. To verify the effectiveness of the embedded GIF block, we train a PDN without the GIF block (denoted PDN-NG) and evaluate it on the SOTS indoor dataset. The performance is shown in Figure 12. PDN-NG achieves almost the same quantitative performance as the full PDN model, and for most images it behaves equally well visually, probably owing to the powerful learning ability of Q-Net. However, as shown in Figure 15, for some examples the dehazed image exhibits visible halo artifacts around image edges, which degrade the visual quality. In comparison, the result of our full model with the GIF block is clearer and more natural.

FIGURE 15. Effectiveness of the GIF block. (a) Input hazy image. (b) Without the GIF block, there are obvious halo artifacts along image edges. (c) With the GIF block, our proximal dehaze-net produces a haze-free image without artifacts. Please zoom in for better visualization.

D. Effect of Multi-Stage Network

Our proximal dehaze-net is a multi-stage learning architecture based on the dehazing optimization algorithm, so more network stages are expected to yield higher performance on the benchmarks. However, since we use the residual encoder-decoder (RED) backbone as sub-networks, which is powerful in general image restoration tasks, we already achieve satisfactory results with a single stage, both quantitatively and visually. As shown in Table 1 and Figure 12, a two-stage PDN achieves the best PSNR on the SOTS dataset: compared with the one-stage PDN, it improves PSNR on the indoor and outdoor sets by 1.02 dB and 0.42 dB respectively, while the SSIM improvements are below 0.01. On the NTIRE I-Hazy and O-Hazy datasets, a one-stage PDN performs best, and adding more stages does not significantly improve the quantitative results. Furthermore, the visual results are similar among PDN models with different numbers of stages. For simplicity, we use a two-stage PDN (PDN-S2) for SOTS and a one-stage PDN (PDN-S1) for NTIRE in our paper.
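
The unrolling itself amounts to stacking stage modules, each mirroring one iteration of the optimization algorithm. The sketch below uses a toy residual block as a stand-in (its depth and width are assumptions, not the actual RED backbone).

```python
import torch
import torch.nn as nn

class REDBlock(nn.Module):
    """Toy stand-in for the residual encoder-decoder sub-network
    performing one stage's refinement."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual refinement of the estimate

class MultiStagePDN(nn.Module):
    """K unrolled stages; each stage further refines the previous
    estimate, analogous to one iteration of the algorithm."""
    def __init__(self, num_stages: int = 2):
        super().__init__()
        self.stages = nn.ModuleList(REDBlock() for _ in range(num_stages))

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

# Usage: a two-stage variant corresponds to PDN-S2 in spirit.
net = MultiStagePDN(num_stages=2)
out = net(torch.randn(1, 3, 64, 64))
```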

E. Running Time

The most involved part of our implementation is the dark channel computation and its backward pass. In the conference version of our work [22], we implemented this with low-level CUDA code for high-speed computation. In [32], the dark channel and its backward pass are realized with a look-up table technique. In [63], the dark channel is calculated by extracting local patches and finding their minimum. In this paper, we observe that the dark channel can be computed simply as a negated one-stride max-pooling, which densely extracts the local maximum of the negated image, i.e., $I^{d} = -\text{MP}(-\min(I), w, 1)$, where $\min(I)$ is the channel-wise minimum of $I$ and MP is the max-pooling operator with kernel size $w$ and stride 1. Our implementation is therefore purely PyTorch-based, and our PDN model can process images at high speed on GPU devices. In Table 2, we report the average running time (in seconds) of single image dehazing methods over 500 images at $460\times 620$ resolution, i.e., the average over 500 evaluations. For a fair comparison, we measure the running times on both CPU and GPU devices where possible: GPU times on a Titan X GPU with 12 GB of memory, and CPU times on an Intel Xeon E5-2650 CPU @ 2.20 GHz without parallel acceleration.
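
In PyTorch this trick takes only a few lines; the window size and the same-size padding below are illustrative choices.

```python
import torch
import torch.nn.functional as F

def dark_channel(img: torch.Tensor, w: int = 15) -> torch.Tensor:
    """Dark channel as negated max-pooling: I^d = -MP(-min(I), w, 1).

    img: (B, 3, H, W) tensor; returns a (B, 1, H, W) dark channel.
    Gradients flow through max_pool2d, so no custom backward is needed.
    """
    # Channel-wise minimum at every pixel: min(I).
    min_c = img.min(dim=1, keepdim=True).values
    # A stride-1 max-pool of the negated map is a dense local-minimum
    # filter; padding w // 2 keeps the output the same spatial size.
    return -F.max_pool2d(-min_c, kernel_size=w, stride=1, padding=w // 2)
```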

TABLE 2. Computation Time for Different Methods to Dehaze a Color Image of $460\times 620$. PDN-S1 and PDN-S2 Denote Our PDN Model With 1 and 2 Stages. We Report the Running Times on CPU and GPU Devices for Fair Comparison.

SECTION VII.

Other Applications and Limitations

A. Extensions to More Applications

Though our network is trained for image dehazing, we can apply it to similar tasks without retraining on their corresponding datasets. Figure 16 (a)-(c) shows an example of anti-halation enhancement using our method. Although halation has a different imaging model, it brings haze-like effects to images [17], [18], so our proximal dehaze-net can be directly applied to anti-halation image enhancement. In Figure 16 (d)-(f), we show an example of underwater image enhancement. Ignoring the forward scattering component, the simplified underwater optical model has a formulation similar to the haze imaging model [64], [65] (see the equation below). Even compared with methods specifically designed for this task, such as [64], our network effectively removes the haze-like effects in this underwater image.
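
For reference, the simplification in question is the commonly used per-channel underwater model (the channel-subscript notation here is ours), which matches the haze imaging model in Eq. (1) except that the transmission becomes wavelength-dependent:\begin{equation*} I_{c}(x) = J_{c}(x)T_{c}(x) + A_{c}(1-T_{c}(x)), \quad c \in \{r, g, b\}.\end{equation*}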

FIGURE 16. Extension to similar tasks. The first row shows an example of anti-halation enhancement. The second row shows an example of underwater image enhancement.

B. Failure Cases

While our method performs well on most natural images, it has limitations when the photo is taken at night or in heavy haze. Night-time hazy images usually contain multiple light and color sources, and their quality is further degraded by low-light conditions. The commonly used haze imaging model is not sufficient to describe the night-time haze phenomenon, and our PDN model has no access to such training data. As shown in Figure 17 (a)-(c), our method fails to remove all the haze in the image. The method [66], designed for night-time dehazing, removes haze more thoroughly, but its visual quality is also lowered. As for images with very thick fog, too many details are overwhelmed by haze, and it is difficult for current methods to achieve satisfactory results. As shown in Figure 17 (d)-(f), visible haze still remains in the image dehazed by our method. In comparison, iPal-DH [67], trained on the Dense Haze dataset [68], dehazes images with dense haze more effectively, but it also fails to recover the lost details covered by haze.

FIGURE 17. Failure cases of our method. The first row shows an example of night-time image dehazing. The second row shows an example of heavy fog image dehazing.

SECTION VIII.

Conclusion

In this paper, we propose a model-driven deep learning approach, proximal dehaze-net, for single image dehazing. We first build an energy function based on haze imaging model constraints in both image space and haze-related feature space, and then design an iterative algorithm for solving the energy model. We unfold the iterative algorithm into a multi-stage network by learning proximal operators using CNNs. The proposed proximal dehaze-net achieves promising results for single image dehazing.

Although the proposed PDN model is effective in most cases, our method shares some inherent drawbacks with other learning-based methods. First, the learning scheme is built upon the haze imaging model, which may limit its applicability to more complex real-world situations such as non-uniform or night-time haze. This could be addressed by adopting imaging models that better describe real scenes and better techniques for simulating realistic datasets. Second, our method has difficulty with cross-domain image dehazing. As shown in Table 1, we must train separate models to achieve the best performance on different benchmarks, and a model trained on one dataset may behave poorly on another. A simple early-stopping strategy can reduce over-fitting to a specific dataset, but at the cost of performance on that dataset: there is a trade-off between performance on one dataset and the generalization ability of the trained model. In the future, we will consider domain adaptation to better handle this problem.

