
Single Image Dehazing With Unsharp Masking and Color Gamut Expansion




Abstract:

Image dehazing is a fundamental problem in computer vision and has hitherto engendered a prodigious amount of research. Recently, with the well-recognized success of deep learning techniques, this field has been dominated by deep dehazing models. However, deep learning is not always a panacea, especially for the practicalities of image dehazing, because high computational complexity, expensive maintenance costs, and high carbon emission are three noticeable problems. Computational efficiency is, therefore, a decisive factor in real-world circumstances. To cope with this growing demand, we propose a linear-time algorithm tailored to three primitive parts: unsharp masking (pre-processing), dehazing, and color gamut expansion (post-processing). The first enhances the sharpness according to the local variance of image intensities. The second removes haze based on the improved color attenuation prior, and the third addresses a residual effect of color gamut reduction. Extensive experimental results demonstrated that the proposed method performed comparably with popular benchmarks, notably deep dehazing models. While delivering such comparable performance, the proposed method remains fast and efficient, favoring real-world computer vision systems.
Published in: IEEE Access ( Volume: 10)
Page(s): 102462 - 102474
Date of Publication: 26 September 2022
Electronic ISSN: 2169-3536


Dat Ngo
Technical Support and Development Center for Display Device Convergence Technology, Busan, South Korea
Dat Ngo received the B.S. degree in computer engineering from The University of Danang—University of Science and Technology, Danang, Vietnam, in 2016, and the M.S. and Ph.D. degrees in electronic engineering from Dong-A University, Busan, South Korea, in 2018 and 2021, respectively.
He is a Postdoctoral Researcher with the Technical Support and Development Center for Display Device Convergence Technology, Busan. His research interests include image/video processing, machine learning, deep learning, and FPGA prototypes.
Gi-Dong Lee
Technical Support and Development Center for Display Device Convergence Technology, Busan, South Korea
Department of Electronic Engineering, Dong-A University, Busan, South Korea
Gi-Dong Lee received the B.S., M.S., and Ph.D. degrees in electronic engineering from Busan National University, Busan, South Korea, in 1989, 1991, and 2000, respectively.
He was a Postdoctoral Researcher at the Liquid Crystal Institute, Kent State University, OH, USA, until 2003. Since 2003, he has been with the Department of Electronic Engineering, Dong-A University, Busan, where he is a Full Professor. He is also the Di...
Bongsoon Kang
Technical Support and Development Center for Display Device Convergence Technology, Busan, South Korea
Department of Electronic Engineering, Dong-A University, Busan, South Korea
Bongsoon Kang received the B.S. degree in electronic engineering from Yonsei University, Seoul, South Korea, in 1985, the M.S. degree in electrical engineering from the University of Pennsylvania, Philadelphia, USA, in 1987, and the Ph.D. degree in electrical engineering from Drexel University, Philadelphia, USA, in 1990.
He was a Senior Staff Researcher at Samsung Electronics, Suwon, South Korea, from 1989 to 1999. Since ...

License: CC BY 4.0. IEEE is not the copyright holder of this material. Full license terms are available at https://creativecommons.org/licenses/by/4.0/.
SECTION I.

Introduction

Atmospheric scattering occurs when the sunlight enters the atmosphere and is diffused in all directions due to particle-particle collision. Scattering, coupled with absorption, decreases the quality of digital images, resulting in various types of degradation, such as faint color, low contrast, and detail loss. Although the degradation degree depends on the size of atmospheric particles, which varies according to weather conditions [1], this phenomenon is widely referred to as haze. On the one hand, haze obscures distant objects and affects the visibility perceived by human visual systems. On the other hand, it also affects high-level computer vision applications that assume clean input image/video data, as pointed out in [2]. Hence, image dehazing is a research branch focusing on alleviating the adverse effects of haze. Fig. 1 demonstrates the dehazing results of a real-world hazy image using a deep learning model [3] and the proposed method. Our result is more favorable to human visual systems because haze has been removed efficiently while fine details have been satisfactorily recovered (in the blue-cropped region).

FIGURE 1. A real-world hazy image and its corresponding dehazing results by Ren et al. [3] and the proposed method. MS-CNN stands for the multi-scale convolutional neural network.

A. Degradation Model

Reference [4] formalizes the haze-induced degradation by a model comprising the direct attenuation and the airlight scattering, denoted as blue and red in Fig. 2, respectively. The former occurs when the reflected light at a particular wavelength \lambda (in m) propagates through suspended particles, where each of them causes an angular scattering that attenuates the light energy. For a unit volume of differential width \mathrm {d}r_{1} , the attenuated energy is formulated by:\begin{equation*} \frac {\mathrm {d}E_{a}(r_{1},\lambda)}{E_{a}(r_{1},\lambda)} = -\beta _{sc}\mathrm {d}r_{1}, \tag{1}\end{equation*} where E_{a}(r_{1},\lambda) (in \mathrm {Wm}^{-2} ) denotes the irradiance at the unit volume \mathrm {d}r_{1} , and \beta _{sc} (in \mathrm {m}^{-1} ) denotes the scattering coefficient. The captured irradiance, E_{a}(d,\lambda) at r_{1}=d , is then obtained by integrating both sides of (1) as follows:\begin{equation*} \int _{E_{a}(0,\lambda)}^{E_{a}(d,\lambda)} \frac {\mathrm {d}E_{a}(r_{1},\lambda)}{E_{a}(r_{1},\lambda)} = \int _{0}^{d} -\beta _{sc}\mathrm {d}r_{1}. \tag{2}\end{equation*}

FIGURE 2. Illustration of the atmospheric scattering phenomenon.

At r_{1}=0 , the irradiance E_{a}(0,\lambda) = \Omega _{\lambda} S_{0} F_{\lambda} along the optical path is calculated from the mean radiance of the sky S_{0} (in \mathrm {Wm}^{-2}\mathrm {sr}^{-1} ), the solid angle \Omega _{\lambda} (in sr), and the dimensionless reflectance coefficient F_{\lambda} . Hence, \begin{equation*} E_{a}(d,\lambda) = \Omega _{\lambda} S_{0} F_{\lambda} \, \mathrm {exp}(-\beta _{sc}d). \tag{3}\end{equation*}

The other part, the airlight, indicates a portion of light reflected from the terrain surface or scattered directly to the camera. The irradiance E_{s}(r_{2},\lambda) , in this case, can be calculated by considering a unit volume \mathrm {d}r_{2} whose radiance is S_{0}\beta _{sc}\mathrm {d}r_{2} . Similarly, the irradiance captured by the camera is attenuated, and its value at a distance r_{2} from the camera is:\begin{equation*} \mathrm {d}E_{s}(r_{2},\lambda) = \Omega _{\lambda} S_{0} \beta _{sc} \, \mathrm {exp}(-\beta _{sc}r_{2}) \mathrm {d}r_{2}. \tag{4}\end{equation*} The irradiance E_{s}(d,\lambda) at r_{2}=d is then obtained by integrating (4) over the whole optical path, as follows:\begin{align*} E_{s}(d,\lambda)=&\Omega _{\lambda} S_{0} \beta _{sc} \int _{0}^{d} \mathrm {exp}(-\beta _{sc}r_{2}) \mathrm {d}r_{2} \\=&\Omega _{\lambda} S_{0}[1-\mathrm {exp}(-\beta _{sc}d)]. \tag{5}\end{align*}

The total irradiance E_{t}(d,\lambda) is the sum of E_{a}(d,\lambda) and E_{s}(d,\lambda) as:\begin{equation*} E_{t}(d,\lambda) = \Omega _{\lambda} S_{0} F_{\lambda} \, \mathrm {exp}(-\beta _{sc}d) + \Omega _{\lambda} S_{0}[1-\mathrm {exp}(-\beta _{sc}d)]. \tag{6}\end{equation*}

For ease of representation, it is convenient to substitute E_{t}(d,\lambda) with \mathbf {I}(x) , where the boldface indicates the wavelength-dependent characteristics. Considering general cameras with sensors sensitive to red, green, and blue wavelengths, \mathbf {I}(x)=\{I^{R}(x),I^{G}(x),I^{B}(x)\} denotes the x -th hazy intensities. Similarly, \mathbf {J}(x) , \mathbf {A} , and t(x) can substitute for \Omega _{\lambda} S_{0} F_{\lambda} , \Omega _{\lambda} S_{0} , and \mathrm {exp}[-\beta _{sc}d(x)] , respectively. These three are referred to as the haze-free intensities, the atmospheric light, and the transmittance. The following is, therefore, the simplified form of (6), and it commonly serves as the degradation model in image dehazing:\begin{equation*} \mathbf {I}(x) = \mathbf {J}(x)t(x) + \mathbf {A}[1-t(x)]. \tag{7}\end{equation*}
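
For concreteness, the following NumPy sketch applies (7) in both directions: forwards to synthesize a hazy image from a clean one, and backwards to recover \mathbf {J} once t and \mathbf {A} are known. It is a minimal illustration of the model only, not the authors' implementation; the clamping constant t_{0} is an assumption commonly used to avoid division blow-up.

```python
import numpy as np

def apply_haze_model(J, t, A):
    """Forward use of Eq. (7): I = J*t + A*(1 - t).

    J : (H, W, 3) haze-free image, values in [0, 1]
    t : (H, W)    transmittance map, values in (0, 1]
    A : (3,)      atmospheric light per RGB channel
    """
    t = t[..., None]                 # broadcast over the color axis
    return J * t + A * (1.0 - t)

def invert_haze_model(I, t, A, t0=0.05):
    """Backward use of Eq. (7); t is clamped by t0 (an assumed small constant)
    to avoid division blow-up, as most dehazing methods do."""
    t = np.maximum(t, t0)[..., None]
    return (I - A) / t + A
```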

B. Ill-Posedness

The concept of ill-posedness dates back to [5], and a mathematical problem is called ill-posed (or incorrectly posed) if at least one of the following conditions for its solution fails:

  • The existence

  • The uniqueness

  • The stability

In (7), the hazy intensities \mathbf {I} captured by the camera are the only available data, whereas the remainder is unknown. Accordingly, each estimated pair of \mathbf {A} and t yields a different solution \mathbf {J} from (7), violating the uniqueness condition. This issue renders image dehazing ill-posed and has brought about a prodigious amount of relevant research. Recently, researchers have adopted deep learning techniques to address the ill-posedness, as witnessed by [6], [7], [8], [9], [10], and [11]. Despite the excellent performance that deep dehazing models have delivered, they have been linked with several problems in real-world execution, such as high power consumption, high carbon emission, and expensive maintenance costs [12].

Furthermore, for a low-level vision task such as image dehazing, deep neural networks (DNNs) are often overkill, as discussed in [13] on deep learning versus traditional computer vision techniques. In fact, they fit well with high-level cognitive tasks, such as object classification, recognition, and localization. The data-driven nature of DNNs is also more of a hindrance than a help because the abstract features learned by DNNs are specific to the training dataset, which is highly cumbersome to construct with statistical reliability. Thus, the learned features may be inappropriate for images different from those in the training set, lowering the performance in general.

SECTION II.

Related Works

This section briefly reviews influential works in the literature based on the categorization in [14], where algorithms have been divided into three categories according to their data exploitation. The first two, image processing and machine learning, were typified by low-level hand-engineered image features discovered through statistical analysis of real-world images. The last category, deep learning, exploited the powerful representation capability of DNNs to learn high-level data-driven image features. This categorization could give useful insights into (i) the complexity of dehazing algorithms and (ii) subjective/objective preferences for dehazed images. Generally, image processing and machine learning-based methods possess low complexity and favor human perception. Deep learning-based methods, on the contrary, are computationally costly and favor image quality assessment metrics.

A. Image Processing

The dark channel prior [15], one of the most influential works in image dehazing, is a prime example of the first category. He et al. [15] observed natural haze-free images and discovered that the dark channel, calculated as the local minimum of the per-pixel minimum channel, tended to approach zero at non-sky image patches. This finding, coupled with the degradation model, offered an efficient means to estimate the transmittance, which required soft-matting [16] for edge-aware smoothing. He et al. [15] also proposed locating the atmospheric light as the brightest pixel (in the red-green-blue (RGB) color space) among the top 5% of pixels with the highest intensities in the dark channel. Given the estimated transmittance and atmospheric light, they reversed (7) to obtain the haze-free image.
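
As a brief illustration of this prior, the sketch below computes the dark channel exactly as described above: the per-pixel minimum over the RGB channels followed by a local minimum filter. It is a generic reimplementation, not the authors' code, and the patch size is only illustrative.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(I, patch=15):
    """Dark channel: local minimum (over a patch x patch window) of the
    per-pixel minimum across the RGB channels.

    I : (H, W, 3) RGB image; the patch size 15 is only illustrative.
    """
    min_channel = I.min(axis=2)                    # (H, W) minimum channel
    return minimum_filter(min_channel, size=patch)
```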

Another influential work is the linear time algorithm by Tarel and Hautiere [17]. They first white-balanced the input image to support the assumption that the atmospheric light was pure white. After that, they inferred the airlight, the term \mathbf {A}[1-t(x)] in (7), from image whiteness as a percentage of its local deviation from its local average. This airlight inference required two median filters, hence the widely known name the median of medians along lines. Finally, they post-processed the restored image using a simple tone mapping operator to expand the dynamic range. The most computationally heavy part of this algorithm is the median filter, whose implementation is, fortunately, \mathcal {O}(N) [18], therein lying the linear time complexity.

B. Machine Learning

As image dehazing based on in situ information is highly challenging, the distilled knowledge from relevant image datasets may improve the dehazing performance. Zhu et al. [19] developed the color attenuation prior based on extensive observations of natural outdoor images. This prior stated that the scene depth correlated with the difference between image saturation and brightness. Zhu et al. [19] then utilized a linear function to model that correlation and estimated the function’s parameters by applying supervised learning on a synthetic dataset. Thus, the distilled knowledge was parameters used to estimate the scene depth from image saturation and brightness.

From a more general perspective, Tang et al. [20] investigated four haze-relevant features, including the dark channel, hue disparity, locally maximum contrast, and locally maximum saturation, at multiple scales and found the following. Although the dark channel was the most informative feature (as discovered by He et al. [15]), other features also contributed in a complementary manner. Hence, Tang et al. [20] devised a framework for inferring the transmittance from different haze-relevant features. In [20], they employed a random forest regressor for ease of analysis and demonstration, albeit with slow inference time. They also discussed the importance of post-processing and presented two post-processing options: adaptive atmospheric light estimation and adaptive exposure scaling.

C. Deep Learning

The aforementioned approaches require significant efforts in seeking (i) a good feature (or a set of features) and (ii) an efficient inference scheme. However, there is no guarantee that they will always perform as intended in all circumstances. As a result, deep learning has been applied to image dehazing to improve flexibility. Given a reliable training dataset, DNNs can estimate the transmittance and atmospheric light with high accuracy because they allow learning and augmenting image features from low to high levels of abstraction. For example, Cai et al. [21] designed a convolutional neural network (CNN) to perform the following: low-level feature extraction, multi-scale mapping, augmentation (for spatial invariance), and non-linear transmittance inference.

The powerful learning ability of DNNs or deep CNNs also allows them to infer the dehazed image directly from the hazy input. In this direction, the encoder-decoder network has proved highly efficient for end-to-end learning [22], [23]. In addition, some well-known image processing schemes can be applied to deep learning to improve performance, as witnessed by multi-scale image fusion [22] and domain adaptation [23]. Also, inspired by the observation that, in the human brain, knowledge learned from one activity may benefit another, joint learning is a promising direction, typified by [24], where image dehazing benefits object detection.

Some state-of-the-art deep dehazing networks developed recently include GridDehazeNet (GDN) [25], multi-scale boosted dehazing network (MSBD) [26], you only look yourself (YOLY) [27], and self-augmented unpaired image dehazing (D4) [28]. GDN is a supervised network and comprises three modules. The pre-processing module applies different data-driven enhancement processes to the input image. The backbone module then fuses the results based on the grid network, where a channel-wise attention mechanism is adopted to facilitate the cross-scale circulation of information. Finally, the post-processing module remedies residual artifacts to improve the dehazing quality.

MSBD is also a supervised network designed with boosting and error feedback mechanisms. The former successively refines the intermediate dehazing result to reduce the portion of haze (PoH defined by Dong et al. [26]), and the latter successively recovers spatial details obscured by haze. YOLY, on the contrary, is an unsupervised and untrained network. Based on the layer disentanglement in [29], YOLY is designed with three sub-networks that decompose the hazy image into three latent layers corresponding to scene radiance, transmittance, and atmospheric light. Thus, YOLY supervises itself to jointly optimize three sub-networks and reconstruct the hazy image from a single input.

Yang et al. [28] argued that YOLY lacked knowledge from the clean image domain and developed D4 as an alternative solution. Unlike other unpaired networks, D4 takes account of the scattering coefficient and scene depth when carrying out dehazing and rehazing cycles. Consequently, D4 can benefit from physical-model-based haze removal and generation to improve the performance of unpaired learning.

D. Motivations

Image dehazing has undergone approximately five decades of development since the pioneering work in 1972 [30]. It is currently in the mature stage, and the focus is deemed to attain computational efficiency for integrating into low-cost edge devices, which are prevalent in the Industry 4.0 era.

As discussed thus far, although DNNs offer some definite advantages, such as accuracy and flexibility, they are not the preferable option for this task. In contrast, traditional computer vision techniques are more suitable for image dehazing because a hand-engineered method can deliver comparable performance at a cheaper computational cost. This paper, therefore, proposes an \mathcal {O}(N) algorithm that pre-processes, dehazes, and post-processes hazy images to satisfactorily restore clean visibility.

SECTION III.

Proposed Method

Fig. 3 illustrates the three major steps constituting the proposed method. The first is unsharp masking for enhancing the sharpness, wherein the enhancement is locally adapted to the variance of image intensities to avoid the out-of-range problem. The second performs image dehazing based on the improved color attenuation prior [31], which estimates the transmittance from saturation and brightness (as by Zhu et al. [19]) and applies two no-black-pixel (NBP) constraints. The third performs color gamut expansion by enhancing the luminance and then expanding the color gamut proportionally to avoid color distortion. The following describes these three steps in more detail.

FIGURE 3. Illustration of the proposed method. NBP stands for no-black-pixel, and CDF stands for the cumulative distribution function.

A. Pre-Processing

In the beginning, it is worth recalling that the input RGB image of size H\times W is denoted as \mathbf {I}\in \mathbb {R}^{H\times W\times 3} , or interchangeably, \mathbf {I}=\{I^{R},I^{G},I^{B}\} , where I^{c}\in \mathbb {R}^{H\times W} and c\in \{R,G,B\} .

As haze is depth-dependent, it is generally smooth except at discontinuities. Hence, it can be viewed as a low-frequency component that obscures fine details in the captured image. This pre-processing step then enhances these obscured details by adding the scaled Laplacian image to the original, as Fig. 4 shows. Because the sharpness enhancement only applies to the luminance channel, it is necessary to convert between RGB and YCbCr color spaces using (8) and (9) from [32]. In (8), Y , Cb, and Cr are the luminance, blue-difference chroma, and red-difference chroma components of the input image \mathbf {I} , and Y_{e} denotes the output luminance with sharpness enhanced. In (9), \mathbf {I}_{e}\in \mathbb {R}^{H\times W\times 3} , or \mathbf {I}_{e}=\{I_{e}^{R},I_{e}^{G},I_{e}^{B}\} , is the output RGB image corresponding to \{Y_{e},\mathrm {Cb},\mathrm {Cr}\} .\begin{align*} \begin{bmatrix} Y\\ \mathrm {Cb}\\ \mathrm {Cr} \end{bmatrix}=&\begin{bmatrix} 0.183 &\quad 0.614 &\quad 0.062\\ -0.101 &\quad -0.338 &\quad 0.439\\ 0.439 &\quad -0.399 &\quad -0.040 \end{bmatrix} \begin{bmatrix} I^{R}\\ I^{G}\\ I^{B} \end{bmatrix} + \begin{bmatrix} 16\\ 128\\ 128 \end{bmatrix}, \tag{8}\\ \begin{bmatrix} I_{e}^{R}\\ I_{e}^{G}\\ I_{e}^{B} \end{bmatrix}=&\begin{bmatrix} 1.164 &\quad 0 &\quad 1.793\\ 1.164 &\quad -0.213 &\quad -0.534\\ 1.164 &\quad 2.115 &\quad 0 \end{bmatrix} \begin{bmatrix} Y_{e} - 16\\ \mathrm {Cb} - 128\\ \mathrm {Cr} - 128 \end{bmatrix}. \tag{9}\end{align*}
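
For reference, the conversion matrices in (8) and (9) can be applied directly to an 8-bit RGB array, as in the sketch below. This is a straightforward transcription of the two equations (values may need clipping to the valid range after the inverse conversion), not the paper's implementation.

```python
import numpy as np

# Conversion matrices and offset transcribed from Eqs. (8) and (9);
# 8-bit RGB input in [0, 255] is assumed.
RGB2YCBCR = np.array([[ 0.183,  0.614,  0.062],
                      [-0.101, -0.338,  0.439],
                      [ 0.439, -0.399, -0.040]])
YCBCR2RGB = np.array([[1.164,  0.000,  1.793],
                      [1.164, -0.213, -0.534],
                      [1.164,  2.115,  0.000]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(I):
    """Eq. (8): returns the Y, Cb, and Cr planes of an (H, W, 3) RGB image."""
    ycbcr = I @ RGB2YCBCR.T + OFFSET
    return ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]

def ycbcr_to_rgb(Y, Cb, Cr):
    """Eq. (9): inverse conversion; the result is clipped to the 8-bit range."""
    ycbcr = np.stack([Y, Cb, Cr], axis=-1) - OFFSET
    return np.clip(ycbcr @ YCBCR2RGB.T, 0.0, 255.0)
```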

FIGURE 4. Block diagram of the pre-processing step.

Next, the Laplacian image is obtained by convolving the input luminance Y with the Laplacian operator \nabla ^{2} , whose definition is in (10). Meanwhile, the local variance v of luminance intensities is calculated as the expected value of the squared deviation from the mean, as (11) illustrates. The symbol \circledast denotes the convolution operator, and U_{k} is an all-ones square matrix of size k\times k , where k=\{2n+1 | n\in \mathbb {Z}^{+}\} is an odd integer.\begin{align*} \nabla ^{2}\triangleq&\begin{bmatrix} 0 &\quad 1 &\quad 0\\ 1 &\quad -4 &\quad 1\\ 0 &\quad 1 &\quad 0 \end{bmatrix}, \tag{10}\\ v=&Y^{2} \circledast \left ({\frac {U_{k}}{k^{2}}}\right) - \left [{Y \circledast \left ({\frac {U_{k}}{k^{2}}}\right)}\right]^{2}. \tag{11}\end{align*}
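
Equation (11) is simply the textbook identity \mathrm{Var}(Y)=\mathrm{E}[Y^{2}]-\mathrm{E}[Y]^{2} evaluated with box filters, which the sketch below reproduces with SciPy's uniform filter; the window size is an illustrative placeholder.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(Y, k=5):
    """Eq. (11): v = boxfilter(Y^2) - boxfilter(Y)^2.

    Y : (H, W) luminance plane; k : odd window size (5 is illustrative).
    """
    Y = Y.astype(np.float64)
    mean = uniform_filter(Y, size=k)             # Y   convolved with U_k / k^2
    mean_of_sq = uniform_filter(Y * Y, size=k)   # Y^2 convolved with U_k / k^2
    return np.maximum(mean_of_sq - mean * mean, 0.0)  # guard tiny negatives
```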

As demonstrated at the bottom-left of Fig. 4, the scaling factor \alpha is a piece-wise linear function of the local variance v . The function definition is in (12), where \{\alpha _{1},\alpha _{2},v_{1},v_{2}\} are user-defined parameters for fine-tuning. Hence, the output luminance Y_{e} is obtained by (13), which scales the Laplacian image and adds it back to the input luminance. The YCbCr-to-RGB conversion in (9) then yields the output RGB image \mathbf {I}_{e} .\begin{align*} \alpha=&\begin{cases} \alpha _{1} & v < v_{1}\\ \left ({\dfrac {\alpha _{2}-\alpha _{1}}{v_{2}-v_{1}}}\right)v + \dfrac {\alpha _{1} v_{2} - \alpha _{2} v_{1}}{v_{2}-v_{1}} & v_{1} \leq v \leq v_{2}\\ \alpha _{2} & v > v_{2}, \end{cases} \tag{12}\\ Y_{e}=&Y + \alpha \cdot (\nabla ^{2} \circledast Y). \tag{13}\end{align*}
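
A compact sketch of (12) and (13) follows. The parameter values \{\alpha_{1},\alpha_{2},v_{1},v_{2}\} are placeholders rather than the paper's tuned settings, and np.interp implements the piece-wise linear mapping of (12) with clamping at both ends.

```python
import numpy as np
from scipy.ndimage import convolve

LAPLACIAN = np.array([[0.,  1., 0.],
                      [1., -4., 1.],
                      [0.,  1., 0.]])            # Eq. (10)

def adaptive_unsharp_mask(Y, v, a1=0.6, a2=0.1, v1=10.0, v2=500.0):
    """Eqs. (12)-(13): scale the Laplacian image by a piece-wise linear
    function of the local variance v, then add it back to the luminance.
    The parameter values are placeholders, not the paper's settings."""
    alpha = np.interp(v, [v1, v2], [a1, a2])     # Eq. (12), clamped at both ends
    return Y + alpha * convolve(Y, LAPLACIAN)    # Eq. (13)
```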

Unsharp masking can be loosely viewed as a “mildly dehazing” step because it partially relieves the impact of haze on image sharpness. The following, conversely, is a haze-removal-dedicated step developed from the improved color attenuation prior [31].

B. Dehazing

Two important parts of this step are (i) scene depth estimation and (ii) NBP constraint derivation. The former is based on the color attenuation prior [19] with several improvements in the learning scheme and the dataset preparation. Meanwhile, the latter is inspired by [33] to constrain the transmittance lest black pixels occur. Fig. 5 shows the overall block diagram, where the input image \mathbf {I}_{e} is from the previous step, and the dehazed image is denoted as \mathbf {J}\in \mathbb {R}^{H\times W\times 3} .

FIGURE 5. Block diagram of the dehazing step.

1) Scene Depth Estimation

The scene depth d is inferred from saturation S and brightness V using the linear function below:\begin{equation*} d = \theta _{0} + \theta _{1} S + \theta _{2} V + \varepsilon, \tag{14}\end{equation*} where \{\theta _{0},\theta _{1},\theta _{2}\} are the function's parameters, and \varepsilon denotes the error associated with the inference. Zhu et al. [19] assumed that \varepsilon followed the normal distribution \mathcal {N}(0,\sigma ^{2}) with zero mean and \sigma ^{2} variance. Hence, it was derived from (14) that d also followed a normal distribution \mathcal {N}(\theta _{0} + \theta _{1} S + \theta _{2} V, \sigma ^{2}) . Given an annotated dataset of hazy images and their corresponding scene depths, maximum likelihood estimation can be applied to learn the parameters. However, as it is virtually impossible to obtain such a dataset in practice, a synthetic dataset with depth information drawn from a probability distribution is a viable alternative.

Zhu et al. [19] utilized the standard uniform distribution (SUD) to generate a synthetic training dataset. After that, they adopted the stochastic gradient ascent (SGA) to find the parameters that maximized the log-likelihood function. In the proposed method, the enhanced equidistribution [31] supersedes SUD to improve the statistical reliability of the synthetic dataset. Additionally, the mini-batch gradient ascent with an adaptive learning rate [34] replaces SGA to reduce the convergence time.

The scene depth d is now available, but there is no guarantee that it will be mostly smooth except at discontinuities. Consequently, the refinement block applies a modified hybrid median filter [35] to the scene depth to impose edge-aware smoothness. Given the refined scene depth d_{r} , the transmittance is obtained through t = \mathrm {exp}[-\beta _{sc}d_{r}] , with \beta _{sc}=1 . Generally, most image dehazing algorithms in the literature adopted two fixed limits to constrain t , expressed as t_{0} \leq t \leq 1 , with t_{0} being a small positive number. The following, on the contrary, describes two NBP constraints for posing an adaptive lower limit on t .

2) NBP Constraints

From (7), the dehazed image (or, equivalently, scene radiance) \mathbf {J} can be obtained as:\begin{equation*} \mathbf {J} = \frac {\mathbf {I}_{e} - \mathbf {A}}{t} + \mathbf {A}. \tag{15}\end{equation*}

The first NBP constraint, \mathbf {J} \geq 0 , is relatively evident because it can reduce the number of black pixels that occur after dehazing due to underflows. Hence, it is derived from (15) that:\begin{equation*} t \geq 1 - \min _{c\in \{R,G,B\}}\left ({\frac {I_{e}^{c}}{A^{c}}}\right), \tag{16}\end{equation*} where \min _{c\in \{R,G,B\}}(\cdot) denotes a channel-wise minimum operation.

The second NBP constraint is inspired by [33], which requires that the local mean intensity of \mathbf {J} be greater than or equal to (a multiple of) its local standard deviation, as expressed by:\begin{equation*} \mathop{\mathrm{mean}}\limits _{\forall y\in \Omega (x)}[Y_{p}(y)] \geq q\cdot \mathop{\mathrm{std}}\limits _{\forall y\in \Omega (x)}[Y_{p}(y)], \tag{17}\end{equation*} where Y_{p} represents the luminance channel of \mathbf {J} , q is a positive number to adjust the strictness, and \mathrm {mean}_{\forall y\in \Omega (x)}(\cdot) and \mathrm {std}_{\forall y\in \Omega (x)}(\cdot) denote the mean and standard deviation filters, respectively, with \Omega (x) being a square patch centered at x . It is worth noting that Y_{p} is related to Y_{e} through (15), and this relation can be exploited to approximate the two terms of (17) as follows:\begin{align*} \mathop{\mathrm{mean}}\limits _{\forall y\in \Omega (x)}[Y_{p}(y)]\approx&\frac {1}{t}\left [{Y_{e} \circledast \left ({\frac {U_{k}}{k^{2}}}\right) - \bar {A}}\right] + \bar {A}, \tag{18}\\ \mathop{\mathrm{std}}\limits _{\forall y\in \Omega (x)}[Y_{p}(y)]\approx&\frac {1}{t}\sqrt {Y_{e}^{2}\circledast \left ({\frac {U_{k}}{k^{2}}}\right) - \left [{Y_{e}\circledast \left ({\frac {U_{k}}{k^{2}}}\right)}\right]^{2}}, \tag{19}\end{align*} where \bar {A} = (A^{R}+A^{G}+A^{B})/3 is the average intensity of the atmospheric light \mathbf {A} . Hence, (18) and (19) are substituted back into (17) to obtain the second NBP constraint:\begin{align*} t \geq 1 - \left ({\bar {A}}\right)^{-1}\Bigg \{Y_{e} \circledast \left ({\frac {U_{k}}{k^{2}}}\right) - q\sqrt {Y_{e}^{2}\circledast \left ({\frac {U_{k}}{k^{2}}}\right) - \left [{Y_{e}\circledast \left ({\frac {U_{k}}{k^{2}}}\right)}\right]^{2}}\Bigg \}. \tag{20}\end{align*}

Let t_{\mathrm {NBP}_{1}} and t_{\mathrm {NBP}_{2}} denote the expressions on the right-hand sides of (16) and (20), respectively. The NBP constraint t_{\mathrm {NBP}} is then expressed as:\begin{equation*} t_{\mathrm {NBP}} = \max \left ({t_{\mathrm {NBP}_{1}}, t_{\mathrm {NBP}_{2}}}\right), \tag{21}\end{equation*} where \max (a,b) returns the greater of a and b . Thus, the transmittance t is constrained between t_{\mathrm {NBP}} and unity; that is, \begin{equation*} t_{\mathrm {NBP}} \leq t \leq 1, \tag{22}\end{equation*} and the scene radiance \mathbf {J} is recovered using (15).
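
Putting (15), (16), and (20)-(22) together gives the sketch below. The strictness factor q , the window size, and the default atmospheric light are illustrative choices; the paper's own choice of \mathbf {A} is discussed in the next paragraph.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def dehaze_with_nbp(I_e, Y_e, t, A=(255.0, 255.0, 255.0), q=1.0, k=5):
    """Sketch of Eqs. (15), (16), and (20)-(22); q and k are illustrative.

    I_e : (H, W, 3) pre-processed image in [0, 255]
    Y_e : (H, W)    its luminance plane
    t   : (H, W)    transmittance estimate from the previous stage
    """
    A = np.asarray(A)
    A_bar = A.mean()
    # First NBP constraint, Eq. (16)
    t_nbp1 = 1.0 - np.min(I_e / A, axis=2)
    # Second NBP constraint, Eq. (20), from box-filtered local statistics
    mu = uniform_filter(Y_e, size=k)
    sigma = np.sqrt(np.maximum(uniform_filter(Y_e * Y_e, size=k) - mu * mu, 0.0))
    t_nbp2 = 1.0 - (mu - q * sigma) / A_bar
    # Combined lower limit, Eqs. (21)-(22)
    t = np.clip(t, np.maximum(t_nbp1, t_nbp2), 1.0)
    # Scene radiance recovery, Eq. (15)
    return (I_e - A) / t[..., None] + A
```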

As underflows and overflows are inevitable in digital computations, the recovered image suffers from color gamut reduction, rendering a post-processing step highly relevant. The following describes an efficient method for luminance enhancement and color gamut expansion [36]. This method also produces a positive ramification that eases the atmospheric light estimation. More precisely, it can be observed from (15) that \mathbf {A} is proportional to the dehazing power, and \mathbf {A}=\{255,255,255\} corresponds to the maximum. Dehazing at this extreme level may worsen the color gamut reduction. It is, however, feasible to use \mathbf {A}=\{255,255,255\} in the proposed method because the post-processing step will compensate for the looming problem. Therefore, we adopted \mathbf {A}=\{255,255,255\} as the atmospheric light.

C. Post-Processing

Fig. 6 shows the overall block diagram, where the input image is the recovered scene radiance \mathbf {J}\in \mathbb {R}^{H\times W\times 3} , and the final output image is denoted as \mathbf {J}_{f}\in \mathbb {R}^{H\times W\times 3} .

FIGURE 6. Block diagram of the post-processing step.

1) Luminance Enhancement

Existing enhancement methods generally operate on the entire luminance range, which may result in over-enhancement. Accordingly, the method in [36] adopted an adaptive limit point (ALP) to constrain the range scene-wisely. Given the luminance channel Y_{p} of \mathbf {J} , ALP is calculated from the mean \bar {Y_{p}} and the cumulative distribution function CDF of Y_{p} as follows:\begin{equation*} \mathrm {ALP} = \begin{cases} 0.04 + \dfrac {0.02}{255}\left ({L_{\mathrm {CDF}_{0.9}} - L_{\mathrm {CDF}_{0.1}}}\right) & \bar {Y_{p}} > 128\\ 0.04 - \dfrac {0.02}{255}\left ({L_{\mathrm {CDF}_{0.9}} - L_{\mathrm {CDF}_{0.1}}}\right) & \bar {Y_{p}} \leq 128, \end{cases} \tag{23}\end{equation*} where L_{\mathrm {CDF}_{k}} denotes the luminance value at which \mathrm {CDF}(L_{\mathrm {CDF}_{k}})=k , with k\in \mathbb {R} and 0\leq k \leq 1 .
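
The ALP computation of (23) reduces to two empirical quantiles of the luminance plus its mean, as the sketch below shows (np.quantile is used here to approximate L_{\mathrm {CDF}_{k}} ; 8-bit luminance is assumed).

```python
import numpy as np

def adaptive_limit_point(Y_p):
    """Eq. (23): ALP from the mean of Y_p and the luminance levels at which
    the empirical CDF reaches 0.1 and 0.9 (8-bit luminance assumed)."""
    L_01, L_09 = np.quantile(Y_p, [0.1, 0.9])
    spread = (0.02 / 255.0) * (L_09 - L_01)
    return 0.04 + spread if Y_p.mean() > 128 else 0.04 - spread
```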

It is worth noting that over-enhancement is avoidable by assigning higher gains to smaller luminance values, and ALP can be exploited for that purpose, as (24) shows:\begin{align*} g_{1}(Y_{p})=&\frac {Y_{p}}{2^{21}}\left [{255\left ({1 - \frac {Y_{p}-\mathrm {ALP}}{255}}\right)^{\theta} \left ({\frac {255-Y_{p}}{255}}\right)}\right]^{2}, \tag{24}\\ \theta=&\frac {1.5\left ({L_{\mathrm {CDF}_{0.4}}-L_{\mathrm {CDF}_{0.1}}}\right)}{\bar {Y_{p}}-L_{\mathrm {CDF}_{0.1}}} - 0.55, \tag{25}\end{align*} where g_{1} is the non-linear luminance gain, 2^{21} is a normalization factor, and the exponent \theta is empirically determined to maximize the tone-mapped image quality index (TMQI) [37]. A linear weight g_{2} in (26) is also adopted, where SL and IN are user-defined parameters to adjust the slope and intercept, respectively. The enhanced luminance Y_{f} is then obtained using (27).\begin{align*} g_{2}(Y_{p})=&\frac {\mathrm {SL}}{255}Y_{p} + \mathrm {IN}, \tag{26}\\ Y_{f}=&Y_{p} + g_{1}(Y_{p}) \cdot g_{2}(Y_{p}). \tag{27}\end{align*}
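
Equations (24)-(27) can then be transcribed almost literally, as in the sketch below. ALP is taken as an input (see the previous sketch), and the slope SL and intercept IN defaults are placeholder values, not the paper's settings.

```python
import numpy as np

def enhance_luminance(Y_p, alp, SL=1.0, IN=0.5):
    """Eqs. (24)-(27).  alp comes from Eq. (23); SL and IN are user-defined
    slope/intercept values, and the defaults here are placeholders."""
    L_01, L_04 = np.quantile(Y_p, [0.1, 0.4])
    theta = 1.5 * (L_04 - L_01) / max(Y_p.mean() - L_01, 1e-6) - 0.55    # Eq. (25)
    g1 = (Y_p / 2.0**21) * (255.0 * (1.0 - (Y_p - alp) / 255.0) ** theta
                            * ((255.0 - Y_p) / 255.0)) ** 2              # Eq. (24)
    g2 = (SL / 255.0) * Y_p + IN                                         # Eq. (26)
    return Y_p + g1 * g2                                                 # Eq. (27)
```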

2) Color Gamut Expansion

The first block of color space conversion in Fig. 6 produces Y_{p} , \mathrm {Cb}_{p} , and \mathrm {Cr}_{p} as the luminance, blue-difference chroma, and red-difference chroma components of \mathbf {J} . However, processing \mathrm {Cb}_{p} and \mathrm {Cr}_{p} separately is unnecessary because the human visual system is less sensitive to color differences than luminance differences [38]. Thus, chrominance subsampling with a 4:2:2 ratio is adopted to combine these two into \mathrm {Ch}_{p} for computational efficiency, as (28), (29), and (30) demonstrate, where D_{\mathrm {dec}}=[{1,2,1}]/4 is the decimation filter, and i\in \{1,2,\ldots,H\} and j\in \{1,2,\ldots,W\} are pixel coordinates. Also, as \mathrm {Cb}_{p} , \mathrm {Cr}_{p} , and \mathrm {Ch}_{p} are two-dimensional data, \mathrm {cb}_{ij} , \mathrm {cr}_{ij} , and \mathrm {ch}_{ij} are adopted to denote their component in the i -th row and j -th column, respectively.\begin{align*} \mathrm {Cb}_{d}=&\mathrm {Cb}_{p} \circledast D_{\mathrm {dec}} = \left \{{\mathrm {cb}_{ij}\in \mathbb {R}}\right \}, \tag{28}\\ \mathrm {Cr}_{d}=&\mathrm {Cr}_{p} \circledast D_{\mathrm {dec}} = \left \{{\mathrm {cr}_{ij}\in \mathbb {R}}\right \}, \tag{29}\\ \mathrm {Ch}_{p}=&\left \{{\mathrm {ch}_{ij}\in \mathbb {R} \, \big |\, \mathrm {ch}_{ij}=\mathrm {cb}_{ij}, \forall i,j \;\mathrm {s.t.}\; j=\{2n+1 \, \big |\, n\in \mathbb {Z}_{0}^{+}\}, \;\mathrm {otherwise}\; \mathrm {ch}_{ij}=\mathrm {cr}_{ij}}\right \}. \tag{30}\end{align*}
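
The following sketch reproduces (28)-(30) with a one-dimensional convolution along the rows and a column-wise interleave; the 0-based column indexing is a choice of this sketch and mirrors the paper's 1-based odd/even split.

```python
import numpy as np
from scipy.ndimage import convolve1d

D_DEC = np.array([1.0, 2.0, 1.0]) / 4.0      # decimation filter of Eqs. (28)-(29)

def subsample_chroma(Cb_p, Cr_p):
    """Eqs. (28)-(30): low-pass both chroma planes along the rows, then
    interleave them column by column (Cb on the columns that are odd in the
    paper's 1-based notation, i.e. 0, 2, 4, ... here)."""
    Cb_d = convolve1d(Cb_p, D_DEC, axis=1)
    Cr_d = convolve1d(Cr_p, D_DEC, axis=1)
    Ch_p = Cr_d.copy()
    Ch_p[:, 0::2] = Cb_d[:, 0::2]
    return Ch_p
```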

According to the Helmholtz-Kohlrausch effect [39], the luminance Y_{p} is related to the chrominance \mathrm {Ch}_{p} , and luminance enhancement narrows the color gamut in the chromaticity coordinates. Hence, the expansion should be proportional to the ratio between Y_{f} and Y_{p} , as expressed by the color gain g_{3} below:\begin{equation*} g_{3}(Y_{p},\mathrm {Ch}_{p}) = \frac {Y_{f}}{Y_{p}}\mathrm {Ch}_{p}. \tag{31}\end{equation*}

Moreover, an additional weight g_{4} is adopted to maximize the TMQI, and its expression in (32) is determined through experiments:\begin{align*} g_{4}(Y_{p}) = \begin{cases} 0.7 & Y_{p} < \mathrm {TH}_{1}\\ 0.7 - 0.26\dfrac {Y_{p} - \mathrm {TH}_{1}}{\mathrm {TH}_{2} - \mathrm {TH}_{1}} & \mathrm {TH}_{1} \leq Y_{p} \leq \mathrm {TH}_{2}\\ 0.44 & Y_{p} > \mathrm {TH}_{2}, \end{cases} \tag{32}\end{align*} where \mathrm {TH}_{1} and \mathrm {TH}_{2} are user-defined parameters to adjust the expansion range. The expanded chrominance \mathrm {Ch}_{f} is then obtained using the following:\begin{equation*} \mathrm {Ch}_{f} = \mathrm {Ch}_{p} + g_{3}(Y_{p},\mathrm {Ch}_{p}) \cdot g_{4}(Y_{p}). \tag{33}\end{equation*}
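
A direct transcription of (31)-(33) is given below; \mathrm {TH}_{1} and \mathrm {TH}_{2} are placeholders for the user-defined thresholds, and the small guard on Y_{p} is an assumption to avoid division by zero.

```python
import numpy as np

def expand_gamut(Ch_p, Y_p, Y_f, TH1=64.0, TH2=192.0):
    """Eqs. (31)-(33); TH1 and TH2 are placeholder thresholds."""
    g3 = (Y_f / np.maximum(Y_p, 1e-6)) * Ch_p                  # Eq. (31)
    g4 = np.interp(Y_p, [TH1, TH2], [0.7, 0.44])               # Eq. (32)
    return Ch_p + g3 * g4                                      # Eq. (33)
```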

Next, the chrominance interpolation block separates \mathrm {Ch}_{f} into two temporary variables, \mathrm {Cb}_{t} and \mathrm {Cr}_{t} , for the final block of color space conversion. To describe chrominance interpolation, we reused \mathrm {ch}_{ij} , \mathrm {cb}_{ij} , and \mathrm {cr}_{ij} to denote a component of \mathrm {Ch}_{f} , \mathrm {Cb}_{t} , and \mathrm {Cr}_{t} in the i -th row and j -th column, respectively. Given \mathrm {Ch}_{f}=\{\mathrm {ch}_{ij}\in \mathbb {R}\} , \mathrm {Cb}_{t} and \mathrm {Cr}_{t} are obtained by interlacing \mathrm {Ch}_{f} with zeros, as (34) and (35) show.\begin{align*} \mathrm {Cb}_{t}=&\left \{{\mathrm {cb}_{ij}\in \mathbb {R} \, \big |\, \mathrm {cb}_{ij}=\mathrm {ch}_{ij}, \forall i,j \;\mathrm {s.t.}\; j=\{2n+1 \, \big |\, n\in \mathbb {Z}_{0}^{+}\}, \;\mathrm {otherwise}\; \mathrm {cb}_{ij}=0}\right \}, \tag{34}\\ \mathrm {Cr}_{t}=&\left \{{\mathrm {cr}_{ij}\in \mathbb {R} \, \big |\, \mathrm {cr}_{ij}=\mathrm {ch}_{ij}, \forall i,j \;\mathrm {s.t.}\; j=\{2n \, \big |\, n\in \mathbb {Z}^{+}\}, \;\mathrm {otherwise}\; \mathrm {cr}_{ij}=0}\right \}. \tag{35}\end{align*}

After that, \mathrm {Cb}_{t} and \mathrm {Cr}_{t} are convolved with the interpolation filter D_{\mathrm {int}}=[{1,2,1}]/2 to get \mathrm {Cb}_{f} and \mathrm {Cr}_{f} . Finally, the image information in the YCbCr color space \{Y_{f},\mathrm {Cb}_{f},\mathrm {Cr}_{f}\} is converted back to the RGB color space using (9), yielding the final image \mathbf {J}_{f} .
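
The interpolation of (34)-(35) together with the [1,2,1]/2 filter can be sketched as follows; it simply reverses the interleaving of the earlier subsampling sketch.

```python
import numpy as np
from scipy.ndimage import convolve1d

D_INT = np.array([1.0, 2.0, 1.0]) / 2.0      # interpolation filter

def upsample_chroma(Ch_f):
    """Eqs. (34)-(35) plus interpolation: split Ch_f into zero-interlaced
    Cb/Cr planes, then fill the gaps with the [1, 2, 1]/2 filter."""
    Cb_t = np.zeros_like(Ch_f)
    Cr_t = np.zeros_like(Ch_f)
    Cb_t[:, 0::2] = Ch_f[:, 0::2]            # columns that carried Cb
    Cr_t[:, 1::2] = Ch_f[:, 1::2]            # columns that carried Cr
    Cb_f = convolve1d(Cb_t, D_INT, axis=1)
    Cr_f = convolve1d(Cr_t, D_INT, axis=1)
    return Cb_f, Cr_f
```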

SECTION IV.

Results and Discussions

This section presents a comparative evaluation of the proposed method against nine state-of-the-art benchmarks selected from the three image dehazing categories discussed in Section II. These nine are proposed by He et al. [15], Tarel and Hautiere [17], Zhu et al. [19], Cai et al. [21], Ren et al. [3], Liu et al. [25], Dong et al. [26], Li et al. [27], and Yang et al. [28], respectively.

A. Qualitative Evaluation on Real-World Hazy Images

Fig. 7 demonstrates a qualitative comparison of ten methods on a real-world hazy image from the IVC dataset [40]. Results by He et al. [15], Cai et al. [21], Liu et al. [25], Dong et al. [26], Li et al. [27], and Yang et al. [28] exhibit a good dehazing performance as the scene radiance has been recovered without any unpleasant artifacts. The result by Zhu et al. [19] is slightly over-dehazed, losing dark and distant details. Results by Tarel and Hautiere [17] and Ren et al. [3] are less favorable than others because a portion of haze persists.

FIGURE 7. Dehazing results of ten methods on a real-world hazy image. From left to right: input image and results by He et al. [15], Tarel and Hautiere [17], Zhu et al. [19], Cai et al. [21], Ren et al. [3], Liu et al. [25], Dong et al. [26], Li et al. [27], Yang et al. [28], and the proposed method. The input image was duplicated for ease of comparison.

Above all, it can be observed that nine benchmark methods are ineffective in recovering image details, as witnessed by the traffic light and the man’s face in the red-cropped and blue-cropped regions. This common drawback can be explained as follows. Dehazing is fundamentally the subtraction of haze from the input image, and the subtraction degree depends on the transmittance. However, estimating a transmittance with rich details is challenging because spatial filtering usually attenuates high-frequency information. Although an outstanding guided filter [41] has been adopted to refine the transmittance estimate, it is noted that the best guidance image in single image dehazing methods is the input image itself. Accordingly, the lack of an informative guidance image constrains the refinement.

The proposed method, in contrast, effectively removes haze while enhancing the sharpness and the color gamut, as witnessed by the man’s face and the facial skin color in the blue-cropped region. This definite advantage is attributed to the pre-processing (unsharp masking) and post-processing (color gamut expansion) steps. The intermediate results in Fig. 8 show that the former has improved image details to such an extent that the contours of distant objects have become noticeable. Meanwhile, the latter, as claimed, has successfully remedied the post-dehazing problem of color gamut reduction.

FIGURE 8. Intermediate results of the proposed method on a real-world hazy image. From left to right: input image and results after the pre-processing, dehazing, and post-processing steps.

Fig. 9 shows more qualitative comparison results on real-world hazy images. It can be observed from the first row that the result by He et al. [15] is satisfactory, albeit with the post-dehazing false enlargement of the train’s headlight. This problem occurs when the atmospheric light is less than pixel intensities around the headlight, as discussed in [42]. Accordingly, the maximum value of \mathbf {A}=\{255,255,255\} adopted in the proposed method ensures that it is free of that problem. Next, the method of Tarel and Hautiere [17] demonstrates an acceptable performance, but halo artifacts arise at image edges due to the use of large median filters. The result by Zhu et al. [19] suffers from a loss of dark details owing to excessive haze removal. The remaining six deep-learning-based methods perform relatively well, in which results by Cai et al. [21] and Yang et al. [28] are more favorable than others. Our result, as expected, exhibits three desirable outcomes: haze removal, sharpness enhancement, and color gamut expansion.

FIGURE 9. Qualitative evaluation of ten methods on real-world hazy images. From left to right: input images and results by He et al. [15], Tarel and Hautiere [17], Zhu et al. [19], Cai et al. [21], Ren et al. [3], Liu et al. [25], Dong et al. [26], Li et al. [27], Yang et al. [28], and the proposed method. We abbreviated the method of Tarel and Hautiere [17] to T & H and the proposed method to PM.

Similar observations also emerge from the second to the fourth rows of Fig. 9. The dark channel assumption of the method of He et al. [15] does not hold for the sky region, causing severe color distortion in the second row. As with the interpretation of results in the first row, the method of Tarel and Hautiere [17] suffers from halo artifacts, and the method of Zhu et al. [19] suffers from a loss of dark details. Results by deep-learning-based methods, on the contrary, do not exhibit any unpleasant artifacts, which is attributed to the powerful representation capability and the flexibility of CNNs. Compared with these benchmarks, the proposed method exhibits a comparable or even better performance.

B. Quantitative Evaluation on Public Datasets

This section presents an objective assessment of the proposed method against nine benchmarks on public image datasets. It is worth noting that there are numerous metrics for image quality assessment, such as the conventional peak signal-to-noise ratio (PSNR), the structural similarity (SSIM) [43], the feature similarity index extended to color images (FSIMc) [44], and the TMQI [37]. The first is pixel-based and thus does not correlate well with subjective ratings. The second, in contrast, is structure-based and can better quantify the perceived image quality. However, it has a drawback in that it utilizes a uniform weight to pool a single quality score. Accordingly, the third improves the SSIM by adopting an adaptive pooling weight and taking account of chrominance information. The fourth improves the SSIM in another direction by considering multi-scale structural similarity and naturalness. Therefore, we selected the FSIMc and the TMQI for our quantitative evaluation due to their high correlation with subjective assessment. These two metrics vary from zero to unity, and higher scores are more favorable in image dehazing. Also, as they are full-reference, their computation requires datasets comprising pairs of hazy and haze-free images.

Table 1 summarizes five public datasets utilized in this evaluation, including FRIDA2 [33], D-HAZY [45], O-HAZE [46], I-HAZE [47], and DENSE-HAZE [48]. The FRIDA2 consists of 66 graphics-generated images depicting road scenes, from which four corresponding sets of hazy images are generated, thus a total of 66 haze-free and 264 hazy images. The D-HAZY is generated from the Middlebury [49] and NYU Depth [50] datasets according to the degradation model in (7) with scene depths captured by a Microsoft Kinect camera. It is composed of 1472 pairs of synthetic indoor images, but this evaluation only utilizes 23 pairs from Middlebury due to their substantial variation in image scenes. The other 1449 pairs from NYU Depth portray relatively similar scenes and thus may bias the evaluation results. In contrast, the O-HAZE, I-HAZE, and DENSE-HAZE comprise 45, 30, and 55 pairs depicting real-world outdoor scenes, indoor scenes, and both, respectively.

TABLE 1. Summary of Image Datasets for Quantitative Evaluation. The Symbol # Represents Quantities.

Tables 2 and 3 demonstrate the average evaluation scores in FSIMc and TMQI on five public datasets. It can be observed from Table 2 that the proposed method is ranked fourth overall, below the deep dehazing models of Yang et al. [28], Ren et al. [3], and Dong et al. [26]. However, it is worth noting that the difference between its FSIMc score and the best score is subtle. Specifically, it outperforms other benchmarks on FRIDA2 and is within the top four methods for dehazing real-world images in O-HAZE and I-HAZE. Nevertheless, its performance is slightly under par on D-HAZY and DENSE-HAZE. This observation can be related to the fact that FSIMc quantifies the degradation rather than the enhancement. Thus, it is unsurprising that the two models of Ren et al. [3] and Dong et al. [26], trained on fully annotated datasets to minimize the difference between their predictions and the corresponding ground-truth references, have achieved the top scores. This interpretation is further supported by the unimpressive score of the unsupervised model of Li et al. [27], which does not require ground-truth references. Nevertheless, this model and the best-performing model of Yang et al. [28] have shown the great potential of unsupervised and unpaired learning in computer vision.

TABLE 2. Average Evaluation Scores in Terms of the Feature Similarity Index Extended to Color Images (FSIMc) on Five Public Datasets. Best Results are Boldfaced, and Second-Best Results are Underlined.
TABLE 3. Average Evaluation Scores in Terms of the Tone-Mapped Image Quality Index (TMQI) on Five Public Datasets. Best Results are Boldfaced, and Second-Best Results are Underlined.

According to Table 3, the proposed method is ranked second overall, and its TMQI score only differs from the best at the fourth decimal place. More specifically, the proposed method exhibits a comparable performance on FRIDA2 and an under-par performance on D-HAZY. In contrast, it outperforms the benchmark methods on real-world datasets, such as O-HAZE and I-HAZE, as witnessed by a significant difference in TMQI scores. Hence, unsharp masking and color gamut expansion appear to benefit real-world images. However, the benefits these two steps offer do not suffice for handling densely hazy images, owing to the under-performance of the dehazing step.

As a result, it can be concluded that the proposed method demonstrates a comparable performance to state-of-the-art benchmarks, notably the deep learning models of Yang et al. [28], Ren et al. [3], and Dong et al. [26].

C. Processing Time Comparison

Notwithstanding its comparable performance, the proposed method possesses a linear time complexity, \mathcal {O}(N) or \mathcal {O}(H\times W) , where H and W denote the image’s height and width. According to Section III, the most computationally intensive operations are the mean filter in the pre-processing step and the modified hybrid median filter in the dehazing step. The implementation of these two filters affects the entire algorithm’s complexity directly. Let us assume that the kernel size is S_{h}\times S_{w} . Naive implementations result in \mathcal {O}(H\times W\times S_{h}\times S_{w}) complexity. Fortunately, \mathcal {O}(H\times W) implementations of those two filters are available in [51] and [18]; therein lies the proposed method’s linear time complexity.
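
For intuition, a mean filter whose cost is independent of the kernel size can be written with a summed-area table, as in the sketch below. This is one standard \mathcal {O}(H\times W) construction and is not claimed to be the exact implementation of [51] or [18].

```python
import numpy as np

def box_mean(Y, k):
    """Mean filter in O(H*W) time via a summed-area table; the cost does not
    depend on the kernel size k. Borders are handled by clamping the window
    and dividing by its actual area."""
    H, W = Y.shape
    r = k // 2
    sat = np.zeros((H + 1, W + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(Y, axis=0), axis=1)
    y0 = np.clip(np.arange(H) - r, 0, H)         # window top (inclusive)
    y1 = np.clip(np.arange(H) + r + 1, 0, H)     # window bottom (exclusive)
    x0 = np.clip(np.arange(W) - r, 0, W)
    x1 = np.clip(np.arange(W) + r + 1, 0, W)
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    total = (sat[y1[:, None], x1[None, :]] - sat[y0[:, None], x1[None, :]]
             - sat[y1[:, None], x0[None, :]] + sat[y0[:, None], x0[None, :]])
    return total / area
```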

Table 4 summarizes the processing time of ten methods on different image resolutions, ranging from VGA (640\times 480 ) to 8K UHD (7680\times 4320 ). As the source codes of the nine benchmarks are publicly available, we used them and adopted the parameter configurations provided by their authors. This measurement was conducted in MATLAB R2019a and Python 3.9.9 (with PyTorch 1.12.0+cu116), both running on a computer with an Intel Core i9-9900K (3.6 GHz) CPU, 64 GB RAM, and an Nvidia TITAN RTX.

TABLE 4. Processing Time in Seconds of Ten Methods on Different Image Resolutions. Best Results are Boldfaced, Second-Best Results are Underlined, and NA Stands for Not Available With the Underlying Cause, REx (RAM Exhaustion), MEx (Memory Exhaustion), or RTE (Run-Time Error), in Parentheses.

It emerges that the method of He et al. [15] is the least efficient in terms of time and memory. This finding is consistent with its widely known drawback rooted in soft-matting. Notably, RAM was exhausted when invoking this algorithm on an 8K UHD image, and the processing time, in this case, was denoted as not available. Similarly, the unsupervised model of Li et al. [27] could not process DCI 4K and 8K UHD images owing to memory exhaustion. This model progressively refines the dehazing result, and its default configuration is to run through 800 iterations. Therefore, its processing time is significantly larger than those of other methods.

Next on the list are two models of Cai et al. [21] and Ren et al. [3]. As discussed earlier, they are deemed to be overkill due to the high computational cost inherent in them. Measurements in Table 4 then verified that claim. However, recent models of Liu et al. [25], Dong et al. [26], and Yang et al. [28] benefited from batch processing and parallel computing. Under these mechanisms, PyTorch needs to initialize the GPU, for example, making replicas of the model on each GPU worker. Accordingly, the execution time of the first image was prolonged, whereas those of the remaining images were substantially shortened. Notably, the model of Liu et al. [25] utilized four GPU workers and thus consumed the least processing time for SVGA, HD, FHD, and DCI 4K resolutions. It is also worth noting that batch processing and parallel computing generally cause a jump in memory consumption, proportional to the number of GPU workers. Thus, the model of Liu et al. [25] suffered from memory exhaustion when handling an 8K UHD image. Conversely, the model of Dong et al. [26] was free of that problem but returned a run-time error when processing an SVGA image.

The two methods of Tarel and Hautiere [17] and Zhu et al. [19] are well-recognized for their computational efficiency, thus accounting for their fast processing speeds recorded in Table 4. Additionally, it is noteworthy that Zhu et al. [19] adopted the fast implementation of the guided filter, which downscales the input image to ease the computational burden. If they utilized the standard guided filter, their method would be slower than that of Tarel and Hautiere [17].

The proposed method is ranked second overall, notably with plain sequential computing rather than batch processing or parallel computing. Although it is slower than the fastest model of Liu et al. [25], Tables 2 and 3 demonstrate that it outperforms this model under FSIMc and TMQI. Also, compared with the fast sequential method of Zhu et al. [19], it achieved approximately a 2.2\times speedup for two main reasons. Firstly, the proposed method skips the atmospheric light estimation. Secondly, it only makes six calls to three different \mathcal {O}(N) spatial filters, including a call to a 3\times 3 Laplacian filter in (13), four calls to the box filter in (11) and (20), and a call to the modified hybrid median filter in the scene depth refinement step. On the contrary, the method of Zhu et al. [19] needs to estimate the atmospheric light and make eighteen calls to the box filter inside the fast guided filter. This difference accounts for the large gap between the processing times of the two methods. Hence, the definite advantage of a low computational cost is attributed to the elegant partition of image dehazing into three essential steps: pre-processing, dehazing, and post-processing, where each can be implemented using traditional computer vision techniques.

D. Ablation Study

The proposed method consists of three steps that operate in a complementary manner. To verify the individual contribution of each step, we conduct ablation studies by considering three variants of our algorithm. They are created by dropping the pre-processing step, the post-processing step, and both, respectively. Table 5 summarizes the evaluation results on five public datasets under FSIMc and TMQI. It can be observed that the pre-processing step (unsharp masking) contributes more to the structural information, while the post-processing step (color gamut expansion) contributes more to the naturalness. Hence, each of these two steps plays an essential role in the proposed method, which justifies its significance.

TABLE 5. Comparisons on Five Public Datasets for Different Variants of the Proposed Method. Best Results are Boldfaced.

Moreover, the contributions of the pre-processing and post-processing steps can also be verified by qualitative results in Fig. 10. Excluding the former causes a loss of image details, and excluding the latter gives rise to color gamut reduction. Image quality further worsens when neither of them is included. Accordingly, these observations verify the whole algorithm with the pre-processing, dehazing, and post-processing steps.

FIGURE 10. Comparisons on real-world hazy images for different variants of the proposed method.

SECTION V. Conclusion

This paper presented an efficient method for single image dehazing in linear time. It began with a literature review arguing that deep dehazing models are often overkill and that traditional computer vision techniques can achieve comparable performance at a much lower computational cost. A detailed description of the proposed method then followed. The pre-processing step enhances image sharpness, the dehazing step recovers the scene radiance according to the improved color attenuation prior, and the post-processing step compensates for the color gamut reduction. Subjective and objective evaluations against nine benchmarks demonstrated that the proposed method is substantially faster while achieving comparable performance.

Nonetheless, under-performance was observed on densely hazy images. This drawback is common to methods built on hand-engineered features, which are not abstract enough to reflect how the human visual system recognizes dense haze. In this case, image inpainting and conditional image generation may be viable alternatives, but implementing them in a computationally efficient manner appears challenging. This task is therefore left for future research.

Author image of Dat Ngo
Technical Support and Development Center for Display Device Convergence Technology, Busan, South Korea
Dat Ngo received the B.S. degree in computer engineering from The University of Danang—University of Science and Technology, Danang, Vietnam, in 2016, and the M.S. and Ph.D. degrees in electronic engineering from Dong-A University, Busan, South Korea, in 2018 and 2021, respectively.
He is a Postdoctoral Researcher with the Technical Support and Development Center for Display Device Convergence Technology, Busan. His research interests include image/video processing, machine learning, deep learning, and FPGA prototypes.
Author image of Gi-Dong Lee
Technical Support and Development Center for Display Device Convergence Technology, Busan, South Korea
Department of Electronic Engineering, Dong-A University, Busan, South Korea
Gi-Dong Lee received the B.S., M.S., and Ph.D. degrees in electronic engineering from Busan National University, Busan, South Korea, in 1989, 1991, and 2000, respectively.
He was a Postdoctoral Researcher at the Liquid Crystal Institute, Kent State University, OH, USA, until 2003. Since 2003, he has been with the Department of Electronic Engineering, Dong-A University, Busan, where he is a Full Professor. He is also the Director of the Technical Support and Development Center for Display Device Convergence Technology, Busan. His research interests include 3D stereoscopic, auto-stereoscopic, transparent, public, advanced/high-technology displays, and image processing for high-performance displays.
Author image of Bongsoon Kang
Technical Support and Development Center for Display Device Convergence Technology, Busan, South Korea
Department of Electronic Engineering, Dong-A University, Busan, South Korea
Bongsoon Kang received the B.S. degree in electronic engineering from Yonsei University, Seoul, South Korea, in 1985, the M.S. degree in electrical engineering from the University of Pennsylvania, Philadelphia, USA, in 1987, and the Ph.D. degree in electrical engineering from Drexel University, Philadelphia, USA, in 1990.
He was a Senior Staff Researcher at Samsung Electronics, Suwon, South Korea, from 1989 to 1999. Since 1999, he has been with the Department of Electronic Engineering, Dong-A University, Busan, South Korea, where he is a Full Professor. He is also a member of the Technical Support and Development Center for Display Device Convergence Technology, Busan. His research interests include image/video processing, machine learning, deep learning, FPGA prototypes, and SoC/VLSI designs.
