Introduction
Low-light image enhancement is a longstanding and significant topic in computer vision, playing a crucial role in night photography, surveillance, and medical imaging. The goal is to improve image quality under low-light conditions while preserving details and color fidelity.
However, traditional algorithms often fail to balance brightness enhancement with detail preservation, leading to overexposure and artifacts. For instance, histogram equalization can amplify noise and distort colors. Deep learning methods for low-light image enhancement have significantly advanced noise reduction and global illumination correction, but they still face challenges such as RGB color shifts and the loss of high-frequency details. Unsupervised methods, such as unpaired learning [1]–[3] and zero-shot approaches [4]–[8], offer greater flexibility and generalization but still struggle with robustness in detail recovery and color consistency. Retinex-based methods [9]–[12] leverage the decomposition of illumination and reflectance to enhance brightness; however, they frequently over-smooth the result, losing critical texture details, particularly in regions with complex patterns. Diffusion-based methods [13]–[16] represent a more recent advancement, using iterative refinement to balance noise reduction and detail enhancement. Nevertheless, these methods still struggle to achieve an optimal trade-off between recovering fine details and maintaining global consistency. A comprehensive solution that addresses both global illumination and fine detail preservation remains an open challenge.
To address these challenges, we propose Bio-Inspired Attention and Wavelet Diffusion (BIAWDiff), a method that integrates Retinex theory with rod cell-inspired attention and wavelet-based diffusion models. BIAWDiff comprises three modules: (1) Initial Light Restoration (ILR) module: the ILR module uses Retinex theory to decompose the input into illumination, reflectance, and noise, then enhances the decomposed illumination, effectively preserving low-frequency information and avoiding overexposure. (2) Rod Cell-Inspired Attention Refinement (RCAR) module: the RCAR module refines the image by focusing on luminance, mimicking the behavior of rod cells under low-light conditions. (3) Detail Refinement (DR) module: the DR module applies wavelet-domain diffusion for high-frequency detail restoration and uses neural networks for low-frequency reconstruction, ensuring structural integrity and color consistency. As shown in Figure 1, our BIAWDiff outperforms several SOTA approaches on both paired and unpaired datasets. Overall, our contributions can be summarized as follows:
We propose an advanced low-light image enhancement framework that integrates brightness correction, attention-driven luminance refinement, and wavelet-based detail recovery, achieving enhanced visual quality and structural preservation.
To better reconstruct the details lost in low-light photography, we propose a bio-inspired mechanism that filters out secondary information and retains high-frequency information; this guides the diffusion of high-frequency coefficients in the wavelet domain, thereby generating richer scene details.
We conducted extensive experiments on both paired and unpaired low-light datasets. The experimental results show that BIAWDiff consistently outperforms other low-light enhancement methods, which fully demonstrates the effectiveness and generalization capability of our framework.
Proposed Method
BIAWDiff enhances low-light images using a three-stage network, as shown in Figure 2. The key modules are Initial Light Restoration (ILR), Rod Cell-Inspired Attention Refinement (RCAR), and Detail Refinement (DR). ILR boosts brightness, reduces noise, and preserves low-frequency details. RCAR uses transformer-based attention to enhance luminance. DR applies wavelet-domain diffusion for high-frequency denoising and neural networks for low-frequency reconstruction.
Fig. 2. Overview of the BIAWDiff network architecture. (a) BIAWDiff begins with a brightness enhancement and noise reduction module that prepares the input image for further processing. (b) The next stage performs selective luminance refinement, where luminance adjustments improve visual coherence. (c) The final stage restores high-frequency details and structural integrity to produce a coherent and visually enhanced output. Here, · denotes the Hadamard product, + denotes matrix addition, and − denotes matrix subtraction.
A. Initial Light Restoration (ILR)
We observe that images captured in low-light conditions often suffer from insufficient brightness and detail loss, primarily due to the uneven distribution of illumination and reflectance components. By effectively separating and estimating these components, image details can be better preserved, and overall quality enhanced. The ILR module enhances low-light images by estimating illumination and reflectance components and improving low-frequency information. It consists of two parts: Light Difference Estimation and Preliminary Image Restoration.
Light Difference Estimation. The ILR module first predicts the light difference ΔL, reflectance R, and illumination features FL from the input low-light image Ilow:
\begin{equation*}(\Delta {\mathbf{L}},{\mathbf{R}},{{\mathbf{F}}_{\mathbf{L}}}) = LightEst\left( {{{\mathbf{I}}_{low}}} \right)\tag{1}\end{equation*}
Preliminary Image Restoration. The predicted light difference and reflectance are used to restore the image:
\begin{equation*}{{\mathbf{I}}_{pred}} = {{\mathbf{I}}_{low}} + \Delta {\mathbf{L}}\cdot{\mathbf{R}}\tag{2}\end{equation*}
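For concreteness, the following is a minimal PyTorch sketch of the ILR step described by Eqs. (1)-(2). The layer widths, the two prediction heads, and the sigmoid on the reflectance are illustrative assumptions rather than the paper's exact architecture; only the restoration rule Ipred = Ilow + ΔL · R follows the formulation above.

```python
import torch
import torch.nn as nn

class LightEst(nn.Module):
    """Hypothetical light-difference estimator for Eq. (1): predicts
    (delta_L, R, F_L) from a low-light RGB image."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.delta_head = nn.Conv2d(feat_ch, 3, 3, padding=1)  # light difference ΔL
        self.refl_head = nn.Conv2d(feat_ch, 3, 3, padding=1)   # reflectance R

    def forward(self, i_low):
        f_l = self.backbone(i_low)           # illumination features F_L
        delta_l = self.delta_head(f_l)
        r = torch.sigmoid(self.refl_head(f_l))  # reflectance constrained to [0, 1]
        return delta_l, r, f_l

def restore(i_low, delta_l, r):
    """Preliminary restoration, Eq. (2): I_pred = I_low + ΔL ⊙ R."""
    return i_low + delta_l * r
```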
B. Rod Cell-Inspired Attention Refinement (RCAR)
Rod cells in human vision enhance brightness perception in low-light conditions by emphasizing luminance over color. Inspired by this, the RCAR module refines the preliminary image Ipred using transformer-based attention guided by illumination features FL and the grayscale map Glow.
The attention mechanism computes the query Q, key K, and value V matrices as:
\begin{equation*}{\mathbf{Q}} = {{\mathbf{W}}_Q}\cdot{{\mathbf{I}}_{pred}},{\mathbf{K}} = {{\mathbf{W}}_K}\cdot{\mathbf{M}},{\mathbf{V}} = {{\mathbf{W}}_V}\cdot{\mathbf{M}},\tag{3}\end{equation*}
\begin{equation*}{\mathbf{M}} = \sigma \left( {{{\mathbf{F}}_{\mathbf{L}}}\cdot{{\mathbf{G}}_{low}} + {{\mathbf{b}}_m}} \right),\tag{4}\end{equation*}
Attention scores A are computed using scaled dot-product attention:
\begin{equation*}{\mathbf{A}} = \operatorname{softmax} \left( {\frac{{{\mathbf{Q}}\cdot{{\mathbf{K}}^ \top } + \lambda {{\mathbf{G}}_{{\text{low }}}}\cdot{\mathbf{H}}}}{{\sqrt {{d_k}} }}} \right),\tag{5}\end{equation*}
\begin{equation*}{{\mathbf{I}}_{RCAR}} = {{\mathbf{I}}_{{\text{pred }}}} + \frac{{{\mathbf{A}}\cdot{\mathbf{V}} + \eta \left( {{{\mathbf{I}}_{{\text{pred }}}}\cdot{{\mathbf{W}}_Z} + {{\mathbf{b}}_z}} \right)}}{{1 + {e^{ - \left( {{{\mathbf{w}}_\gamma }\cdot{{\mathbf{F}}_{\mathbf{L}}} + {{\mathbf{b}}_\gamma }} \right)}}}},\tag{6}\end{equation*}
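The sketch below illustrates Eqs. (3)-(6) on flattened token maps of shape (B, N, C). Treating pixels as tokens, using nn.Linear projections for WQ, WK, WV, and WZ, and the shape assumed for the prior term H in Eq. (5) are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCARAttention(nn.Module):
    """Rod cell-inspired attention sketch following Eqs. (3)-(6)."""
    def __init__(self, dim, lam=0.1, eta=0.1):
        super().__init__()
        self.w_q = nn.Linear(dim, dim)             # W_Q
        self.w_k = nn.Linear(dim, dim)             # W_K
        self.w_v = nn.Linear(dim, dim)             # W_V
        self.w_z = nn.Linear(dim, dim)             # W_Z, b_z
        self.gate = nn.Linear(dim, 1)              # w_gamma, b_gamma
        self.b_m = nn.Parameter(torch.zeros(dim))  # b_m
        self.lam, self.eta = lam, eta

    def forward(self, i_pred, f_l, g_low, h_prior):
        # i_pred, f_l: (B, N, C); g_low: (B, N, 1); h_prior: (B, 1, N)
        m = torch.sigmoid(f_l * g_low + self.b_m)                  # Eq. (4)
        q, k, v = self.w_q(i_pred), self.w_k(m), self.w_v(m)       # Eq. (3)
        d_k = q.shape[-1]
        scores = (q @ k.transpose(-2, -1)
                  + self.lam * (g_low @ h_prior)) / d_k ** 0.5     # Eq. (5)
        a = F.softmax(scores, dim=-1)
        gate = torch.sigmoid(self.gate(f_l))   # 1 / (1 + e^{-(w_gamma·F_L + b_gamma)})
        return i_pred + (a @ v + self.eta * self.w_z(i_pred)) * gate  # Eq. (6)
```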
C. Detail Refinement (DR)
High-frequency details are crucial for preserving sharpness and fine textures, which are essential for visual clarity and overall image quality. The DR module enhances these details and reduces noise using the wavelet domain.
Wavelet Transform. The RCAR output IRCAR is decomposed into the low-frequency component A and the high-frequency components DH, DV, and DD using the 2D discrete wavelet transform (2D-DWT):
\begin{equation*}A,{D_H},{D_V},{D_D} = {\text{2D-DWT}}\left( {{{\mathbf{I}}_{RCAR}}} \right),\tag{7}\end{equation*}
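A single-channel sketch of Eq. (7) using PyWavelets; applying the transform per channel and choosing the Haar wavelet are assumptions, since the wavelet basis is not specified above.

```python
import numpy as np
import pywt  # PyWavelets

i_rcar = np.random.rand(256, 256).astype(np.float32)  # stand-in for one channel of I_RCAR

# One-level 2D discrete wavelet transform, Eq. (7)
A, (D_H, D_V, D_D) = pywt.dwt2(i_rcar, 'haar')
print(A.shape, D_H.shape, D_V.shape, D_D.shape)  # each band is (128, 128)
```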
Fig. 3. Visual comparisons of different methods on the LOL-v1, LOL-v2-real, and LOL-v2-synthetic datasets.
Diffusion Process. The diffusion process is applied to the diagonal high-frequency component DD for denoising and detail enhancement:
\begin{align*} & {\hat D_{{D_{t + 1}}}} = \sqrt {1 - {\alpha _t}} \cdot{\hat D_{{D_t}}} + \sqrt {{\alpha _t}} \cdot{\varepsilon _t},\tag{8} \\ & {\hat D_{{D_{t - 1}}}} = \frac{1}{{\sqrt {1 - {\alpha _t}} }}\left( {{{\hat D}_{{D_t}}} - \frac{{{\alpha _t}}}{{\sqrt {1 - {{\bar \alpha }_t}} }}\cdot{\varepsilon _\theta }\left( {{{\hat D}_{{D_t}}},{D_D},t} \right)} \right),\tag{9}\end{align*}
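The following is a sketch of one forward and one reverse step on the diagonal coefficient, written to follow Eqs. (8)-(9) exactly as stated (the α convention is taken directly from the equations above); eps_model stands for the conditional noise predictor εθ.

```python
import torch

def forward_diffuse(d_hat_t, alpha_t):
    """Forward noising of the diagonal coefficient, Eq. (8)."""
    eps = torch.randn_like(d_hat_t)
    return (1 - alpha_t) ** 0.5 * d_hat_t + alpha_t ** 0.5 * eps, eps

def reverse_step(d_hat_t, d_cond, t, alpha_t, alpha_bar_t, eps_model):
    """One reverse denoising step, Eq. (9), conditioned on the degraded D_D."""
    eps_pred = eps_model(d_hat_t, d_cond, t)
    return (d_hat_t - alpha_t / (1 - alpha_bar_t) ** 0.5 * eps_pred) / (1 - alpha_t) ** 0.5
```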
Structural Information Reconstruction. The low-frequency component A and high-frequency components DH and DV are processed by the Structural Information Reconstruction (SIR) module:
\begin{equation*}{A^{\prime}},D_H^{\prime},D_V^{\prime} = \operatorname{SIR} \left( {A,{D_H},{D_V}} \right),\tag{10}\end{equation*}
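The SIR architecture is not detailed above; the sketch below assumes a small residual CNN that jointly refines the three single-channel bands, purely to illustrate Eq. (10).

```python
import torch
import torch.nn as nn

class SIR(nn.Module):
    """Minimal sketch of Eq. (10): jointly refine A, D_H, and D_V.
    Channel widths, depth, and the residual form are assumptions."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, a, d_h, d_v):
        # a, d_h, d_v: single-channel bands of shape (B, H/2, W/2)
        x = torch.stack([a, d_h, d_v], dim=1)          # (B, 3, H/2, W/2)
        a_r, d_h_r, d_v_r = self.net(x).unbind(dim=1)  # per-band corrections
        return a + a_r, d_h + d_h_r, d_v + d_v_r       # residual refinement
```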
Inverse Wavelet Transform. The reconstructed components are converted back to the spatial domain using 2D inverse discrete wavelet transform (2D-IDWT):
\begin{equation*}{{\mathbf{I}}_{DR}} = {\text{2D-IDWT}}\left( {{A^{\prime}},D_H^{\prime},D_V^{\prime},D_D^{\prime}} \right).\tag{11}\end{equation*}
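Continuing the PyWavelets sketch, the refined bands are recombined by the inverse transform of Eq. (11); the Haar basis is again an assumption.

```python
import numpy as np
import pywt

# Stand-ins for the refined bands A', D'_H, D'_V and the denoised D'_D
A_p, D_Hp, D_Vp, D_Dp = (np.random.rand(128, 128).astype(np.float32) for _ in range(4))

i_dr = pywt.idwt2((A_p, (D_Hp, D_Vp, D_Dp)), 'haar')  # Eq. (11)
print(i_dr.shape)  # (256, 256)
```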
Loss Function. The loss function ℒfreq combines a high-frequency loss with a low-frequency regularization term, where ℋ(·) and ℒ(·) denote high-pass and low-pass operators, respectively:
\begin{align*}{\mathcal{L}_{\text{freq}}} = {} & \left\| {\mathcal{H}\left( {{{\mathbf{I}}_{DR}} - {{\mathbf{I}}_{RCAR}}} \right) - \mathcal{H}\left( {{{\mathbf{I}}_{GT}} - {{\mathbf{I}}_{RCAR}}} \right)} \right\|_2^2 \\ & + \lambda \cdot\left\| {\mathcal{L}\left( {{{\mathbf{I}}_{GT}} - {{\mathbf{I}}_{RCAR}}} \right)} \right\|_2^2,\tag{12}\end{align*}
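A possible PyTorch realization of Eq. (12) is sketched below. The exact filters behind ℋ(·) and ℒ(·) are not specified above, so implementing them as an average-pooling low-pass with a high-frequency residual is an assumption.

```python
import torch
import torch.nn.functional as F

def low_high_split(x):
    """Split a (B, C, H, W) tensor into low- and high-frequency parts.
    Average pooling plus residual is an assumed stand-in for L(·)/H(·)."""
    lo = F.avg_pool2d(x, 2)                      # low-frequency approximation
    hi = x - F.interpolate(lo, scale_factor=2)   # high-frequency residual
    return lo, hi

def freq_loss(i_dr, i_rcar, i_gt, lam=0.1):
    """Frequency loss of Eq. (12): match the high-frequency residual of the
    prediction to that of the ground truth, plus low-frequency regularization."""
    _, h_pred = low_high_split(i_dr - i_rcar)
    lo_gt, h_gt = low_high_split(i_gt - i_rcar)
    return F.mse_loss(h_pred, h_gt, reduction='sum') + lam * lo_gt.pow(2).sum()
```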
Experiment
A. Implementation Details
Framework Development. Our framework was developed in PyTorch and trained on an NVIDIA RTX 3090 GPU. The model was trained for 3000 epochs with a batch size of 7. For optimization, we employed the Adam optimizer with an initial learning rate of 1 × 10⁻⁴, reduced by a factor of 0.8 every 50 epochs. Additionally, we applied an Exponential Moving Average (EMA) with a decay rate of 0.9999 to stabilize training. The diffusion process used 200 time steps, with implicit sampling every 10 steps.
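The stated optimization settings map onto standard PyTorch components as in the sketch below; `model` is a placeholder for the full BIAWDiff network, and using AveragedModel for the EMA is one possible implementation of the stated decay.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the BIAWDiff network
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = StepLR(optimizer, step_size=50, gamma=0.8)   # x0.8 every 50 epochs
ema = AveragedModel(model, avg_fn=lambda avg, new, n: 0.9999 * avg + 0.0001 * new)

# Per epoch (3000 total, batch size 7): forward/backward, optimizer.step(),
# ema.update_parameters(model); then scheduler.step() at the end of the epoch.
```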
Benchmark Settings. The datasets used in our experiments include LOL-v1, LOL-v2-real, and LOL-v2-synthetic, as well as five commonly used real-world unpaired benchmarks: DICM, LIME, NPE, MEF, and VV.
B. Quantitative Results
We quantitatively compare the proposed method with a wide range of SOTA enhancement algorithms. Table I, Table II, and Table III show that BIAWDiff achieves competitive results on the paired datasets. We also evaluated BIAWDiff on unpaired datasets. As shown in Table IV, BIAWDiff consistently ranks among the top in NIQE, BRISQUE, and PI, demonstrating its robustness and effectiveness in low-light enhancement.
C. Qualitative Results
In this section, we provide a comprehensive qualitative analysis of BIAWDiff in comparison with several state-of-the-art enhancement techniques. As shown in Figures 3 and 4, our method demonstrates superior visual quality, with higher contrast, more precise details, and improved color consistency. The enhanced brightness in our results further emphasizes the clarity of the output. Notably, in areas with complex textures, BIAWDiff produces significantly fewer visual artifacts compared to other methods, showcasing its robustness and effectiveness across various datasets.
D. Ablation Study
Ablation studies on LOL-v1, LOL-v2-real, and LOL-v2-synthetic datasets confirm the effectiveness of each component in our framework. Table V shows that removing any component (ILR, RCAR, DR, or Diffusion) significantly reduces performance, with the full model consistently achieving the best PSNR, SSIM, and LPIPS values. Additionally, Table VI demonstrates that applying diffusion to the diagonal coefficient (cD) yields the best results, highlighting its importance in image enhancement.
Conclusion
We introduce BIAWDiff, a low-light enhancement method integrating Retinex theory, rod cell-inspired attention, and wavelet-based diffusion. Comprising three modules for light restoration, luminance refinement, and detail recovery, BIAWDiff achieves superior results in brightness enhancement, noise reduction, and detail preservation, as confirmed by experimental benchmarks. Future work will focus on refining the diffusion process and attention mechanisms to further enhance performance.