Introduction
Low-light image enhancement is a longstanding and significant topic in computer vision, playing a crucial role in night photography, surveillance, and medical imaging. The goal is to improve image quality under low-light conditions while preserving details and color fidelity.
However, traditional algorithms often fail to balance brightness enhancement with detail preservation, leading to overexposure and artifacts. For instance, histogram equalization can amplify noise and distort colors. Deep learning methods for low-light image enhancement have significantly advanced noise reduction and global illumination correction, but they still face challenges such as RGB color shifts and the loss of high-frequency details. Unsupervised methods, such as unpaired learning [1]–[3] and zero-shot approaches [4]–[8], offer greater flexibility and generalization but still struggle with robustness in detail recovery and color consistency. Retinex-based methods [9]–[12] leverage the decomposition of illumination and reflectance to enhance brightness; however, they frequently over-smooth the result, losing critical texture details, particularly in regions with complex patterns. Diffusion-based methods [13]–[16] represent a more recent advancement, using iterative refinement to balance noise reduction and detail enhancement. Nevertheless, these methods still struggle to achieve an optimal trade-off between recovering fine details and maintaining global consistency. A comprehensive solution that addresses both global illumination and fine detail preservation remains an open challenge.
To address these challenges, we propose Bio-Inspired Attention and Wavelet Diffusion (BIAWDiff), a method that integrates Retinex theory with rod cell-inspired attention and wavelet-based diffusion models. BIAWDiff comprises three modules: (1) Initial Light Restoration (ILR) module: the ILR module uses Retinex theory to decompose the input into illumination, reflectance, and noise, then enhances the decomposed illumination, effectively preserving low-frequency information and avoiding overexposure. (2) Rod Cell-Inspired Attention Refinement (RCAR) module: the RCAR module refines the image by focusing on luminance, mimicking the behavior of rod cells under low-light conditions. (3) Detail Refinement (DR) module: the DR module applies wavelet-domain diffusion for high-frequency detail restoration and uses neural networks for low-frequency reconstruction, ensuring structural integrity and color consistency. As shown in Figure 1, our BIAWDiff outperforms several SOTA approaches on both paired and unpaired datasets. Overall, our contributions can be summarized as follows:
We propose an advanced low-light image enhancement framework that integrates brightness correction, attention-driven luminance refinement, and wavelet-based detail recovery, achieving enhanced visual quality and structural preservation.
To better reconstruct the details lost in low-light photography, we propose a bio-inspired mechanism that filters out secondary information and retains high-frequency information; this guides the diffusion of high-frequency coefficients in the wavelet domain, thereby generating richer scene details.
We conducted extensive experiments on both paired and unpaired low-light datasets. The experimental results show that BIAWDiff consistently outperforms other low-light enhancement methods, which fully demonstrates the effectiveness and generalization capability of our framework.
Proposed Method
BIAWDiff enhances low-light images using a three-stage network, as shown in Figure 2. The key modules are Initial Light Restoration (ILR), Rod Cell-Inspired Attention Refinement (RCAR), and Detail Refinement (DR). ILR boosts brightness, reduces noise, and preserves low-frequency details. RCAR uses transformer-based attention to enhance luminance. DR applies wavelet-domain diffusion for high-frequency denoising and neural networks for low-frequency reconstruction.
Fig. 2. Overview of the BIAWDiff network architecture. (a) BIAWDiff begins with a brightness enhancement and noise reduction module that prepares the input image for further processing. (b) The next stage performs selective luminance refinement, where luminance adjustments improve visual coherence. (c) The final stage restores high-frequency details and structural integrity to produce a coherent and visually enhanced output. Here, · denotes the Hadamard product, + denotes matrix addition, and − denotes matrix subtraction.
A. Initial Light Restoration (ILR)
We observe that images captured in low-light conditions often suffer from insufficient brightness and detail loss, primarily due to the uneven distribution of illumination and reflectance components. By effectively separating and estimating these components, image details can be better preserved, and overall quality enhanced. The ILR module enhances low-light images by estimating illumination and reflectance components and improving low-frequency information. It consists of two parts: Light Difference Estimation and Preliminary Image Restoration.
Light Difference Estimation. The ILR module first predicts the light difference ΔL, reflectance R, and illumination features FL from the input low-light image Ilow:
\begin{equation*}(\Delta {\mathbf{L}},{\mathbf{R}},{{\mathbf{F}}_{\mathbf{L}}}) = LightEst\left( {{{\mathbf{I}}_{low}}} \right)\tag{1}\end{equation*}
Preliminary Image Restoration. The predicted light difference and reflectance are used to restore the image:
\begin{equation*}{{\mathbf{I}}_{pred}} = {{\mathbf{I}}_{low}} + \Delta {\mathbf{L}}\cdot{\mathbf{R}}\tag{2}\end{equation*}
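For concreteness, the following is a minimal PyTorch sketch of the ILR step described by Eqs. (1)-(2). The layer widths, the two prediction heads, and the sigmoid on the reflectance are illustrative assumptions rather than the paper's exact architecture; only the restoration rule Ipred = Ilow + ΔL · R follows the formulation above.

```python
import torch
import torch.nn as nn

class LightEst(nn.Module):
    """Hypothetical light-difference estimator for Eq. (1): predicts
    (delta_L, R, F_L) from a low-light RGB image."""
    def __init__(self, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.delta_head = nn.Conv2d(feat_ch, 3, 3, padding=1)  # light difference ΔL
        self.refl_head = nn.Conv2d(feat_ch, 3, 3, padding=1)   # reflectance R

    def forward(self, i_low):
        f_l = self.backbone(i_low)           # illumination features F_L
        delta_l = self.delta_head(f_l)
        r = torch.sigmoid(self.refl_head(f_l))  # reflectance constrained to [0, 1]
        return delta_l, r, f_l

def restore(i_low, delta_l, r):
    """Preliminary restoration, Eq. (2): I_pred = I_low + ΔL ⊙ R."""
    return i_low + delta_l * r
```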
B. Rod Cell-Inspired Attention Refinement (RCAR)
Rod cells in human vision enhance brightness perception in low-light conditions by emphasizing luminance over color. Inspired by this, the RCAR module refines the preliminary image Ipred using transformer-based attention guided by illumination features FL and the grayscale map Glow.
The attention mechanism computes the query Q, key K, and value V matrices as:
\begin{equation*}{\mathbf{Q}} = {{\mathbf{W}}_Q}\cdot{{\mathbf{I}}_{pred}},{\mathbf{K}} = {{\mathbf{W}}_K}\cdot{\mathbf{M}},{\mathbf{V}} = {{\mathbf{W}}_V}\cdot{\mathbf{M}},\tag{3}\end{equation*}
\begin{equation*}{\mathbf{M}} = \sigma \left( {{{\mathbf{F}}_{\mathbf{L}}}\cdot{{\mathbf{G}}_{low}} + {{\mathbf{b}}_m}} \right),\tag{4}\end{equation*}
Attention scores A are computed using scaled dot-product attention:
\begin{equation*}{\mathbf{A}} = \operatorname{softmax} \left( {\frac{{{\mathbf{Q}}\cdot{{\mathbf{K}}^ \top } + \lambda {{\mathbf{G}}_{{\text{low }}}}\cdot{\mathbf{H}}}}{{\sqrt {{d_k}} }}} \right),\tag{5}\end{equation*}
\begin{equation*}{{\mathbf{I}}_{RCAR}} = {{\mathbf{I}}_{{\text{pred }}}} + \frac{{{\mathbf{A}}\cdot{\mathbf{V}} + \eta \left( {{{\mathbf{I}}_{{\text{pred }}}}\cdot{{\mathbf{W}}_Z} + {{\mathbf{b}}_z}} \right)}}{{1 + {e^{ - \left( {{{\mathbf{w}}_\gamma }\cdot{{\mathbf{F}}_{\mathbf{L}}} + {{\mathbf{b}}_\gamma }} \right)}}}},\tag{6}\end{equation*}
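The sketch below illustrates Eqs. (3)-(6) on flattened token maps of shape (B, N, C). Treating pixels as tokens, using nn.Linear projections for WQ, WK, WV, and WZ, and the shape assumed for the prior term H in Eq. (5) are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCARAttention(nn.Module):
    """Rod cell-inspired attention sketch following Eqs. (3)-(6)."""
    def __init__(self, dim, lam=0.1, eta=0.1):
        super().__init__()
        self.w_q = nn.Linear(dim, dim)             # W_Q
        self.w_k = nn.Linear(dim, dim)             # W_K
        self.w_v = nn.Linear(dim, dim)             # W_V
        self.w_z = nn.Linear(dim, dim)             # W_Z, b_z
        self.gate = nn.Linear(dim, 1)              # w_gamma, b_gamma
        self.b_m = nn.Parameter(torch.zeros(dim))  # b_m
        self.lam, self.eta = lam, eta

    def forward(self, i_pred, f_l, g_low, h_prior):
        # i_pred, f_l: (B, N, C); g_low: (B, N, 1); h_prior: (B, 1, N)
        m = torch.sigmoid(f_l * g_low + self.b_m)                  # Eq. (4)
        q, k, v = self.w_q(i_pred), self.w_k(m), self.w_v(m)       # Eq. (3)
        d_k = q.shape[-1]
        scores = (q @ k.transpose(-2, -1)
                  + self.lam * (g_low @ h_prior)) / d_k ** 0.5     # Eq. (5)
        a = F.softmax(scores, dim=-1)
        gate = torch.sigmoid(self.gate(f_l))   # 1 / (1 + e^{-(w_gamma·F_L + b_gamma)})
        return i_pred + (a @ v + self.eta * self.w_z(i_pred)) * gate  # Eq. (6)
```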
C. Detail Refinement (DR)
High-frequency details are crucial for preserving sharpness and fine textures, which are essential for visual clarity and overall image quality. The DR module enhances these details and reduces noise using the wavelet domain.
Wavelet Transform. The RCAR output IRCAR is decomposed into the low-frequency component A and the high-frequency components DH, DV, and DD using the 2D discrete wavelet transform (2D-DWT):
\begin{equation*}A,{D_H},{D_V},{D_D} = {\text{2D-DWT}}\left( {{{\mathbf{I}}_{RCAR}}} \right),\tag{7}\end{equation*}
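A single-channel sketch of Eq. (7) using PyWavelets; applying the transform per channel and choosing the Haar wavelet are assumptions, since the wavelet basis is not specified above.

```python
import numpy as np
import pywt  # PyWavelets

i_rcar = np.random.rand(256, 256).astype(np.float32)  # stand-in for one channel of I_RCAR

# One-level 2D discrete wavelet transform, Eq. (7)
A, (D_H, D_V, D_D) = pywt.dwt2(i_rcar, 'haar')
print(A.shape, D_H.shape, D_V.shape, D_D.shape)  # each band is (128, 128)
```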
Fig. 3. Visual comparisons of different methods on the LOL-v1, LOL-v2-real, and LOL-v2-synthetic datasets.
Diffusion Process. The diffusion process is applied to the diagonal high-frequency component DD for denoising and detail enhancement:
\begin{align*} & {\hat D_{{D_{t + 1}}}} = \sqrt {1 - {\alpha _t}} \cdot{\hat D_{{D_t}}} + \sqrt {{\alpha _t}} \cdot{\varepsilon _t},\tag{8} \\ & {\hat D_{{D_{t - 1}}}} = \frac{1}{{\sqrt {1 - {\alpha _t}} }}\left( {{{\hat D}_{{D_t}}} - \frac{{{\alpha _t}}}{{\sqrt {1 - {{\bar \alpha }_t}} }}\cdot{\varepsilon _\theta }\left( {{{\hat D}_{{D_t}}},{D_D},t} \right)} \right),\tag{9}\end{align*}
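The following is a sketch of one forward and one reverse step on the diagonal coefficient, written to follow Eqs. (8)-(9) exactly as stated (the α convention is taken directly from the equations above); eps_model stands for the conditional noise predictor εθ.

```python
import torch

def forward_diffuse(d_hat_t, alpha_t):
    """Forward noising of the diagonal coefficient, Eq. (8)."""
    eps = torch.randn_like(d_hat_t)
    return (1 - alpha_t) ** 0.5 * d_hat_t + alpha_t ** 0.5 * eps, eps

def reverse_step(d_hat_t, d_cond, t, alpha_t, alpha_bar_t, eps_model):
    """One reverse denoising step, Eq. (9), conditioned on the degraded D_D."""
    eps_pred = eps_model(d_hat_t, d_cond, t)
    return (d_hat_t - alpha_t / (1 - alpha_bar_t) ** 0.5 * eps_pred) / (1 - alpha_t) ** 0.5
```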
Structural Information Reconstruction. The low-frequency component A and high-frequency components DH and DV are processed by the Structural Information Reconstruction (SIR) module:
\begin{equation*}{A^{\prime}},D_H^{\prime},D_V^{\prime} = \operatorname{SIR} \left( {A,{D_H},{D_V}} \right),\tag{10}\end{equation*}
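The SIR architecture is not detailed above; the sketch below assumes a small residual CNN that jointly refines the three single-channel bands, purely to illustrate Eq. (10).

```python
import torch
import torch.nn as nn

class SIR(nn.Module):
    """Minimal sketch of Eq. (10): jointly refine A, D_H, and D_V.
    Channel widths, depth, and the residual form are assumptions."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, a, d_h, d_v):
        # a, d_h, d_v: single-channel bands of shape (B, H/2, W/2)
        x = torch.stack([a, d_h, d_v], dim=1)          # (B, 3, H/2, W/2)
        a_r, d_h_r, d_v_r = self.net(x).unbind(dim=1)  # per-band corrections
        return a + a_r, d_h + d_h_r, d_v + d_v_r       # residual refinement
```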
Inverse Wavelet Transform. The reconstructed components are converted back to the spatial domain using 2D inverse discrete wavelet transform (2D-IDWT):
\begin{equation*}{{\mathbf{I}}_{DR}} = {\text{2D-IDWT}}\left( {{A^{\prime}},D_H^{\prime},D_V^{\prime},D_D^{\prime}} \right).\tag{11}\end{equation*}
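Continuing the PyWavelets sketch, the refined bands are recombined by the inverse transform of Eq. (11); the Haar basis is again an assumption.

```python
import numpy as np
import pywt

# Stand-ins for the refined bands A', D'_H, D'_V and the denoised D'_D
A_p, D_Hp, D_Vp, D_Dp = (np.random.rand(128, 128).astype(np.float32) for _ in range(4))

i_dr = pywt.idwt2((A_p, (D_Hp, D_Vp, D_Dp)), 'haar')  # Eq. (11)
print(i_dr.shape)  # (256, 256)
```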
Loss Function. The loss function ℒfreq combines a high-frequency loss with a low-frequency regularization term, where ℋ(·) and ℒ(·) denote high-pass and low-pass operators, respectively:
\begin{align*}{\mathcal{L}_{\text{freq}}} = {} & \left\| {\mathcal{H}\left( {{{\mathbf{I}}_{DR}} - {{\mathbf{I}}_{RCAR}}} \right) - \mathcal{H}\left( {{{\mathbf{I}}_{GT}} - {{\mathbf{I}}_{RCAR}}} \right)} \right\|_2^2 \\ & + \lambda \cdot\left\| {\mathcal{L}\left( {{{\mathbf{I}}_{GT}} - {{\mathbf{I}}_{RCAR}}} \right)} \right\|_2^2,\tag{12}\end{align*}
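A possible PyTorch realization of Eq. (12) is sketched below. The exact filters behind ℋ(·) and ℒ(·) are not specified above, so implementing them as an average-pooling low-pass with a high-frequency residual is an assumption.

```python
import torch
import torch.nn.functional as F

def low_high_split(x):
    """Split a (B, C, H, W) tensor into low- and high-frequency parts.
    Average pooling plus residual is an assumed stand-in for L(·)/H(·)."""
    lo = F.avg_pool2d(x, 2)                      # low-frequency approximation
    hi = x - F.interpolate(lo, scale_factor=2)   # high-frequency residual
    return lo, hi

def freq_loss(i_dr, i_rcar, i_gt, lam=0.1):
    """Frequency loss of Eq. (12): match the high-frequency residual of the
    prediction to that of the ground truth, plus low-frequency regularization."""
    _, h_pred = low_high_split(i_dr - i_rcar)
    lo_gt, h_gt = low_high_split(i_gt - i_rcar)
    return F.mse_loss(h_pred, h_gt, reduction='sum') + lam * lo_gt.pow(2).sum()
```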
Experiment
A. Implementation Details
Framework Development. Our framework was developed in PyTorch and trained on an NVIDIA RTX 3090 GPU. The model was trained for 3000 epochs with a batch size of 7. For optimization, we employed the Adam optimizer with an initial learning rate of 1 × 10⁻⁴, reduced by a factor of 0.8 every 50 epochs. Additionally, we applied an Exponential Moving Average (EMA) with a decay rate of 0.9999 to stabilize training. The diffusion process used 200 time steps, with implicit sampling every 10 steps.
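The stated optimization settings map onto standard PyTorch components as in the sketch below; `model` is a placeholder for the full BIAWDiff network, and using AveragedModel for the EMA is one possible implementation of the stated decay.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the BIAWDiff network
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = StepLR(optimizer, step_size=50, gamma=0.8)   # x0.8 every 50 epochs
ema = AveragedModel(model, avg_fn=lambda avg, new, n: 0.9999 * avg + 0.0001 * new)

# Per epoch (3000 total, batch size 7): forward/backward, optimizer.step(),
# ema.update_parameters(model); then scheduler.step() at the end of the epoch.
```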
Benchmark Settings. The datasets used in our experiments include LOL-v1, LOL-v2-real, and LOL-v2-synthetic, as well as five commonly used real-world unpaired benchmarks: DICM, LIME, NPE, MEF, and VV.
B. Quantitative Results
We quantitatively compare the proposed method with a wide range of SOTA enhancement algorithms. Table I, Table II, and Table III show that BIAWDiff achieves competitive results on the paired datasets. We also evaluated BIAWDiff on unpaired datasets. As shown in Table IV, BIAWDiff consistently ranks among the top in NIQE, BRISQUE, and PI, demonstrating its robustness and effectiveness in low-light enhancement.
C. Qualitative Results
In this section, we provide a comprehensive qualitative analysis of BIAWDiff in comparison with several state-of-the-art enhancement techniques. As shown in Figures 3 and 4, our method demonstrates superior visual quality, with higher contrast, more precise details, and improved color consistency. The enhanced brightness in our results further emphasizes the clarity of the output. Notably, in areas with complex textures, BIAWDiff produces significantly fewer visual artifacts compared to other methods, showcasing its robustness and effectiveness across various datasets.
D. Ablation Study
Ablation studies on LOL-v1, LOL-v2-real, and LOL-v2-synthetic datasets confirm the effectiveness of each component in our framework. Table V shows that removing any component (ILR, RCAR, DR, or Diffusion) significantly reduces performance, with the full model consistently achieving the best PSNR, SSIM, and LPIPS values. Additionally, Table VI demonstrates that applying diffusion to the diagonal coefficient (cD) yields the best results, highlighting its importance in image enhancement.
Conclusion
We introduce BIAWDiff, a low-light enhancement method integrating Retinex theory, rod cell-inspired attention, and wavelet-based diffusion. Comprising three modules for light restoration, luminance refinement, and detail recovery, BIAWDiff achieves superior results in brightness enhancement, noise reduction, and detail preservation, as confirmed by experimental benchmarks. Future work will focus on refining the diffusion process and attention mechanisms to further enhance performance.