
MSRFSR: Multi-Stage Refining Face Super-Resolution With Iterative Collaboration Between Face Recovery and Landmark Estimation


Abstract:

Face Super-resolution (FSR) models encounter a significant challenge related to extremely low-dimensional (16\times 16 pixels) and degraded input images. This deficiency of crucial facial details at the low and intermediate levels of the FSR model presents obstacles in tasks such as face alignment and landmark detection and, consequently, difficulty in recovering high-frequency details, resulting in unfaithful and unrealistic super-resolved face images. This research proposes an innovative FSR model with strategically designed multi-attention techniques to enhance facial attribute recovery capabilities. The model incorporates a Non-local Module (NL) and a residual pixel attention technique at the low-level stage of the FSR model. Simultaneously, a Spatial Feature Transform (SFT) module refines mid-level features by leveraging spatial information through an iterative interaction process between an attentive module and a landmark estimation network. By strategically utilizing these modules within an iterative collaboration framework, our method effectively addresses challenges in facial detail recovery, demonstrating enhanced model understanding and refined representation. The proposed model is rigorously examined on the CelebA, Helen, AFLW2000, and WFLW datasets at scale factors of \times 8 and \times 16 . The results consistently demonstrate the superiority of our proposed Multi-Stage Refining Face Super-Resolution (MSRFSR) model over state-of-the-art methods through extensive quantitative and qualitative experiments on four datasets and both scales.
Published in: IEEE Access ( Volume: 12)
Page(s): 56951 - 56972
Date of Publication: 16 April 2024
Electronic ISSN: 2169-3536

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.
SECTION I.

Introduction

Face super-resolution (FSR) has attracted increasing attention and is used in various image-based applications. FSR, also known as face hallucination, seeks to reconstruct a high-resolution (HR) face image from a low-resolution (LR) input. Due to the constraints in acquiring high-quality images and the influence of imaging conditions, face images are often captured with poor perceptual quality in real-world scenarios. This low-quality face image issue negatively affects the performance of face-image-based applications such as face detection [1] and face recognition [2]. FSR is a particular instance of the single image super-resolution (SISR) technique [3], and it is considered an ill-posed problem because of the ambiguity in reconstructing the pixels of face images. An FSR model is designed to capture the unique characteristics of facial attributes and optimizes the recovery of facial attribute detail. In contrast, an SISR model focuses on enhancing a wide variety of image content without considering the features of face attributes [4]. Because the face is a highly structured object, its configuration is utilized as reliable prior knowledge for reconstructing the global face structure, and the local face information is then recovered based on the reconstructed global face attributes [5]. To apply the facial prior guide approach in FSR models, different techniques such as face parsing maps, landmark heatmaps, spatial attention maps, and three-dimensional facial guides [6], [7], [8], [9], [10], [11] are used in the process of recovering face attributes (eyes, lips, nose, and eyebrows). Owing to the use of prior-guide techniques, FSR models outperform SISR models at large scale factors (\times 8 and above).

Although facial prior guide models outperform non-prior guide approaches in recovering global face information, they suffer from inaccurate or even wrong prior guide information due to the lack of sharp detail in the input face images. A noticeable challenge encountered in FSR models pertains to the very small dimensions and degraded quality of the input LR face image (often 16\times 16 or 8\times 8 pixels). This issue manifests as a deficiency in valuable facial details within the intermediate (mid-level) layers of the FSR model. The constrained information in such LR images poses a significant hurdle, impacting the model's ability to perform face alignment and landmark detection and to recover facial features effectively. Additionally, the LR face images may contain artifacts, such as a lack of sharpness and other degradations, making it difficult to recover the global and local details of the human face in a super-resolved image. The adverse impact of this problem is unfaithful and unrealistic super-resolved face images across different face poses.

To improve the capability of the FSR model in recovering more accurate global information from poor-quality input images, Ma et al. [5] proposed a face hallucination model comprising two recurrent networks. These networks are designed for iterative operation and improve the performance of facial component recovery and landmark detection tasks. Furthermore, to improve the restoration of local details, they enhanced the landmark information guidance utilizing an attentive module that aggregates the individual face attributes meticulously.

Although utilizing an iterative collaboration approach between the attentive module and estimated facial landmarks enhances the FSR model's ability to recover more global and local detail compared to other prior guide approaches, the obstacles of poor-quality input images, and the consequent failure to recover high-frequency details at the mid-level of the FSR model, remain.

To enhance the mid-level information recovery capabilities of the FSR model and achieve a more realistic texture for face images, a viable approach is to leverage the spatial information of features. This goal is effectively addressed by incorporating the Spatial Feature Transform [12] (SFT) module. Nevertheless, the sub-optimal quality of the low-level features extracted from unclear and degraded LR inputs hinders the SFT module's efficient recovery of intricate facial details. To mitigate the degradation effects associated with the low-level extracted features, a feasible solution is to employ a non-local [13] module (NL) in conjunction with the residual channel attention [14] technique. Integrating the non-local module captures long-range dependencies within the features, promoting a more comprehensive understanding of facial structures. Simultaneously, the residual channel attention technique enhances the focus on critical facial details by selectively emphasizing informative channels. This combined approach addresses the shortcomings in the quality of low-level features, fostering a more refined and context-aware representation. By incorporating the non-local module and residual channel attention technique, we aim to significantly enhance the efficacy of the SFT module in recovering facial details, ultimately strengthening the attentive module's use of landmark guidance and improving our FSR model at large up-sampling scales (\times 8 and \times 16 ).

Our research presents a novel approach to FSR by employing multi-stage refinement techniques that harness the potential of an iterative collaboration process between the recovery and landmark estimation networks. This innovative framework enhances the model’s capability to recover higher fidelity and more detailed facial attributes compared to baseline models. We introduce three key contributions to our FSR model:

  1. We propose an NL module at the early stage of our FSR network to reduce the noise effects of low-resolution face images and produce an enhanced feature representation. This module effectively addresses the noise degradation in low-quality inputs, improving feature quality.

  2. We employ a residual pixel attention module on low-resolution features at the early stage of the network to capture the inter-channel relationships in low-resolution feature maps and emphasize the importance of specific channels, enhancing the model's ability to capture intricate facial details.

  3. We develop an SFT module involving an affine transformation that spatially adjusts the features according to facial characteristics derived from facial heatmaps. The SFT module is applied at the mid-level of our proposed model, before the upsampling layer, and improves the effectiveness of the upscaling process by ensuring relevant facial details are brought into focus.

The illustration in Figure 1 showcases the effectiveness of our multi-stage refinement process in producing higher-fidelity and more detailed face images. In Figure 1, sample (a) demonstrates a fidelity comparison of the generated SR image with closed eyes, highlighting its superior performance relative to the DIC [5] model. Sample (b) further illustrates the model's capability to recover additional facial details, surpassing the performance of the DIC [5] model.

FIGURE 1.

Visual Comparison: Our MSRFSR model compared to DIC [5] on CelebA [15] and Helen [16] datasets at a scale factor of \times 8 reveals distinctive performance in fidelity and yields more detailed results.

The remaining sections of the article are structured as follows. Section II briefly reviews the relevant works. Section III provides the methodology and proposes the model’s architecture. The implementation details, datasets, and experimental results are demonstrated in Section IV. Section V demonstrates discussion and future work. Finally, the proposed FSR research is concluded in Section VI.

SECTION II.

Related Works

A. Conventional Face Super-Resolution

The idea of improving the quality of face images, known as face hallucination (face super-resolution), became a focus of interest when researchers proposed the first general super-resolution (SR) models. In 2000, Baker et al. [17] proposed the first FSR model. Their basic model improves the resolution of face images by learning from training image datasets. This model is relatively simple, producing a super-resolved face image by comprehending facial structures while ignoring high-frequency information recovery.

Liu et al. [18] proposed a two-step FSR architecture to address this limitation. This model first restores the coarse face image using a linear network architecture in the initial stage and then retrieves high-frequency details using a non-parametric Markov technique. Although this model improved the recovery of face details compared to the previous model [17], there was still room for retrieving more facial details. Several FSR models based on traditional learning algorithms have been proposed to tackle this issue [19], [20], [21], [22], [23], [24]. Between 2004 and 2007, to improve the generation of crucial facial details, Chang et al. [19], Wang et al. [20], and Chakrabarti et al. [21] applied local embedding, eigen-transformation, and kernel principal component analysis, respectively. Furthermore, to boost the performance of FSR models with shallow architectures, Shi et al. [22], Jung et al. [23], and Jiang et al. [24] employed recursive regression, convex optimization based on the positional patch approach, and local smooth regression, respectively.

Nevertheless, these approaches prove ineffective in restoring natural facial attributes, particularly when dealing with large up-sampling factors such as \times 8 . When the magnification factor is larger than \times 4 , these methods struggle to produce faithful super-resolved face images.

B. Deep Learning Based Face Super-Resolution

Due to the rapid development of deep learning in computer vision, there have been significant advancements in face hallucination using deep learning methods [4]. Our review focuses on the design of diverse network structures for FSR models and on techniques for handling different attention mechanisms, facial geometry alignment, and textural and contextual information. Table 1 presents a comparison of FSR models, highlighting key aspects such as the year of publication, a brief overview of the methodology employed, the accuracy metric, the dataset used, and the examined scale factor.

TABLE 1 Comparison of Various FSR Models

In 2015, the Bi-channel Convolutional Neural Network (CNN) [25] pioneered the use of CNNs to super-resolve face images by adaptively combining the information of two channels, marking a notable milestone in this area. To address the limitation of operating at larger scale factors (beyond \times 4 ), the URDGN [26] model introduced a discriminative generative network that resolves very low-resolution face images at a scale factor of \times 8 by incorporating a pixel-wise L2 loss function and utilizing a feedback branch from the discriminative network to enhance the recovered global information of the face image. To prevent the over-smoothing effect at large scale factors (\times 8 and \times 16 ), WSRNet [27] proposed a wavelet-based FSR network that operates in the wavelet transform domain instead of the image domain and designed a multiple-scale-factor technique within a unified framework. This FSR model learns to predict the series of HR wavelet coefficients corresponding to the LR input and improves the global information and local texture details of human faces. The extension of the WSRNet [27] model to generative adversarial networks is detailed in [28].

In 2018, SuperFAN [8] and FSRNet [6] utilized face geometry as prior knowledge in FSR models to improve the quality of resultant images by effectively recovering local information. SuperFAN [8] utilized a sub-network for face alignment and facial landmark detection and integrated this information into a generative adversarial network. FSRNet [6] used facial landmark heatmaps and parsing maps without strict alignment and integrated an adversarial loss into the model.

The Facial Attribute Capsules Network [29] (FACN) leverages an integrated representation model to comprehensively encapsulate facial information and reduce the noise effect of super-resolved face images in real-world scenarios. An integrated learning strategy generates the attribute capsules in semantic, probabilistic, and facial attribute manners. The Spatial Attention Residual Network [30] (SPARNet) integrated a spatial attention mechanism into vanilla residual blocks to address the difficulty in recapturing finer facial details. This technique allows convolutional layers to focus specifically on essential facial structures while reducing attention to regions with fewer features.

Ma et al. [5] introduced an iterative collaboration between two recurrent networks (DIC), focusing on recovery and landmark estimation to improve FSR accuracy and quality. The iterative framework exploits prior knowledge of landmarks, and the two networks progressively enhance each other's performance. The DIC [5] model emphasizes the dynamic information exchange between the face image recovery network and the landmark estimation network. Additionally, an innovative attentive fusion module generates facial components and aggregates them attentively for improved high-frequency detail restoration. While this framework has shown acceptable capability in gradually producing results and utilizing an attentive fusion technique to recover more facial details, its effectiveness is hindered by the degraded low-resolution input (16\times 16 ). This limitation prevents the approach from being exploited to its full advantage.

To improve texture details and enhance facial structure, the Split-Attention in Split-Attention Network [31] (SISN) introduced an External-Internal Split Attention Group. This model simultaneously considers the overall facial structure and fine texture details and attempts to produce higher-fidelity face images. As an extension of [29], to enhance the fidelity of facial results, Bao et al. [11] proposed multi-attention modules (Residual Spatial Attention and Multi-scale Patch embedding and Spatial attention) in the SCTANet model to improve the recovery of both global and local information.

In 2023, Wang et al. [32] introduced Super-Resolving Face Images by Facial Parsing Information (FishFSRNet) by designing a parsing map attention fusion block (a parsing map-guided approach) and integrating the parsing map information with an attention mechanism. The multi-scale refine block in this model attempts to preserve spatial and contextual details and to recover high-resolution information and context from low-resolution features.

The Multi-Stage Generative Adversarial Network [33] (MSGAN) proposed an end-to-end head-pose estimation network and integrated it with the FSR network. Utilizing a pose-aware adversarial loss and head-pose alignment feedback improves the fidelity of non-frontal face images in real-world scenarios. To improve the FSR model's capability to recover local and global facial detail, the CNN-Transformer Cooperation Network [34] (CTCNet) employed a multi-scale connected encoder-decoder framework as its backbone. Its structure attention module, along with a Transformer block, attempts to enhance the consistency of local facial detail and global facial structure simultaneously.

To increase the accuracy of the FSR model and generate better perceptual quality from LR faces, the FMANet [35] model introduced a facial mask attention module and attempted to enhance the identity fidelity of the resolved face image. Moreover, the MaskPix loss function was introduced in this model to selectively emphasize pixels containing dense identity features. SFMNet [36] proposed a frequency-spatial interaction block based on the Fourier transform technique to achieve optimal performance in recovering global and local facial dependencies. Exploring correlations between the spatial and frequency domains improves global and local signal generation in SR images.

SECTION III.

Methodology

The overall MSRFSR architecture comprises three branches: a refinement network, a face recovery network, and a face alignment network (FAN), as shown in Figure 2. The LR input suffers degradation (noise, blurring, and lack of sharpness), which requires a multi-stage enhancement procedure [12], [13], [14]. The first convolution layer extracts the LR features from the LR input and up-scales them by a factor of two. The features are fed into the proposed multi-stage refinement network consisting of a non-local module and the residual pixel attention blocks, as depicted in Figure 2.

FIGURE 2.

The architecture of MSRFSR model.

The face recovery network and the FAN branch interact using an iterative collaboration approach. The face recovery branch takes the refined features as input and generates the first super-resolved face image. The generated super-resolution (SR) image is fed into the FAN branch to estimate the alignment and generate facial heatmaps, which serve as prior guides for fusion with the attentive module in an iterative collaborative process. The first generated SR image is computed as (1):\begin{equation*} I_{SR(1)}=M_{SR}(M_{Ref}(I_{LR}))+U(I_{LR}) \tag {1}\end{equation*} where I_{SR(1)} and I_{LR} denote the first reconstructed image and the LR input image, respectively. M_{SR} and M_{Ref} represent the recurrent SR module and the multi-stage refinement network, respectively, and U denotes the up-sampling module.

Utilizing I_{SR(1)} , the FAN branch predicts the initial facial landmarks (68 landmarks on the face) and produces the first facial heatmaps, denoted HM_{SR(1)} . The facial heatmaps cover the left and right eyes, lips, nose, and face shape, and are utilized as prior knowledge for integration into the attentive fusion module. In the second step, the attentive module operates on two inputs: HM_{SR(1)} and M_{Ref}(I_{LR}) .

F_{n} denotes the output of the fusion module and is determined as (2):\begin{equation*} F_{n} = M_{Ref}(I_{LR}, I_{SR(n-1)}) \cdot HM_{SR(n-1)} \tag {2}\end{equation*}

Based on the iterative collaboration approach, the SR image is reconstructed in n steps, and the final SR output I_{SR(n)} is defined as shown in (3):\begin{equation*} I_{SR(n)}= M_{SR}(F_{n})+M_{ST}(HM_{SR(n-1)})+U(I_{LR}) \tag {3}\end{equation*} In the remainder of this section, the Multi-stage Refinement Network, the Face Recovery Network, and the Face Alignment Network are explained in detail.
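To make the data flow of (1)-(3) concrete, the following is a minimal PyTorch-style sketch of the iterative collaboration loop. The module names (refinement_net, sr_net, fan, sft) are hypothetical placeholders, the shapes are assumed compatible, and the fusion in Eq. (2) is shown schematically without the per-component channel grouping used in the actual model.

```python
import torch
import torch.nn.functional as F

def iterative_collaboration(lr, refinement_net, sr_net, fan, sft, n_steps=4, scale=8):
    """lr: LR face batch, e.g. (B, 3, 16, 16) for the x8 setting."""
    up_lr = F.interpolate(lr, scale_factor=scale, mode='bicubic', align_corners=False)  # U(I_LR)
    sr = sr_net(refinement_net(lr)) + up_lr              # Eq. (1): first SR estimate
    for _ in range(n_steps - 1):
        heatmaps = fan(sr)                               # landmark heatmaps HM_SR(n-1)
        fused = refinement_net(lr) * heatmaps            # Eq. (2), schematic (previous-SR conditioning omitted)
        sr = sr_net(fused) + sft(heatmaps) + up_lr       # Eq. (3): refined SR estimate
    return sr
```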

A. Multi-Stage Refinement Network

The Multi-stage Refinement Network comprises the NL and RCA blocks, as shown in Figure 2. It aims to reduce the degradation effects in the low-level extracted features at this stage of the proposed model. These enhancements optimize the recovery and alignment procedures so that the capability of the iterative collaboration technique in our FSR model is fully utilized.

The non-local attention technique is applied to recover the global details in LR feature maps by exploiting the interdependence of pixels [37], [38], [39]. The proposed NL module is an attention approach that detects recurring patterns and textures across different regions of the feature maps [37]. Figure 3 depicts the non-local module's architecture.

FIGURE 3.

The architecture of Non-local module (NL).

Specifically, instead of attending only to the neighboring pixels, a larger neighborhood region is considered. Detecting similar intensity or texture characteristics using weighted averages in the non-local technique is defined as (4):\begin{equation*} z_{i}= W(t)\cdot y_{i}+I_{LR} \tag {4}\end{equation*} where z_{i} and I_{LR} define the output and input of the NL module, respectively, and W(t) denotes the weights applied to y_{i} :\begin{equation*} y_{i}= softmax (W({\theta }) \cdot W({\phi }))\cdot W(g) \tag {5}\end{equation*} y_{i} is obtained by the dot product of W(g) with the softmax of the dot product of W({\theta }) and W({\phi }) . W({\theta }) , W({\phi }) , and W(g) are the weight matrices to be learned; \theta , \phi , and g are 1\times 1 convolutions. The patch sizes H and T are set to 7, and the number of channels is set to 48.
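A minimal PyTorch sketch of an embedded-Gaussian non-local block consistent with (4)-(5) is given below; the 48-channel width follows the text, while the reduced embedding width and other details are assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    def __init__(self, channels=48, inter_channels=24):
        super().__init__()
        self.theta = nn.Conv2d(channels, inter_channels, 1)   # W(theta), 1x1 conv
        self.phi   = nn.Conv2d(channels, inter_channels, 1)   # W(phi),   1x1 conv
        self.g     = nn.Conv2d(channels, inter_channels, 1)   # W(g),     1x1 conv
        self.w_t   = nn.Conv2d(inter_channels, channels, 1)   # W(t), maps back to the input width

    def forward(self, x):
        b, c, h, w = x.shape
        theta = self.theta(x).view(b, -1, h * w)                     # (B, C', HW)
        phi   = self.phi(x).view(b, -1, h * w)                       # (B, C', HW)
        g     = self.g(x).view(b, -1, h * w)                         # (B, C', HW)
        attn  = torch.softmax(theta.transpose(1, 2) @ phi, dim=-1)   # Eq. (5): (B, HW, HW) similarity
        y     = (g @ attn.transpose(1, 2)).view(b, -1, h, w)         # aggregate over all positions
        return self.w_t(y) + x                                       # Eq. (4): residual output
```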

The proposed RCA blocks implement a channel attention technique that employs convolutional layers to increase the network's capability to learn channel-specific weights [14], [40]. As shown in Figure 4, the proposed RCA module comprises convolutional layers with ReLU activation functions, global average pooling, and a sigmoid activation function. This approach helps to recover and precisely extract channel-wise information from low-resolution feature maps.

FIGURE 4.

The architecture of RCA block.

The values of the weights behave as an attention map, allocating higher values to more critical information and lower values to less important information. After each convolution layer, the activation function enables the deep-learning model to learn more complex channel dependencies. A skip connection allows the network to bypass the low-frequency information from the earlier layer and integrate it with the features from the later layer [3], [40]. In the proposed multi-stage refinement network, applying RCA blocks after the non-local module is a strategic choice to emphasize inter-channel facial detail effectively. The NL module mitigates noise degradation at the initial stage of the refinement procedure [38], [39], while the RCA blocks are specifically designed to strengthen inter-channel facial detail representations in the low-level feature maps [14]. This combined technique addresses noise issues and ensures a more comprehensive and refined treatment of facial details, ultimately optimizing the collaboration between the face recovery and face alignment networks.
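The description above corresponds to a standard residual channel attention block; a minimal sketch is shown below, with the layer widths and reduction ratio as assumptions.

```python
import torch.nn as nn

class RCABlock(nn.Module):
    def __init__(self, channels=48, reduction=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Squeeze with global average pooling, excite with 1x1 convolutions and a sigmoid gate.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        res = self.body(x)
        res = res * self.attention(res)   # per-channel weights act as the attention map
        return x + res                    # skip connection carries low-frequency information forward
```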

B. Face Recovery Network

The Face Recovery Network comprises an attentive fusion module, recurrent super-resolution, and SFT block, as shown in Figure 5.

FIGURE 5.

The structure of Spatial Feature Transform (SFT).

The attentive fusion module is a part of the face recovery branch that fuses information between the face recovery network and the FAN. Thus, the gradients can be back-propagated recursively to both the face recovery network and the FAN, facilitating the utilization of different facial components as landmark guidance for our FSR model.

The landmark heatmap channels are grouped into specific facial components, including the mouth, nose, right and left eyes, and jawline. The channels corresponding to each facial component are added together, and a softmax operation is applied to create a facial attention heatmap for the respective facial attribute.

Guided by the facial attention heatmaps, the component-specific features are extracted by group convolution, and the weighted features from the multi-stage refinement network are added to form the attentively fused output.
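As an illustration of the grouping and spatial-softmax step described above, the sketch below converts 68-channel landmark heatmaps into five component attention maps; the landmark index ranges follow the common 68-point convention and are assumptions, not the paper's exact grouping. Each resulting map then weights the features of its component, extracted by a group convolution, before they are added back to the refined features.

```python
import torch

COMPONENT_GROUPS = {              # standard 68-point landmark indices (assumed)
    "jawline":   range(0, 17),
    "nose":      range(27, 36),
    "right_eye": range(36, 42),
    "left_eye":  range(42, 48),
    "mouth":     range(48, 68),
}

def component_attention(heatmaps):
    """heatmaps: (B, 68, H, W) landmark heatmaps -> (B, 5, H, W) component attention maps."""
    maps = [heatmaps[:, list(idx)].sum(dim=1, keepdim=True) for idx in COMPONENT_GROUPS.values()]
    maps = torch.cat(maps, dim=1)                              # sum the channels of each component
    b, c, h, w = maps.shape
    attn = torch.softmax(maps.view(b, c, -1), dim=-1)          # spatial softmax per component
    return attn.view(b, c, h, w)
```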

Based on the DIC [5] model, the super-resolution feedback network [41] is employed in our model’s face recovery network. The proposed recurrent SR module is structured to utilize feedback connections and generate robust high-level representations. The recurrent SR module’s input has 32\times 32 pixels and 48 channels and passes through the convolution and feedback layers.

The preservation of spatial information is essential for the generated face images since it entails the maintenance of local attributes necessary for fidelity and for recovering finer details at various spatial regions of a facial image [12], [42], [43], [44]. Therefore, an SFT block is employed, in which spatial features produced from the heatmaps modulate the recurrent SR features in a spatial-wise manner.

The architecture of the proposed SFT module is demonstrated in Figure 5.

The SFT module learns a spatial mapping function \mathcal {M} that predicts the modulation parameters \alpha and \beta based on the prior condition \psi , formulated as (6):\begin{equation*} {\mathcal {M}}: {\psi }{\rightarrow }({\alpha }, {\beta }) \tag {6}\end{equation*}

The learned \alpha and \beta parameters apply an affine transformation to the feature maps located at the intermediate layer of the super-resolution generation network and adaptively contribute to generating higher-fidelity face images. More precisely, the prior condition \psi is represented by the parameters \alpha and \beta of an affine transformation obtained through a mapping function. Given \alpha and \beta for a certain condition, the transformation is applied by scaling and shifting the feature maps as defined in (7):\begin{equation*} \hat {y}={\boldsymbol {SFT}}({\boldsymbol {F}} {\vert }{\alpha }, {\beta })= {\alpha } {\textstyle \bigodot } {\boldsymbol {F}} {\textstyle \bigoplus }{\beta } \tag {7}\end{equation*} where \boldsymbol {F} represents the feature maps with the same dimensions as \alpha and \beta , \textstyle \bigodot denotes element-wise multiplication, and \textstyle \bigoplus denotes element-wise addition. To maintain the spatial dimensions, the SFT module performs both feature-wise manipulation and spatial-wise transformation. The module takes the condition \psi generated from the heatmaps and outputs the affine transformation that scales (\alpha ) and shifts (\beta ) the feature maps. Multiplying and adding features in this way is a practical way to incrementally fuse two different levels of detail before up-sampling the face image. Shifting and scaling the feature maps is akin to regulating regional image information: essential details in every region of the face image are retained, and others are suppressed.
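A minimal sketch of such an SFT layer following (6)-(7) is shown below: two small convolutional heads map the heatmap condition \psi to per-pixel scale and shift maps. The layer widths, activation choice, and the five-channel condition are assumptions.

```python
import torch.nn as nn

class SFTLayer(nn.Module):
    def __init__(self, cond_channels=5, feat_channels=48, hidden=32):
        super().__init__()
        self.scale = nn.Sequential(                       # M: psi -> alpha
            nn.Conv2d(cond_channels, hidden, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(hidden, feat_channels, 1),
        )
        self.shift = nn.Sequential(                       # M: psi -> beta
            nn.Conv2d(cond_channels, hidden, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(hidden, feat_channels, 1),
        )

    def forward(self, features, condition):
        # The condition is assumed to share the spatial size of the feature maps.
        alpha = self.scale(condition)
        beta = self.shift(condition)
        return alpha * features + beta                    # Eq. (7): element-wise scale and shift
```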

C. Face Alignment Network

The face alignment process in our FSR framework involves using accurate facial landmarks to guide the face recovery network, progressively improving face image fidelity and landmark estimation. Our model continuously improves the landmark estimation at each step of the iterative approach, providing more precise auxiliary information so that the prior guidance can be efficiently combined with the spatial feature transform operation and the attentive fusion module. In turn, the iterative collaboration approach and the feedback between the two network branches enhance the performance of both landmark estimation and the face recovery network, ultimately improving the overall efficiency of the proposed FSR model. The FAN in our model is based on the DIC [5]. The proposed FAN includes pre-processing and post-processing components, denoted P_{1} and P_{2} , along with four interconnected hourglasses that incorporate recurrent feedback in the FAN architecture [45]. At the first iteration, there are no face component maps or recurrent features; I_{SR(1)} is reconstructed from the LR image refined by the multi-stage refinement and recurrent SR networks. For the n th step of the iterative calibration approach, where n=1,\ldots , N , the Face Recovery Network generates the face SR image I_{SR(n)} by utilizing the FAN result and the feedback information from the (n-1) th step.

The FAN takes as input the first generated SR image (128\times 128 ). In the pre-processing stage, convolutions are applied to the SR image, which is down-sampled from 128\times 128 to 32\times 32 and then fed to the recurrent hourglass network. The hourglass network estimates the facial landmarks and sends them to the P_{2} stage. In the post-processing stage, the detected facial landmarks are merged into five types of face component maps. In our designed model, the five types of component maps are split into two paths, as demonstrated in Figure 2. One path is utilized for the attentive fusion module, and the other is used for the spatial feature transform operation.
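The pre-processing path described above can be sketched as a short stack of strided convolutions that reduce the 128\times 128 SR image to 32\times 32 feature maps before the hourglass stack; the channel widths and exact layer count here are assumptions.

```python
import torch.nn as nn

# Hypothetical FAN pre-processing: 128x128 RGB SR image -> 32x32 feature maps.
fan_preprocess = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),    # 128 -> 64
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # 64 -> 32
    nn.ReLU(inplace=True),
)
```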

SECTION IV.

Experiment and Discussions

In this section, we first introduce the implementation settings of the MSRFSR model. Then, we explore the effectiveness of various components, and finally, we compare the performance of our model with that of other state-of-the-art models quantitatively and qualitatively.

A. Implementation Settings

We conducted experiments using the CelebA [15] and Helen [16] datasets, which are widely recognized in the field of face super-resolution. To prepare the data, following the standard practice of other FSR models [5], we first utilized the estimated landmarks and extracted square sections from each image to remove the background. These sections were then resized to 128\times 128 pixels without prior alignment and used as the ground-truth images. Subsequently, we reduced the HR face images to 16\times 16 pixels for a scale factor of \times 8 and to 8\times 8 pixels for a scale factor of \times 16 . We employed bicubic interpolation for both datasets to downscale the images for model training. For the CelebA [15] dataset, our training set comprised 168,854 images, with 1,000 images reserved for testing. For the Helen [16] dataset, our training set consisted of 2,005 images, while 50 were held out for testing. Table 2 summarizes the detailed settings used in our implementation. Additionally, we conducted experiments with our model to identify its limitations, selecting two further datasets, AFLW2000 [46] and WFLW [47], to assess the model's performance across different facial image datasets.
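A minimal sketch of this LR/HR pair preparation, assuming PIL-style bicubic resizing and a landmark-derived square crop box, is given below.

```python
from PIL import Image

def make_lr_hr_pair(image_path, crop_box, scale=8):
    """crop_box: (left, upper, right, lower) square face region estimated from landmarks."""
    hr = Image.open(image_path).convert('RGB').crop(crop_box).resize((128, 128), Image.BICUBIC)
    lr_size = 128 // scale                                   # 16 for x8, 8 for x16
    lr = hr.resize((lr_size, lr_size), Image.BICUBIC)        # bicubic degradation
    return lr, hr
```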

TABLE 2 Hyperparameter Setting

The training objective function utilized in our FSR model is \mathcal {L}_{\text {L1}} , defined as (8):\begin{equation*} \mathcal {L}_{\text {L1}} = \frac {1}{N} \sum _{i=1}^{N} \left |{{ y_{i} - \hat {y}_{i} }}\right | \tag {8}\end{equation*} where N denotes the total number of elements in the tensors, and y_{i} and \hat {y}_{i} are the ground truth (HR) image and the predicted super-resolved (SR) image, respectively.

The number of iterations in our iterative super-resolution approach was set to n=4 , and the batch size was set to 8. To mitigate over-fitting, we enhanced the variety of training samples by incorporating random rotations of 90°, 180°, and 270° and horizontal flips. For training, we utilized the ADAM optimizer [48] with \beta _{1} =0.9 and \beta _{2} =0.999 and a weighted alignment loss. The initial learning rate commenced at 1 \times 10^{-4} and was halved at 2 \times 10^{4} and 4 \times 10^{4} . Training on both datasets was stopped after 5 \times 10^{5} epochs. The experiments were conducted using PyTorch [49] on an NVIDIA RTX 2080 Ti GPU with 4352 CUDA cores and 11GB of GDDR6 memory.
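A sketch of this training configuration (L1 objective, ADAM with \beta _{1} =0.9 and \beta _{2} =0.999 , initial learning rate 1 \times 10^{-4} halved at the stated milestones, and rotation/flip augmentation) is shown below; the model argument and the milestone unit are placeholders/assumptions.

```python
import random
import torch
import torch.nn as nn

def configure_training(model):
    criterion = nn.L1Loss()                                                # Eq. (8)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[20_000, 40_000], gamma=0.5)                 # halve LR at 2e4 and 4e4
    return criterion, optimizer, scheduler

def augment(lr, hr):
    """Random 90/180/270-degree rotation and horizontal flip, applied identically to LR and HR."""
    k = random.randint(0, 3)
    lr, hr = torch.rot90(lr, k, dims=(-2, -1)), torch.rot90(hr, k, dims=(-2, -1))
    if random.random() < 0.5:
        lr, hr = torch.flip(lr, dims=[-1]), torch.flip(hr, dims=[-1])
    return lr, hr
```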

The assessment of super-resolved face images involves the use of the Peak Signal-to-Noise Ratio (PSNR), Learned Perceptual Image Patch Similarity (LPIPS) [50], Structural Similarity Index (SSIM) [51], and Frechet Inception Distance (FID) [52]. These metrics are calculated on the Y channel within the transformed YCbCr color space.
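For reference, the sketch below computes PSNR on the Y channel of the YCbCr space (an ITU-R BT.601 conversion is assumed); LPIPS, SSIM, and FID follow their cited implementations [50], [51], [52].

```python
import numpy as np

def rgb_to_y(img):
    """img: float RGB array in [0, 255] with shape (H, W, 3); returns the BT.601 luma channel."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr):
    """Peak Signal-to-Noise Ratio between two images, evaluated on the Y channel only."""
    mse = np.mean((rgb_to_y(sr.astype(np.float64)) - rgb_to_y(hr.astype(np.float64))) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```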

B. Investigation of Different Iterative Steps in Training

In this section, our focus revolves around the comprehensive evaluation and comparative analysis of the proposed iterative collaboration framework for the proposed model. The evaluation metrics employed include PSNR, SSIM, and LPIPS. We present the results in Table 3 and Table 4, where each table corresponds to different aspects of our model’s performance. Specifically, the evaluation encompasses various iterative steps, ranging from step 1 to step 5. The CelebA [15] and Helen [16] datasets are utilized for these evaluations. The evaluations are conducted for scale factors of \times 8 and \times 16 , with the corresponding outcomes represented in Table 3 and Table 4, respectively.

TABLE 3 The Quantitative Comparison Among Different Iterative Steps ( n ) on CelebA and Helen Datasets at a Scale Factor of \times 8
TABLE 4 The Quantitative Comparison Among Different Iterative Steps ( n ) on CelebA and Helen Datasets at a Scale Factor of \times 16

Following a comprehensive analysis of the different iteration steps in Table 3 and Table 4, a consistent and gradual improvement in PSNR, SSIM, and LPIPS is observed from Step 1 to Step 4. However, Step 5 shows no discernible enhancement in these metrics. According to the results in Table 3 and Table 4, the optimal iteration step for our model, yielding the best values of PSNR, SSIM, and LPIPS, is n=4 for both scale factors.

C. Visual Comparisons of Face Recovery in Different Iteration Steps

The previous section compared various iteration steps and determined the optimal value for our model to be n=4 . Here, we present visual comparisons of face recovery during the training of our FSR model at each iteration step. Figure 6, Figure 7, Figure 8, and Figure 9 depict visual comparisons of face image reconstruction against both the HR image and the bicubic upsampling method. The PSNR and SSIM of the image generated at each iteration step with respect to the HR image are displayed in the figures. Results for the CelebA dataset at a scale factor of \times 8 , the Helen dataset at a scale factor of \times 8 , the CelebA dataset at a scale factor of \times 16 , and the Helen dataset at a scale factor of \times 16 are shown in Figure 6, Figure 7, Figure 8, and Figure 9, respectively. Upon visual inspection and quantitative evaluation of each figure, it is evident that the perceptual quality and the recovery of high-frequency details in facial attributes improve gradually at each iterative calibration step during the training phase of our FSR model.

FIGURE 6.

Visual comparison of face image reconstruction in three steps of iterative collaboration approach on CelebA image at a scale factor of \times 8 .

FIGURE 7.

Visual comparison of face image reconstruction in three steps of iterative collaboration approach on Helen image at a scale factor of \times 8 .

FIGURE 8.

Visual comparison of face image reconstruction in three steps of iterative collaboration approach on CelebA image at a scale factor of \times 16 .

FIGURE 9.

Visual comparison of face image reconstruction in three steps of iterative collaboration approach on Helen image at a scale factor of \times 16 .

D. Iterative Collaboration in Landmark Estimation

This section illustrates the face alignment process during the face recovery procedure. In Figure 10, we showcase landmark estimation at four steps of our iterative calibration framework at a scale factor of \times 16 in the training phase of the FSR model. The final estimated landmarks are visualized as a heatmap. The sample face image chosen for this figure is considered challenging due to its non-frontal pose and the presence of non-face content (a hand) in the image.

FIGURE 10.

Landmark estimation in 4 steps of iterative collaboration approach at a scale factor of \times 16 , and the facial estimated landmark in heatmap shape.

According to the results in Figure 10, alignment and landmark estimation in challenging face pose images become more accurate with each iterative recovery of the face image. This indicates a gradual improvement in prior knowledge information, subsequently assisting the proposed FSR model in recovering more accurate and faithful face images by enhancing the estimation of precise prior information within the iterative calibration framework.

E. Ablation Study

This section presents a systematic investigation of the impact of the NL, RCA, and SFT modules. Table 5 and Table 6 showcase the performance metrics, including PSNR, SSIM, LPIPS, and FID, of the proposed FSR model equipped with different NL, RCA, and SFT configurations. The evaluations are conducted on the CelebA [15] and Helen [16] datasets at scales of \times 8 and \times 16 , as demonstrated in Tables 5 and 6. The best values are highlighted in bold.

TABLE 5 The Quantitative Evaluation on Different Configurations of Attention Modules at a Scale Factor of \times 8
TABLE 6 The Quantitative Evaluation on Different Configurations of Attention Modules at a Scale Factor of \times 16

In these tables, we compare against the baseline model (without any attention module). The other rows correspond to different combinations of refinement modules: the RCA module only, the combination of NL and RCA, the SFT module only, the combination of RCA and SFT, and finally, the combination of all three proposed modules. Across both scales and datasets, the combined application of all three attention modules (NL, RCA, and SFT) consistently leads to a noteworthy improvement in PSNR, SSIM, FID, and LPIPS compared to the baseline and the other combinations.

To attain a comprehensive understanding of the impact of the distinct refinement modules within the FSR model and to discern their relative significance for the overall performance, the evaluation results are visually presented in Figure 11, Figure 12, Figure 13, and Figure 14. These figures report the performance metrics PSNR, SSIM, and LPIPS. The charts facilitate a comparative analysis of the model's performance across various combinations of refinement modules at scale factors of \times 8 and \times 16 on the CelebA [15] and Helen [16] datasets. Specifically, Figure 11 demonstrates PSNR, SSIM, and LPIPS on the CelebA dataset at a scale factor of \times 8 . Figure 12, Figure 13, and Figure 14 visualize the corresponding performance at a scale factor of \times 8 for the Helen dataset, a scale factor of \times 16 for the CelebA dataset, and a scale factor of \times 16 for the Helen dataset, respectively.

FIGURE 11.

Performance comparison (PSNR/SSIM/LPIPS) of our model with various attention module configurations at a scale factor of \times 8 on the CelebA Dataset.

FIGURE 12.

Performance comparison (PSNR/SSIM/LPIPS) of our model with various attention module configurations at a scale factor of \times 8 on the Helen Dataset.

FIGURE 13.

Performance comparison (PSNR/SSIM/LPIPS) of our model with various attention module configurations at a scale factor of \times 16 on the CelebA Dataset.

FIGURE 14.

Performance comparison (PSNR/SSIM/LPIPS) of our model with various attention module configurations at a scale factor of \times 16 on the Helen Dataset.

According to the visualizations in these figures, using the SFT module alone at the mid-level to preserve spatial information contributes less to improving the accuracy of the FSR model. However, when combined with the RCA module, and with both the NL and RCA modules, it exhibits the second-best and the best performance in the FSR model, respectively. The results suggest that the effectiveness of the SFT module on mid-level features is hampered by issues arising from the degraded and poor quality of the LR face inputs.

In contrast, employing the NL attention module to address noise and degradation at the initial stage, coupled with the application of RCA to emphasize inter-channel dependencies at the low-level stage, significantly enhances the performance of the SFT module. Consequently, this combination performs best, yielding the highest PSNR and SSIM and the lowest LPIPS on both scales and datasets.

F. Visual Comparisons of Contributing Different Attention Modules

In this section, we assess the impact of each attention module on generating higher-quality face images. Figure 15 illustrates four sample images from the CelebA [15] and Helen [16] datasets. We extract face image patches and evaluate their visual quality. Specifically, we compare the visual quality of the baseline model, the model utilizing the NL module, the combination of the NL and RCA modules, and finally, the combination of the NL, RCA, and SFT modules.

FIGURE 15.

Visual comparisons of contributing different attention modules baseline, (NL), (NL+RCA), (NL+RCA+SFT).

As depicted in Figure 15, the perceptual quality of the results employing the NL, RCA, and SFT refinement modules significantly improves compared to other methods. The PSNR and SSIM metrics in this combination surpass those of other configurations. In other words, the proposed model successfully recovers more facial details than the baseline and other combined approaches.

G. Comparison With Other Methods

This section compares our quantitative and qualitative results with several other methods.

We conduct quantitative comparisons with various models [5], [6], [8], [11], [26], [27], [29], [30], [31], [32], [35], [40], [53], [54], [55], [56], [57], [58]. Table 7 presents comparisons of PSNR, SSIM, and LPIPS on the CelebA and Helen datasets at a scale factor of \times 8 . Figure 16 provides a graphical representation comparing the PSNR improvement over bicubic performance at the same scale factor. The results in Table 7 indicate that our model achieves the highest performance on the CelebA dataset and the second-best performance on the Helen dataset. Notably, the network parameters of our model total 24.69 million, which is lower than SCTANet's 27.56 million parameters. According to Figure 16, the PSNR improvement over the bicubic method at this scale is 4.8 dB and 6.2 dB for the CelebA and Helen datasets, respectively.

TABLE 7 Quantitative Benchmark Test Results at a Scale Factor of \times 8 . Red Indicates the Best Performance and Blue Indicates the Second Best
FIGURE 16.

Comparison of performance improvement (PSNR) at a scale factor of \times 8 on CelebA and Helen Datasets.

Table 8 provides comparisons of PSNR, SSIM, and LPIPS on the CelebA and Helen datasets at a scale factor of \times 16 . Our proposed model achieves a PSNR of 23.77 dB and an SSIM of 0.6903 on the CelebA dataset, along with the lowest LPIPS value of 0.2600. Furthermore, our model exhibits the best LPIPS on the Helen dataset, highlighting its superior performance compared to other methods. Additionally, our model has fewer network parameters than SCTANet [11], which has 27.98 million. Figure 17 presents a graphical visualization comparing the PSNR improvement over the bicubic method at a scale factor of \times 16 . According to the figure, the PSNR improvement over the bicubic method at this scale is 3.43 dB for the CelebA dataset and 2.61 dB for the Helen dataset.

TABLE 8 Quantitative Benchmark Test Results at a Scale Factor of \times 16 . Red Indicates the Best Performance and Blue Indicates the Second Best
FIGURE 17.

Comparison of performance improvement (PSNR) at a scale factor of \times 16 on CelebA and Helen Datasets.

Figure 18 presents visual comparisons of the proposed model at a scale factor of \times 8 with other state-of-the-art models [5], [6], [8], [26], [27], [31]. Sample images 1 and 2 are from the Helen [16] dataset, while samples 3 and 4 are from the CelebA [15] dataset. Based on the results in this figure, our model consistently produces higher-fidelity face images compared to the other methods.

FIGURE 18.

Visual comparison of our model with other state-of-the-art methods at a scale factor of \times 8 . (a): Bicubic, (b): URDGN, (c): WSRNet, (d): SuperFAN, (e): FSRNet, (f): DIC, (g): SISN, (h): Ours, (i): HR.

Figure 19 illustrates visual comparisons of the proposed method at a scale factor of \times 16 with other state-of-the-art models, specifically DIC [5] and SISN [31]. Image samples 1 and 2 belong to the CelebA [15] dataset, while samples 3 and 4 are from the Helen [16] dataset. The image labels (a) to (e) depict the LR image (8\times 8 ), the results of DIC, SISN, our model, and the HR images, respectively. Even at scale \times 16 , our model consistently outperforms other models, showcasing its superior performance in generating high-quality super-resolved images. Specifically, at this large scale factor, the proposed model demonstrates exceptional performance in reconstructing facial attributes with greater fidelity, such as eyes, lips, and nose, compared to the other state-of-the-art models depicted in this figure.

FIGURE 19.

Visual comparison with other methods at a scale factor of \times 16 .

H. Comparison of Network Complexity and Performance

Figure 20 and Figure 21 illustrate our model’s network complexity and performance, comparing them with other state-of-the-art models across the CelebA and Helen datasets at scale factors of \times 8 and \times 16 , respectively.

FIGURE 20.

Performance and network complexity evaluated on the CelebA and Helen datasets at a scale factor of \times 8 .

FIGURE 21.

Performance and network complexity evaluated on the CelebA and Helen datasets at a scale factor of \times 16 .

The comparison in Figure 20 shows that WSRNet [27] has the highest number of network parameters, with PSNR values of 26.83 dB and 36.02 dB for the CelebA and Helen datasets, respectively. Our model reaches 27.69 dB and 27.18 dB on the CelebA and Helen datasets, respectively, while its network complexity is roughly 2.7 times lower than that of WSRNet [27] at this scale. Compared to the DIC [5] model, our model contains 4 million more parameters and achieves higher PSNR on both datasets.

Based on Figure 21, the network complexity of our model is lower than that of FishFSRNet [32] and SCTANet [11], while its performance surpasses both FishFSRNet [32] and SCTANet [11] on the CelebA dataset. On the Helen dataset, our model's performance falls within the same range as SCTANet (with only a 0.26 dB difference), while its complexity is 3.20 million parameters lower than that of SCTANet [11]. Compared to the baseline model [5], our model demonstrates significant improvements in performance on both datasets at this scale.

I. User Study

Table 9 and Table 10 present the results of our user study evaluation at scale factors of \times 8 and \times 16 , respectively. Using a user study approach, we sought to gather comprehensive feedback from participants regarding their preferences and perceptions of the FSR models' outputs. We ensured variability and representation by using random images from the datasets, which contributed to the robustness and generalizability of our findings. The user study involved four strong FSR models [5], [11], [32], [34] evaluated on the CelebA and Helen datasets. We recruited ten individuals, including both experts and non-experts, and used four random images from each dataset.

TABLE 9 User Study Evaluation at a Scale Factor of \times 8
TABLE 10 User Study Evaluation at a Scale Factor of \times 16
Table 10- 
                            User Study Evaluation at a Scale Factor of 
                                    $\times 16$

Our evaluation criteria were designed to capture subjective preferences in terms of visual quality, naturalness, and perceptual similarity to the HR images. Participants rated the face images on a 5-point scale ranging from 1 (“Poor”) to 5 (“Excellent”). In Tables 9 and 10, the highest score indicates the best perceptual quality among the compared models.
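As an aside, the ratings described above reduce to a mean opinion score per model. The short sketch below, assuming a simple flat record format of our own choosing, shows the aggregation; the ratings in it are made up for illustration and do not reproduce the values in Tables 9 and 10.

from collections import defaultdict
from statistics import mean

# Each record: (participant_id, model_name, image_id, rating on the 1-5 scale).
ratings = [
    (1, "DIC", "celeba_001", 3), (1, "Ours", "celeba_001", 4),
    (2, "DIC", "helen_002", 3),  (2, "Ours", "helen_002", 5),
]

scores = defaultdict(list)
for _, model, _, rating in ratings:
    scores[model].append(rating)

for model, values in sorted(scores.items()):
    print(f"{model:>6s}: mean opinion score = {mean(values):.2f} over {len(values)} ratings")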

J. Quantitative and Qualitative Comparison on AFlW2000 and WFLW Datasets

We conduct quantitative comparisons with the baseline model [5] on the AFLW2000 [46] and WFLW [47] datasets. Tables 11 and 12 report PSNR, SSIM, LPIPS, and FID at scale factors of \times 8 and \times 16 , respectively. As indicated in the tables, the examined models are trained on either the CelebA or the Helen dataset. Under the same training conditions, our model outperforms DIC [5] in PSNR, SSIM, LPIPS, and FID on both test datasets. Furthermore, both our model and the baseline perform better when trained on CelebA than when trained on Helen.
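For completeness, the distortion metrics reported here follow their standard definitions. The sketch below, assuming 8-bit RGB images stored as NumPy arrays, computes PSNR from first principles; SSIM, LPIPS, and FID would come from their usual implementations (e.g. scikit-image, the lpips package, and pytorch-fid), and whether the metrics are computed on RGB or on the luminance channel is not restated here.

import numpy as np


def psnr(sr: np.ndarray, hr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a super-resolved and an HR image."""
    sr = sr.astype(np.float64)
    hr = hr.astype(np.float64)
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)


# Example with random stand-in images (128x128 RGB).
rng = np.random.default_rng(0)
hr_img = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
sr_img = np.clip(hr_img.astype(int) + rng.integers(-5, 6, hr_img.shape), 0, 255)
print(f"PSNR: {psnr(sr_img, hr_img):.2f} dB")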

TABLE 11 Quantitative Evaluation on the AFLW2000 and WFLW Datasets at a Scale Factor of \times 8
TABLE 12 Quantitative Evaluation on the AFLW2000 and WFLW Datasets at a Scale Factor of \times 16

Figure 22 demonstrates the visual comparison between our model and DIC [5] at a scale factor of \times 8 . Samples 1 and 2 belong to AFLW2000 [46], while samples 3 and 4 belong to WFLW [47]. Labels (a) to (e) represent the LR image (16\times 16 ), bicubic interpolation, DIC [5], our model, and the HR image, respectively. In all samples, our model recovers more facial details and generates higher-fidelity results than the baseline.

FIGURE 22. Visual comparison of our model with the baseline at a scale factor of \times 8 on the AFLW2000 and WFLW datasets. (a): LR, (b): Bicubic, (c): DIC, (d): Ours, (e): HR.

Figure 23 compares our model and DIC [5] at a scale factor of \times 16 . Samples 1 and 2 are from AFLW2000, while samples 3 and 4 are from WFLW. Labels (a) to (e) indicate the LR image (8\times 8 ), bicubic interpolation, DIC, our model, and the HR image, respectively. Across all samples, our model outperforms the baseline, recovering more facial details and producing higher-quality results.
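The bicubic baseline in column (b) of Figures 22 and 23 can be reproduced with a simple resize round trip. The sketch below uses Pillow and assumes a plain bicubic down/up-sampling pipeline; the exact degradation used to create the LR inputs is not restated here, so treat this as an approximation.

from PIL import Image


def bicubic_baseline(hr_path: str, lr_size: int = 8, hr_size: int = 128) -> Image.Image:
    """Downsample an HR face to lr_size x lr_size, then upsample back with bicubic."""
    hr = Image.open(hr_path).convert("RGB").resize((hr_size, hr_size), Image.BICUBIC)
    lr = hr.resize((lr_size, lr_size), Image.BICUBIC)
    return lr.resize((hr_size, hr_size), Image.BICUBIC)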

FIGURE 23. Visual comparison of our model with the baseline at a scale factor of \times 16 on the AFLW2000 and WFLW datasets. (a): LR, (b): Bicubic, (c): DIC, (d): Ours, (e): HR.

SECTION V.

Discussion and Future Work

To enhance fidelity and detail when generating face images at large scale factors, the NL module and the RCA technique at the low-level stage focus on critical facial details and effectively mitigate shortcomings in feature quality, leading to a refined and context-aware representation. In addition, the SFT module designed for the mid-level architecture contributes significantly to recovering high-frequency facial attributes by leveraging spatial information. However, using the SFT module alone in the mid-level configuration contributes little to improving the accuracy of the FSR model, primarily because of the degraded, poor-quality LR face inputs. In contrast, employing the NL attention module to address noise and degradation at the initial stage, coupled with the RCA technique to emphasize inter-channel dependencies, substantially enhances the performance of the SFT module at the mid-level. Consequently, this combination of multi-attention techniques achieves the best performance in generating faithful and detailed face images compared to previous FSR models, yielding the best PSNR, SSIM, LPIPS, and VIF scores across large scale factors and the four datasets.
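To make the roles of these modules concrete, the rough PyTorch sketch below, written under our own naming assumptions rather than the exact MSRFSR layer configuration, pairs a residual channel attention block (re-weighting informative channels) with an SFT layer that scales and shifts mid-level features using spatial maps derived from landmark heatmaps.

import torch
import torch.nn as nn


class ResidualChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention with a residual path."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.body(x)
        return x + feat * self.attn(feat)  # emphasize informative channels


class SFTLayer(nn.Module):
    """Spatial feature transform: scale and shift features using a prior map."""

    def __init__(self, feat_channels: int, prior_channels: int):
        super().__init__()
        self.gamma = nn.Sequential(
            nn.Conv2d(prior_channels, feat_channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 1),
        )
        self.beta = nn.Sequential(
            nn.Conv2d(prior_channels, feat_channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 1),
        )

    def forward(self, feat: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # `prior` holds e.g. landmark heatmaps resized to the feature resolution.
        return feat * (1 + self.gamma(prior)) + self.beta(prior)


# Shape check with stand-in tensors: 64-channel features, 68 landmark heatmaps.
features = torch.randn(1, 64, 32, 32)
heatmaps = torch.randn(1, 68, 32, 32)
out = SFTLayer(64, 68)(ResidualChannelAttention(64)(features), heatmaps)
print(out.shape)  # torch.Size([1, 64, 32, 32])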

Although the proposed model excels at generating faithful and detailed full-face images across diverse angles, its performance degrades when it is tasked with generating profile face images. This limitation arises because the FAN module was tuned exclusively for frontal views of the face and the training datasets (CelebA and Helen) contain only full-face images. Consequently, the model is less accurate in reconstructing profile faces, where only partial facial features, such as one eye, one side of the nose, and a portion of the mouth, are visible. This constraint restricts the model’s applicability in scenarios where profile images are prevalent or required.

Future research will address this limitation by tuning the FAN module to accommodate profile face images and by incorporating additional training datasets that contain profile faces. This could enhance the model’s robustness and broaden its utility in real-world applications, advancing the effectiveness and applicability of FSR techniques in diverse practical settings.

SECTION VI.

Conclusion

This research introduces a Multi-stage Refining Face Super-resolution model, establishing a novel paradigm through iterative collaboration between landmark estimation and an attentive recovery network. The challenge posed by degraded, low-dimensional (16\times 16 or 8\times 8 pixels) input images prevents the iterative collaboration framework from reaching its full capability to generate detailed and accurate face images. The proposed Multi-stage Refining model utilizes an SFT module for mid-level feature refinement and incorporates the NL module and a residual pixel attention technique at the low-level stage. The NL module captures long-range dependencies within the low-level features, while the RCA technique enhances focus on critical facial details by selectively emphasizing informative channels. This approach effectively addresses shortcomings in feature quality, fostering a more refined and context-aware representation; consequently, it significantly enhances the efficacy of the SFT module at the mid-level stage, enabling the recovery of more faithful facial details. The proposed refinement approach improves both the accuracy and perceptual quality of super-resolved face images, surpassing the performance of baseline models. Empirical evaluations conducted on the CelebA and Helen datasets at scale factors of \times 8 and \times 16 demonstrate noteworthy improvements in PSNR, SSIM, and LPIPS. Visual comparisons further underscore the model’s superiority, showing significant advancements over other state-of-the-art models.

In future work, we will integrate the proposed FSR model with real-time face recognition systems. This extension aims to explore the potential impact of our model on enhancing recognition accuracy in practical scenarios.

ACKNOWLEDGMENT

The authors would like to express their sincere gratitude to the Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University.
