
Self-Supervised Burst Super-Resolution


Abstract:

We introduce a self-supervised training strategy for burst super-resolution that uses only noisy low-resolution bursts during training. Our approach eliminates the need to carefully tune synthetic data simulation pipelines, which often do not match real-world image statistics. Compared to weakly-paired training strategies, which require noisy smartphone burst photos of static scenes paired with a clean reference obtained from a tripod-mounted DSLR camera, our approach is more scalable and avoids the color mismatch between the smartphone and DSLR. To achieve this, we propose a new self-supervised objective that uses a forward imaging model to recover a high-resolution image from aliased high frequencies in the burst. Our approach does not require any manual tuning of the forward model's parameters; we learn them from data. Furthermore, we show that our training strategy is robust to dynamic scene motion in the burst, which enables training burst super-resolution models on in-the-wild data. Extensive experiments on real and synthetic data show that, despite using only noisy bursts during training, models trained with our self-supervised strategy match, and sometimes surpass, the quality of fully-supervised baselines trained with synthetic data or weakly-paired ground truth. Finally, we demonstrate the generality of our training strategy by applying it to four different burst super-resolution architectures.
Date of Conference: 01-06 October 2023
Date Added to IEEE Xplore: 15 January 2024
Conference Location: Paris, France
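
To make the abstract's objective concrete, below is a minimal PyTorch sketch of what a self-supervised burst loss built around a differentiable forward imaging model could look like: the network's high-resolution prediction is pushed back through a learned blur-and-downsample operator and compared against the observed noisy low-resolution frames. All names, the L1 penalty, and the blur-kernel parameterization are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def apply_forward_model(hr, blur_kernel, scale):
    """Simulate a low-resolution frame from a high-resolution estimate.

    hr:          (B, C, H, W) predicted high-resolution image
    blur_kernel: (k, k) kernel, applied depthwise; in training this would be
                 an nn.Parameter so it is learned from data, per the abstract
    scale:       super-resolution factor
    """
    c = hr.shape[1]
    k = blur_kernel[None, None].expand(c, 1, *blur_kernel.shape)
    blurred = F.conv2d(hr, k, padding=blur_kernel.shape[-1] // 2, groups=c)
    return F.interpolate(blurred, scale_factor=1.0 / scale,
                         mode="bilinear", align_corners=False)

def self_supervised_loss(model, burst, blur_kernel, scale=4):
    """burst: (B, T, C, h, w) noisy low-resolution frames."""
    hr = model(burst)                              # (B, C, h*scale, w*scale)
    sim = apply_forward_model(hr, blur_kernel, scale)
    # A full implementation would also warp per frame to account for camera
    # and scene motion before comparing; that alignment step is omitted here.
    loss = 0.0
    for t in range(burst.shape[1]):
        loss = loss + F.l1_loss(sim, burst[:, t])
    return loss / burst.shape[1]
```

In training, `blur_kernel` would be optimized jointly with the model, which is the spirit of the abstract's claim that the forward model's parameters are learned rather than hand-tuned.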

1. Introduction

Recent RAW burst super-resolution pipelines have significantly improved the quality of modern smartphone photos [33], [8]. State-of-the-art algorithms use specialized deep learning models that learn to merge the burst frames into a single high-resolution image [3], [19], [18], [23], [24]. Training them requires paired datasets, in which each noisy burst is matched to a clean reference. Most approaches synthesize realistic bursts from the reference using carefully tuned degradation models [2], [3], [19], [24]. But because of low-level mismatches between real and synthetically generated bursts (e.g., noise distribution, blur kernels, camera trajectories, scene motion), models trained on synthetic data often do not generalize well to real-world inputs (Figure 1). To avoid this, other works collect weakly-paired datasets in which the reference is a high-resolution image of the same scene captured with a DSLR and a zoom lens on a tripod [2], [39]. However, this capture process is tedious and time-consuming, and the resulting image pairs are often misaligned, exhibit color and detail mismatches because of the different sensors, and permit only limited scene motion.
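
For context, the synthetic pipelines criticized above typically fabricate a training burst by randomly shifting, downsampling, and noising a clean reference. The sketch below is a deliberately simplified, hypothetical version (random integer shifts, bilinear downsampling, additive Gaussian noise); real pipelines such as [3] tune far richer degradation models, and it is precisely the mismatch between such hand-tuned choices and real sensor statistics that motivates the self-supervised approach.

```python
import torch
import torch.nn.functional as F

def synthesize_burst(reference, num_frames=8, scale=4, noise_std=0.02):
    """Fabricate a burst from a clean reference image.

    reference: (B, C, H, W) clean high-resolution image
    returns:   (B, T, C, H/scale, W/scale) synthetic noisy burst
    """
    frames = []
    for _ in range(num_frames):
        # Random integer shift standing in for hand-shake between frames.
        dy, dx = torch.randint(-2, 3, (2,))
        shifted = torch.roll(reference, shifts=(int(dy), int(dx)), dims=(-2, -1))
        # Downsample to the low-resolution grid.
        lr = F.interpolate(shifted, scale_factor=1.0 / scale,
                           mode="bilinear", align_corners=False)
        # Additive Gaussian noise standing in for a real sensor noise model.
        frames.append(lr + noise_std * torch.randn_like(lr))
    return torch.stack(frames, dim=1)
```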

