
Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360° Image Outpainting



Abstract:

360° images, with a field-of-view (FoV) of 180° × 360°, provide immersive and realistic environments for emerging virtual reality (VR) applications, such as virtual tourism, where users desire to create diverse panoramic scenes from a narrow-FoV photo they take at a viewpoint with a portable device. This raises a technical challenge: how can users freely create diverse and immersive virtual scenes from a narrow-FoV image with a specified viewport? To this end, we propose a transformer-based 360° image outpainting framework called Dream360, which can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports while accounting for the spherical properties of 360° images. Compared with existing methods, e.g., [3], which primarily focus on inputs with rectangular masks at central locations and overlook the spherical property of 360° images, our Dream360 offers higher outpainting flexibility and fidelity based on the spherical representation. Dream360 comprises two key learning stages: (I) codebook-based panorama outpainting via a Spherical-VQGAN (S-VQGAN), and (II) frequency-aware refinement with a novel frequency-aware consistency loss. Specifically, S-VQGAN learns a sphere-specific codebook from spherical harmonic (SH) values, providing a better representation of the spherical data distribution for scene modeling. The frequency-aware refinement matches the target resolution and further improves the semantic consistency and visual fidelity of the generated results. Our Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods. We also conducted a user study with 15 participants to interactively evaluate the quality of the generated results in VR, demonstrating the flexibility and superiority of our Dream360 framework.
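As a rough illustration of the frequency-aware idea in the second stage, a consistency term can compare the FFT amplitude spectra of a generated image and a reference. The numpy sketch below is a generic stand-in, not the paper's exact loss; the names `amplitude_spectrum` and `frequency_consistency_loss` are hypothetical.

```python
import numpy as np

def amplitude_spectrum(img):
    """Return the 2-D FFT amplitude spectrum of a single-channel image."""
    return np.abs(np.fft.fft2(img))

def frequency_consistency_loss(pred, target):
    """L1 distance between amplitude spectra.

    A generic frequency-domain consistency term used here only to
    illustrate the concept; the paper defines its own formulation.
    """
    return float(np.mean(np.abs(amplitude_spectrum(pred) - amplitude_spectrum(target))))

# Identical images incur zero loss; any spectral mismatch is penalized.
x = np.random.rand(64, 128)
print(frequency_consistency_loss(x, x))  # 0.0
```

Penalizing spectral mismatch in addition to pixel-space error is a common way to discourage the blurry, low-frequency-biased outputs that plain L1/L2 reconstruction losses tend to produce.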
Page(s): 2734 - 2744
Date of Publication: 05 March 2024


PubMed ID: 38437117


1 Introduction

The ability to observe complete surroundings in one shot and choose any viewport for an immersive experience (see Fig. 1(a)) has made 360° images (a.k.a. panoramas, represented as equirectangular projection (ERP) images by default) increasingly popular in flourishing virtual reality (VR) applications [4], [5], [30], [35], [36], [52], [58], such as virtual tourism. In this scenario, users wish to create diverse virtual panoramic scenes from a narrow FoV (NFoV) photo they take at a viewpoint with a portable device (e.g., a smartphone). This offers users personalized content and enables them to perceive and interact with virtual environments. It motivates us to study diverse, high-fidelity, and high-resolution panorama generation from NFoV images that users provide at desired viewpoints.
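Since ERP is the default representation, the geometry it implies can be sketched as a pixel-to-sphere mapping: each pixel center corresponds to a latitude/longitude pair covering the full 180° × 360° FoV. This helper is illustrative only and not code from the paper; the name `erp_to_sphere` is hypothetical.

```python
import numpy as np

def erp_to_sphere(i, j, height, width):
    """Map an ERP pixel center (row i, col j) to spherical angles.

    Returns (latitude, longitude) in radians, spanning the full
    180° x 360° field of view of an equirectangular panorama.
    Hypothetical helper for illustration only.
    """
    lat = np.pi * (0.5 - (i + 0.5) / height)          # +pi/2 at top row, -pi/2 at bottom
    lon = 2.0 * np.pi * ((j + 0.5) / width) - np.pi   # -pi at left edge, +pi at right
    return lat, lon

# Top-left pixel of a tiny 2x4 panorama sits in the upper-left octant.
print(erp_to_sphere(0, 0, 2, 4))  # (pi/4, -3*pi/4)
```

This uniform angular spacing is exactly why ERP images are distorted near the poles: rows near the top and bottom cover much smaller spherical areas than rows near the equator, which motivates sphere-aware designs such as the S-VQGAN representation.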

References
1. H. Ai, Z. Cao, Y.-P. Cao, Y. Shan and L. Wang, "HRDFuse: Monocular 360° depth estimation by collaboratively learning holistic-with-regional depth distributions", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13273-13282, 2023.
2. N. Akimoto, S. Kasai, M. Hayashi and Y. Aoki, "360-degree image completion by two-stage conditional GANs", 2019 IEEE International Conference on Image Processing (ICIP), pp. 4704-4708, 2019.
3. N. Akimoto, Y. Matsuo and Y. Aoki, "Diverse plausible 360-degree image outpainting for efficient 3DCG background creation", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11431-11440, 2022.
4. J. Ardouin, A. Lécuyer, M. Marchal and E. Marchand, "Stereoscopic rendering of virtual environments with wide field-of-views up to 360°", 2014 IEEE Virtual Reality (VR), pp. 3-8, 2014.
5. N. Arora, M. Suomalainen, M. Pouke, E. G. Center, K. J. Mimnaugh, A. P. Chambers, et al., "Augmenting immersive telepresence experience with a virtual body", IEEE Transactions on Visualization and Computer Graphics, vol. 28, pp. 2135-2145, 2022.
6. M. Cai, H. Zhang, H. Huang, Q. Geng, Y. Li and G. Huang, "Frequency domain image translation: More photo-realistic, better identity-preserving", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13930-13940, 2021.
7. M. Cao, C. Mou, F. Yu, X. Wang, Y. Zheng, J. Zhang, et al., "NTIRE 2023 challenge on 360° omnidirectional image and video super-resolution: Datasets, methods and results", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1731-1745, 2023.
8. H. Chang, H. Zhang, L. Jiang, C. Liu and W. T. Freeman, "MaskGIT: Masked generative image transformer", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11305-11315, 2022.
9. Y. Chen and X. Wang, "Transformers as meta-learners for implicit neural representations", European Conference on Computer Vision (ECCV), 2022.
10. Z. Chen, G. Wang and Z. Liu, "Text2Light: Zero-shot text-driven HDR panorama generation", ACM Trans. Graph., vol. 41, pp. 195:1-195:16, 2022.
11. Y.-C. Cheng, C. H. Lin, H.-Y. Lee, J. Ren, S. Tulyakov and M.-H. Yang, "InOut: Diverse image outpainting via GAN inversion", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11421-11430, 2022.
12. M. Eder, M. Shvets, J. Lim and J.-M. Frahm, "Tangent images for mitigating spherical distortion", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12423-12431, 2020.
13. A. A. Efros and T. Leung, "Texture synthesis by non-parametric sampling", IEEE International Conference on Computer Vision (ICCV), 1999.
14. P. Esser, R. Rombach and B. Ommer, "Taming transformers for high-resolution image synthesis", 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12868-12878, 2021.
15. F. B. Fuchs, D. E. Worrall, V. Fischer and M. Welling, "SE(3)-Transformers: 3D roto-translation equivariant attention networks", Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
16. D. Fuoli, L. V. Gool and R. Timofte, "Fourier space losses for efficient perceptual image super-resolution", 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2340-2349, 2021.
17. R. Gal, D. C. Hochberg, A. Bermano and D. Cohen-Or, "SWAGAN: A style-based wavelet-driven generative model", ACM Transactions on Graphics (TOG), vol. 40, no. 4, pp. 1-11, 2021.
18. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., "Generative adversarial nets", NIPS, 2014.
19. S. W. Han and D. Y. Suh, "PIINet: A 360-degree panoramic image inpainting network using a cube map", arXiv, 2020.
20. T. Hara, Y. Mukuta and T. Harada, "Spherical image generation from a single image by considering scene symmetry", AAAI, 2021.
21. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler and S. Hochreiter, "GANs trained by a two time-scale update rule converge to a local Nash equilibrium", NIPS, 2017.
22. S. Iketani, M. Sato and M. Imura, "Augmented reality image generation with optical consistency using generative adversarial networks", 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 615-616, 2020.
23. P. Isola, J.-Y. Zhu, T. Zhou and A. A. Efros, "Image-to-image translation with conditional adversarial networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967-5976, 2017.
24. L. Jiang, B. Dai, W. Wu and C. C. Loy, "Focal frequency loss for image reconstruction and synthesis", 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13899-13909, 2021.
25. X. Jin, P. Deng, X. Li, K. Zhang, X. Li, Q. Zhou, et al., "Sun-sky model estimation from outdoor images", Journal of Ambient Intelligence and Humanized Computing, vol. 13, pp. 5151-5162, 2020.
26. S. Jung and M. Keuper, "Spectral distribution aware image generation", Proceedings of the AAAI Conference on Artificial Intelligence, no. 2, pp. 1734-1742, 2021.
27. K. Kim, Y. Yun, K.-W. Kang, K. Kong, S. Lee and S. Kang, "Painting outside as inside: Edge guided image outpainting via bidirectional rearrangement with progressive step learning", 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 2121-2129, 2021.
28. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization", CoRR, vol. abs/1412.6980, 2014.
29. D. P. Kingma and M. Welling, "Auto-encoding variational Bayes", CoRR, vol. abs/1312.6114, 2014.
30. Y. Li, J.-C. Shi, F.-L. Zhang and M. Wang, "Bullet comments for 360° video", 2022 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1-10, 2022.
