Generative Image Dynamics | IEEE Conference Publication | IEEE Xplore

Generative Image Dynamics


Abstract:

We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics of objects such as trees, flowers, candles, and clothes swaying in the wind. We model dense, long-term motion in the Fourier domain as spectral volumes, which we find are well suited to prediction with diffusion models. Given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Combined with an image-based rendering module, the predicted motion representation can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to interact with objects in real images to produce realistic simulated dynamics (by interpreting the spectral volumes as image-space modal bases). See our project page for more results: generative-dynamics.github.io.
Date of Conference: 16-22 June 2024
Date Added to IEEE Xplore: 16 September 2024

Conference Location: Seattle, WA, USA

1. Introduction

The natural world is always in motion, with even seemingly static scenes containing subtle oscillations as a result of wind, water currents, respiration, or other natural rhythms. Emulating this motion is crucial in visual content synthesis: human sensitivity to motion can cause imagery without motion (or with slightly unrealistic motion) to seem uncanny or unreal.

We model a generative image-space prior on scene motion: from a single RGB image, our method generates a spectral volume [23], a motion representation that models dense, long-term pixel trajectories in the Fourier domain. Our learned motion priors can be used to turn a single picture into a seamlessly looping video, or into an interactive simulation of dynamics that responds to user inputs like dragging and releasing points. (In the accompanying figure, output videos are visualized as space-time x-t slices along an input scanline.)
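To make the spectral-volume representation concrete, the sketch below shows how per-pixel Fourier coefficients over a small set of low temporal frequencies could be converted back into a dense motion texture (one displacement field per frame) via an inverse DFT restricted to the retained bands. The array layout and the function name are assumptions for illustration, not the paper's actual code.

```python
import numpy as np

def spectral_volume_to_motion_texture(spectral_volume, num_frames):
    """Convert a spectral volume into a dense motion texture.

    spectral_volume: complex array of shape (K, H, W, 2) -- the K lowest
      temporal frequency bands, with (x, y) displacement coefficients per
      pixel (hypothetical layout).
    Returns: real array of shape (num_frames, H, W, 2), a 2D displacement
      field for every output frame.
    """
    K, H, W, C = spectral_volume.shape
    t = np.arange(num_frames)   # frame indices 0..T-1
    k = np.arange(K)            # retained frequency bands
    # Inverse DFT over the K retained bands:
    #   D_t = Re( sum_k S_k * exp(2*pi*i * k * t / T) )
    phases = np.exp(2j * np.pi * np.outer(k, t) / num_frames)  # (K, T)
    motion = np.einsum('kt,khwc->thwc', phases, spectral_volume)
    return motion.real
```

Because the reconstruction uses only integer multiples of the base frequency 1/T, the resulting trajectories are periodic in T frames, which is consistent with the seamlessly looping videos described above.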

