1. Introduction
The natural world is always in motion, with even seemingly static scenes containing subtle oscillations caused by wind, water currents, respiration, or other natural rhythms. Emulating this motion is crucial in visual content synthesis: humans are sensitive to motion, so imagery without motion (or with slightly unrealistic motion) can seem uncanny or unreal.
We model a generative image-space prior on scene motion: from a single RGB image, our method generates a spectral volume [23], a motion representation that models dense, long-term pixel trajectories in the Fourier domain. Our learned motion priors can be used to turn a single picture into a seamlessly looping video, or into an interactive simulation of dynamics that responds to user inputs like dragging and releasing points.
[Teaser figure: output videos visualized as space-time X-t slices along an input scanline.]
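To make the spectral-volume idea concrete, the sketch below shows one plausible way to turn per-pixel Fourier coefficients into time-domain displacement trajectories via an inverse temporal FFT. The array shapes, the function name, and the assumption that the volume stores only the K lowest temporal frequencies are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def spectral_volume_to_trajectories(spectral_volume: np.ndarray, num_frames: int) -> np.ndarray:
    """Convert a spectral volume into dense 2D pixel trajectories.

    spectral_volume: complex array of shape (K, H, W, 2), holding per-pixel
        Fourier coefficients of (x, y) displacement for the K lowest
        temporal frequency bands (an assumed layout for illustration).
    num_frames: number of output frames T.

    Returns a real array of shape (T, H, W, 2): the displacement of each
    pixel at each time step.
    """
    K, H, W, _ = spectral_volume.shape
    # Embed the K retained coefficients into a half-spectrum of length
    # T // 2 + 1, zero-padding the unmodeled higher frequencies.
    spectrum = np.zeros((num_frames // 2 + 1, H, W, 2), dtype=np.complex64)
    spectrum[:K] = spectral_volume
    # Inverse real FFT along the temporal axis yields real-valued,
    # periodic trajectories, which naturally loop over T frames.
    return np.fft.irfft(spectrum, n=num_frames, axis=0)
```

Because the reconstruction is a sum of sinusoids over a fixed period, the resulting trajectories are periodic by construction, which is what makes seamlessly looping videos straightforward under this representation.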