1. Introduction
Optical flow estimation, a core problem in computer vision, plays a key role in real-world applications such as video inpainting [21], action recognition [58], and video prediction [23], [67]. In essence, optical flow is the per-pixel displacement vector field between successive video frames. Recent methods such as FlowNet [28], PWC-Net [57], RAFT [60], SKFlow [59], and FlowFormer [26], together with the rethought training approach of MatchFlow [17], have made substantial progress on this task. This progress is attributed to advances in model architectures [26], [57], [60] and dedicated datasets [17], [19], [42].
Figure: End-point error on Sintel (clean) vs. inference time (ms) and model size (M). All models are trained on FlyingChairs and FlyingThings3D and tested on a single NVIDIA A100 GPU. MemFlow(-T) (x it) denotes running our network with only x GRU iterations. MemFlow(-T) achieves significant reductions in computational overhead as well as substantial performance gains over state-of-the-art methods.
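For readers unfamiliar with the metric, end-point error is conventionally computed as the mean Euclidean distance between predicted and ground-truth flow vectors over all pixels. The following is a minimal sketch under that standard definition; the function name `end_point_error` and the nested-list flow layout are illustrative assumptions, not part of the MemFlow code base.

```python
import math
from typing import List, Tuple

# Hypothetical layout: an H x W grid of (u, v) displacement vectors.
Flow = List[List[Tuple[float, float]]]


def end_point_error(pred: Flow, gt: Flow) -> float:
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for (pu, pv), (gu, gv) in zip(pred_row, gt_row):
            # Per-pixel error is the length of the difference vector.
            total += math.hypot(pu - gu, pv - gv)
            count += 1
    if count == 0:
        raise ValueError("flow fields must contain at least one pixel")
    return total / count


# Toy 1 x 2 flow field: the first pixel is off by (3, 4), the second is exact.
pred = [[(1.0, 0.0), (0.0, 0.0)]]
gt = [[(4.0, 4.0), (0.0, 0.0)]]
print(end_point_error(pred, gt))  # 2.5
```

In practice the same quantity is usually computed on dense arrays with a numerical library; the pure-Python loop here is only meant to make the definition explicit.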