1. Introduction
General-purpose motion estimation is a long-standing research topic in computer vision, owing to both its inherent difficulty and its broad range of applications. Optical flow estimation (e.g. [1]–[3], to name a few) and point tracking [4]–[7] are two representative approaches, and they are widely used in motion/video segmentation [8], [9], object tracking [10], human pose estimation [11]–[13], and action recognition [7], among other tasks. Despite their widespread utility, these methods provide only low-level motion information, which makes further motion analysis or recognition based on their outputs challenging. Their outputs lack compact yet descriptive information about the shape and motion of objects in the scene: optical flow is too dense to explicitly reflect structure, while point trajectories are too sparse or unorganized to represent complex structures.