I. Introduction
Collective motion, the primary component of a crowd, is one of the most attractive phenomena in both nature and human society. Individuals in a collective motion tend to share consistent properties, which makes collective motion fundamentally important for analyzing the underlying patterns of crowd behavior. Because collective motion provides a mid-level representation of crowds, it has drawn increasing attention in the field of computer vision and is relevant to a wide range of applications, such as crowd tracking [1]–[3], crowd counting [4], [5], and action recognition [6]–[9]. However, due to the complex spatial distribution and time-varying dynamics of crowd scenes, both the quantification and the detection of collective motion remain difficult tasks.