I. Introduction
Crowd counting has recently drawn considerable attention from researchers because of its importance in a wide range of real-world applications, including video surveillance, public crowd monitoring, and traffic control. The main objective of crowd counting is to infer the number of people in congested images. Despite the progress made by pioneering works [1]–[6], crowd understanding remains challenging for scenes that exhibit drastic scale variations, inconsistent density, or complex backgrounds.