1. Introduction
Crowd counting aims to automatically estimate the number of individuals in an image and has been widely applied in areas such as video surveillance, traffic estimation, and congestion control. Most recent approaches [58], [59], [5], [19] rely on full supervision, i.e., point-level annotations that place a dot at the center of each individual, to estimate crowd density. Yet such an annotation process is extremely time-consuming and laborious. In extremely dense scenes in particular, manually labeling heavily overlapping dots merely to represent the crowd density of a scene makes little sense. This tedious annotation process limits the scale and diversity of crowd datasets and thus slows progress in the field.