I. Introduction
In recent years, deep crowd counting has gained increasing attention in the computer vision community [1], [2], [3], [4], [5], [6], [7], [8]. Many deep models achieve strong performance [6], [7], [8], [9], [10], [11], [12], [13], [14], [15] but rely heavily on large-scale real-world datasets with high-quality annotations. Unfortunately, annotating real-world datasets is time-consuming and laborious, and the labeled head positions are often imprecise, especially in crowded scenes.