I. Introduction
With the rapid growth of urban populations, crowd analysis has drawn considerable attention and has become an important security technique in video surveillance and intelligent transportation systems [1]–[8]. Crowd counting can help improve emergency planning and prevent congestion in train stations and airports. Various methods have been proposed to tackle this task [9]–[12]. Early works employ low-level features as region descriptors, followed by a classifier [13], [14]. Benefiting from recent progress in deep learning [15], [16], crowd counting approaches have achieved great success. However, owing to problems such as heavy occlusion and cluttered backgrounds, crowd counting remains a challenging task in practical applications [17].