I. Introduction
Object counting, which infers the number of objects in images or video content, is a crucial yet challenging computer vision task. This paper is primarily motivated by the problem of human crowd counting, while remaining applicable to other domains such as vehicle counting. Because crowds gather in many scenarios such as parades, concerts, and stadiums, a robust and accurate crowd counting model plays an essential role in multimedia applications for security alerts, public space design, and transportation management [1].