1. Introduction
Collecting test videos in sufficient quantity and generating high-quality ground truth for meaningful performance evaluation are significant challenges faced by visual surveillance researchers. Publicly available datasets such as the PETS and iLIDS series have made great strides towards alleviating this problem. However, parameters such as camera placement, lighting changes and crowd density are only coarsely sampled in publicly available datasets, and usually not varied independently, which limits the insights into algorithm performance that can be gained from evaluating their effects. Ground-truth pixel-wise foreground segmentation and annotations such as target class and geo-location are very labour-intensive to produce or difficult to measure in practice, and are thus typically absent from available datasets. Nevertheless, such data are desirable for evaluating even the most basic surveillance tasks, such as foreground detection and classification. Finally, available datasets have limited applicability for testing active tracking algorithms or surveillance networks that span tens or hundreds of cameras.