I. Introduction
The surveillance robot [1], [2], [3], [4] has become a popular topic in robotics with the development of the field [5], [6], [7]. Over the past several decades, researchers have studied indoor/outdoor surveillance algorithms applied to multiple fixed cameras [4], [8] and to a mobile robot equipped with a camera [3]. In the past decade, researchers have turned to deep-learning methods to handle harsh outdoor environments [2], [9], [10], [11], [12]. Moreover, recent methods leverage multi-modal sensors, because RGB camera-only methods for outdoor surveillance suffer an obvious limitation: image quality degrades under various environmental changes [9], [13], [14], [15].
Fig. 1. Examples of the first-person view multi-modal dataset: (a) an RGB image, (b) a thermal image, (c) an IR image, and (d) a night vision image. Our dataset provides mask images (e) corresponding to RGB images (f), along with annotations of bounding boxes and tracking IDs. The dataset was collected over a long period (’21) and consists of more than 2.5 million images.
Fig. 2. Environmental diversity in our dataset: a person running at night, a person lying in fog, a person walking with an umbrella in the rain, and a fallen person beside a chair in heavy rain at night, captured with various props (towel, umbrella, hard cap, etc.) and vehicles (car, bicycle, motorcycle, etc.). Our dataset also covers various places (pavements, roads, warehouses, etc.).