I. Introduction
As evidenced in an early Google self-driving car report [24], 10% of their self-driving malfunctions on streets were due to incorrect predictions of the behavior of other road users, including pedestrians. While there have been significant efforts to improve the accuracy of pedestrian intention prediction [15], [3], [16], [23], [18], [36], [5], there is still ample room for improvement. Currently, two datasets, JAAD [28] and PIE [27], are used to benchmark such prediction models. In these datasets, the core ground truth (GT) labels whether pedestrians are crossing, or are going to cross, in front of the ego vehicle, i.e., crossing/not-crossing (C/NC) labels. As with other onboard perception tasks (e.g., object detection and tracking [4], semantic segmentation [39], monocular depth estimation [17]), synthetic datasets have been proposed to train C/NC prediction models [1], [2]. We propose to go beyond these datasets by introducing a framework, named ARCANE
(short for adversarial cases for autonomous vehicles, the generic project supporting the development of the framework), where traffic scenarios of pedestrian behavior can be programmatically defined. This opens the possibility of introducing underrepresented vehicle-to-pedestrian traffic situations. To stay aligned with the research community, ARCANE has been developed on top of the CARLA simulator [11]. As an example, we have used ARCANE to generate PedSynth, a large and diverse synthetic dataset with pedestrian C/NC labels. Note that this type of labeling is not provided by the CARLA simulator; it is generated by ARCANE. PedSynth consists of 947 video clips of pedestrian C/NC situations. Each video clip runs for approximately 20 s at 30 fps, resulting in approximately 5 h and 26 min of labeled video. Figure 1 shows several frames of two video clips from PedSynth. Users can also generate their own datasets by working with ARCANE.

Figure 1: Summary of two video clips from PedSynth. Top rows: a pedestrian crosses the road perpendicularly to the ego-vehicle moving direction. Bottom rows: a pedestrian changes the intention of crossing the road at mid-lane. In both examples, the pedestrians enter the road at locations not enabled for crossing.
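
As a rough sanity check on the reported size of PedSynth, the following minimal sketch relates the number of clips, the clip length, and the frame rate to the total duration and frame count. It assumes every clip lasts exactly the nominal 20 s, whereas the text only states approximately 20 s; the helper name `dataset_size` and the constants are illustrative and are not part of ARCANE.

```python
from datetime import timedelta

NUM_CLIPS = 947     # number of PedSynth clips, as stated in the text
CLIP_SECONDS = 20   # nominal clip length (assumption: exactly 20 s per clip)
FPS = 30            # frame rate, as stated in the text


def dataset_size(num_clips, clip_seconds, fps):
    """Return (total_seconds, total_frames) for clips of equal length."""
    total_seconds = num_clips * clip_seconds
    total_frames = total_seconds * fps
    return total_seconds, total_frames


total_seconds, total_frames = dataset_size(NUM_CLIPS, CLIP_SECONDS, FPS)
print("nominal duration:", timedelta(seconds=total_seconds))
print("nominal frame count:", total_frames)

# Mean clip length implied by the stated total of 5 h 26 min.
stated_seconds = 5 * 3600 + 26 * 60
implied_clip_seconds = stated_seconds / NUM_CLIPS
print("implied mean clip length (s):", round(implied_clip_seconds, 2))
```

Under the exact 20 s assumption the total is slightly below the stated 5 h 26 min, which is consistent with clips averaging a little over 20 s.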