1. Introduction
Trajectory prediction aims to predict a future trajectory, from a few seconds up to a minute ahead, given an observed trajectory history. It plays an indispensable role in many real-world applications, such as autonomous driving, robotics, navigation, and video surveillance. In self-driving scenarios, accurate pedestrian trajectory prediction is essential for planning [3], [42], decision making [81], environmental perception [52], [64], person identification [40], and anomaly detection [50], [78]. Trajectory prediction is nevertheless a challenging task. For instance, strangers tend to walk alone and try to avoid collisions, whereas friends tend to walk as a group [49]. In addition, pedestrians can interact with surrounding objects or other pedestrians, and such interactions are too complex and subtle to quantify. To account for these interactions, Social-LSTM [1] introduces a pooling layer that passes interaction information among pedestrians and then applies a long short-term memory (LSTM) network to predict future trajectories. Following this pattern, many methods [24], [38], [75], [82], [86] have been proposed that share information via different mechanisms, such as attention mechanisms or similarity measures. Instead of predicting a single deterministic future trajectory, some generative adversarial network (GAN)-based methods [11], [16], [21], [35], [56] and encoder-decoder-based methods [7]–[9], [47], [58], [59], [74] have been proposed to generate multiple feasible trajectories.
An example that reveals the limitation of the original learning strategy. The two frames are extracted from two different scenes, and the trajectories in them differ greatly.
Statistics of five different scenes (trajectory domains): ETH, HOTEL, UNIV, ZARA1, and ZARA2. NoS denotes the number of sequences to be predicted, NoP denotes the number of pedestrians, AN denotes the average number of pedestrians in each sequence, AV denotes the average velocity of pedestrians in each sequence, and AA denotes the average acceleration of pedestrians in each sequence. E-D represents extreme deviation and S-D represents standard deviation.

Metric | ETH | HOTEL | UNIV | ZARA1 | ZARA2 | E-D | S-D |
---|---|---|---|---|---|---|---|
NoS | 70 | 301 | 947 | 602 | 921 | 877 | 383.63 |
NoP | 181 | 1053 | 24334 | 2253 | 5833 | 24153 | 10073.07 |
AN | 2.586 | 3.498 | 25.696 | 3.743 | 6.333 | 23.11 | 9.78 |
AV | 0.437 | 0.178 | 0.205 | 0.369 | 0.206 | 0.259 | 0.11 |
AA | 0.131 | 0.06 | 0.035 | 0.039 | 0.026 | 0.105 | 0.04 |
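
The E-D and S-D columns can be recomputed from the five per-domain values in each row. The sketch below assumes that extreme deviation means the range (maximum minus minimum) and that standard deviation is the sample standard deviation (n − 1 denominator). Both assumptions are inferred from the tabulated numbers rather than stated in the text, and the names `STATS`, `extreme_deviation`, and `summarize` are illustrative only.

```python
from statistics import stdev

# Per-domain values copied from the table, in the order
# ETH, HOTEL, UNIV, ZARA1, ZARA2.
STATS = {
    "NoS": (70, 301, 947, 602, 921),
    "NoP": (181, 1053, 24334, 2253, 5833),
    "AN": (2.586, 3.498, 25.696, 3.743, 6.333),
    "AV": (0.437, 0.178, 0.205, 0.369, 0.206),
    "AA": (0.131, 0.06, 0.035, 0.039, 0.026),
}


def extreme_deviation(values):
    """Range of the values (assumed meaning of E-D): max minus min."""
    return max(values) - min(values)


def summarize(stats):
    """Map each metric to (E-D, S-D), using the sample standard deviation."""
    return {
        metric: (extreme_deviation(values), stdev(values))
        for metric, values in stats.items()
    }


if __name__ == "__main__":
    for metric, (e_d, s_d) in summarize(STATS).items():
        print(f"{metric}: E-D = {e_d:.3f}, S-D = {s_d:.2f}")
```

Under these assumptions the recomputed values agree with the table to within about 0.01. For NoP and AN, for example, UNIV alone accounts for most of the spread across domains.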