Introduction
With the emergence of autonomous vehicles and advances in the field of intelligent robots in general, the task of human trajectory prediction has gained significant research interest in recent years. Besides more classical, physics-based prediction approaches, e.g. building on the Kalman filter [1] or the social forces model [2], a range of deep learning approaches have been proposed to tackle the problem. The most common deep learning models build around long short-term memory networks (e.g. [3]), convolutional neural networks (e.g. [4]), generative adversarial networks (e.g. [5]) or transformers (e.g. [6]) and vary in the contextual cues considered for prediction. Commonly used contextual cues include social (e.g. [7], [8]) and environmental (e.g. [9], [10]) cues. For a comprehensive overview of existing human trajectory prediction approaches, the reader may be referred to [11], [12], [13].
Inherent to prediction model development is the need for proper benchmarks aimed at measuring a model's prediction performance. Due to the direct relation between dataset complexity and model capacity, creating a benchmark for human trajectory prediction that is neither too simple nor too hard to solve remains a difficult task. On large datasets, even simple models overfit, while in other cases the prediction performance on individual samples is poor, even for high-capacity models. One of the difficulties here is the open question of how the complexity of a given trajectory prediction dataset can be quantified. As a consequence, current attempts at standardized benchmarking resort to heuristics or experience-based criteria when assembling the data basis. Recent examples are the TrajNet challenge [14] and its extension TrajNet++ [11], [13].
Currently, in human trajectory prediction, the analysis of benchmarking data only plays a minor role and focuses on specific aspects of the data. The most common subject of analysis is the existence of social interaction resulting in non-linear motion. For such analyses, social force models and collision avoidance methods [15] or deviations from regression fits [3], [16] are employed, for example. Such methods are also used in TrajNet++ for splitting the benchmark into interaction and non-interaction tasks. Besides that, basic analyses include dissecting velocity profiles or positional distributions [17]. Thus, there is still a lack of approaches that analyze datasets as a whole, with the goal of quantifying the overall complexity of a dataset.
When targeting a qualitative analysis of data complexity, a common approach is the use of low-dimensional embeddings for data visualization, for example t-SNE [18] or variations of PCA [19]. While such approaches are viable for non-sequential, high-dimensional data, a prototype-based clustering approach seems more suitable for sequential data. This is especially true for trajectory datasets, where each dataset can be reduced to a small number of prototypical sub-sequences specifying distinct motion patterns, such that each sample can be assumed to be a variation of one of these prototypes. Additionally, in the context of statistical learning, the complexity of a dataset is closely related to its entropy. Measuring the dataset entropy, i.e. the amount of information contained in a dataset, is still an open question for trajectory datasets and has gained interest recently [20], [21].
Towards this end, an approach for estimating an entropy-inspired measure, the pseudo-entropy, building on a dataset decomposition generated by an adequate pre-processing and clustering method is proposed. For decomposing a given dataset into clusters of distinct velocity-agnostic motion patterns, a spatial alignment step followed by vector quantization is applied. Given the dataset decomposition, the pseudo-entropy is estimated by analyzing the prediction performance of a simple trajectory prediction model when gradually enriching its training data with additional motion patterns. This paper is an extension of [20] and focuses mainly on dataset complexity. The main contributions are:
A learning and heuristics-based approach for finding a velocity-agnostic, prototype-based representation of trajectory datasets.
An approach for estimating trajectory dataset complexity in terms of an entropy-inspired measure.
A coarse complexity-based ranking of standard benchmarking datasets for human trajectory prediction.
The paper is structured as follows. Sections II and III present a data pre-processing approach necessary for the actual dataset entropy estimation detailed in section IV. In addition to a coarse dataset ranking, the evaluation section V discusses the ranking, as well as the approach and methods used throughout this paper. Further, some interesting findings resulting from the analysis are discussed. Section VI summarizes the paper, lists potential implications for the state of benchmarking and gives a brief discussion on potential future research directions.
For convenience, several definitions and notations used throughout the paper are listed below:
A trajectory $X$ is defined as an ordered set of points $\{\mathbf {x}_{1}, \ldots, \mathbf {x}_{N}\}$.
The length of a (sub-)trajectory always refers to its cardinality $|X|$, rather than the spatial distance covered.
The distance between two trajectories $X$ and $Y$ of the same length $|X| = |Y|$ is defined as $d_{tr}(X, Y) = \sum _{j=1}^{|X|} \| \mathbf {x}_{j} - \mathbf {y}_{j} \|_{2}$.
The number of samples, the trajectory length and the number of prototypes are denoted as $N$, $M$ and $K$, respectively. For indexing, $i$, $j$ and $k$ are used.
The $q$-quantile, with $q \in [0, 1]$, of a set of numbers $\{\cdot\}$ is denoted as $Q_{q}(\{ \cdot \})$.
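As a small illustration, the two notational building blocks above can be written down directly. The following is a minimal NumPy sketch; the function names are chosen here for illustration and are not taken from the paper:

```python
import numpy as np

def d_tr(X, Y):
    """Distance between two trajectories of equal length: the sum of
    pointwise Euclidean distances, as defined in the text."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    assert X.shape == Y.shape, "trajectories must have the same length"
    return float(np.linalg.norm(X - Y, axis=1).sum())

def quantile(values, q):
    """The q-quantile Q_q({.}) of a set of numbers, with q in [0, 1]."""
    return float(np.quantile(np.asarray(values, dtype=float), q))
```

For example, two parallel trajectories offset by one unit at each of their two points have a distance of 2.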
Spatial Sequence Alignment
With the goal of a velocity-agnostic, prototype-based trajectory dataset representation in mind, the trajectory alignment approach proposed in this section fulfills two integral roles as a pre-processing step for the subsequent clustering. On the one hand, it aligns the data such that similar patterns are pooled together. On the other hand, it removes variations in velocity among trajectories, thereby generating a dataset with normalized velocity. This is essential for obtaining a velocity-agnostic dataset representation and ensures that similar motion patterns which only vary in velocity, i.e. in scale, can be pooled together.
Given a set of trajectories (samples), each sample $X_i$ is first normalized by centering it at its mean $\bar{\mathbf{x}}$ and scaling it by its start-to-end displacement: \begin{equation*} X^{norm}_{i} = \left \{{\frac {\mathbf {x}^{i}_{j} - \bar {\mathbf {x}}}{\mathbf {x}^{i}_{M} - \mathbf {x}^{i}_{1}} \mid j \in [1,M] }\right \}.\tag{1}\end{equation*}
Here, the alignment transformation $\phi(\cdot)$ is optimized with respect to an alignment prototype $\hat{Y}$ using the mean squared error \begin{equation*} \mathcal {L}_{\mathrm {align}}(\phi (X^{norm}_{i}), \hat {Y}) = \frac {1}{M} \sum ^{M}_{j=1} \|\mathbf {x}^{i}_{j} - \hat {\mathbf {y}}_{j}\|^{2}_{2}.\tag{2}\end{equation*}
An exemplary result of this alignment approach is depicted in figure 1. By aligning all samples with a single prototype, aligned samples have a common orientation and form clusters of similar samples.
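The normalization in eq. (1) can be sketched as follows, assuming the scaling divides by the Euclidean length of the start-to-end displacement; this reading is an interpretation, since the equation's notation could also be read as an elementwise division:

```python
import numpy as np

def normalize_trajectory(X):
    """Center a trajectory at its mean and scale it by the start-to-end
    displacement, in the spirit of eq. (1).  The displacement is taken
    as a Euclidean norm here -- an assumption, not the paper's exact
    notation."""
    X = np.asarray(X, dtype=float)
    scale = np.linalg.norm(X[-1] - X[0])
    if scale == 0.0:          # degenerate (stationary) trajectory
        scale = 1.0
    return (X - X.mean(axis=0)) / scale
```

A straight two-point trajectory of length 2 is thus mapped to one of unit length centered at the origin.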
Learning Vector Quantization
Clustering approaches can be applied after the spatial alignment, as it distributes random errors homogeneously over the sequence and exposes clusters of motion patterns. There exists a wide range of clustering approaches to choose from, ranging from simple, established approaches (e.g. k-means [22], DBSCAN [23] or learning vector quantization [24]) to more sophisticated or specialized neural models (e.g. [25], [26], [27]). In the context of this paper, the choice of the clustering approach itself only plays a minor role, thus a learning vector quantization (LVQ) approach is employed, as it can be integrated directly into a deep learning framework. For a more comprehensive review of clustering approaches, the reader may be referred to recent surveys, for example [28] or [29].
The resulting pipeline corresponds to the encoder part of an auto-encoding architecture for representation learning inspired by [30], which is capable of learning meaningful representations. Here, aligned samples $X^\phi_i$ are mapped onto a set of prototypes $\mathcal{Z} = \{Z_1, \ldots, Z_K\}$ by minimizing \begin{equation*} \mathcal {L} = \underbrace {\frac {1}{N} \sum ^{N}_{i=1} d_{tr}(X^\phi _{i}, Z_{z(i)})}_{\mathcal {L}_{LVQ}} + \gamma L_{reg},\tag{3}\end{equation*}
where the index $z(i)$ of the winning prototype is given by \begin{equation*} z(i) = \mathop {\mathrm {arg\,min}} _{k} d_{tr}(X^\phi _{i}, Z_{k}).\tag{4}\end{equation*}
Note that due to the fixed trajectory length, the mean squared error is a suitable similarity measure. If the length varied, a more sophisticated measure would be necessary [31].
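The winner assignment of eq. (4) and the quantization term of eq. (3) can be sketched as follows; the regularizer is omitted for brevity and the names are illustrative:

```python
import numpy as np

def assign(X_aligned, prototypes):
    """Winner index z(i) = argmin_k d_tr(X_i, Z_k), cf. eq. (4)."""
    dists = [float(np.linalg.norm(X_aligned - Z, axis=1).sum())
             for Z in prototypes]
    return int(np.argmin(dists))

def lvq_loss(samples, prototypes):
    """Quantization part of eq. (3): mean distance of every sample to
    its winning prototype (the gamma * L_reg term is omitted here)."""
    total = 0.0
    for X in samples:
        k = assign(X, prototypes)
        total += float(np.linalg.norm(X - prototypes[k], axis=1).sum())
    return total / len(samples)
```

A sample that coincides with one of the prototypes contributes zero loss, illustrating the winner-takes-all behavior discussed below.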
As the value for $K$ is unknown a priori, it should in general be chosen larger than expected. Due to the winner-takes-all strategy, $\mathcal{L}_{LVQ}$ only updates prototypes that have supporting samples. To counteract this, dedicated initialization, regularization and refinement steps are employed, as described in the following subsections.
For describing these approaches, the support of a prototype plays an integral role. The support $\pi(\mathcal{Z})_k$ of a prototype $Z_k$ is the number of samples assigned to it: \begin{align*} \pi (\mathcal {Z})=&\left \{{ \sum ^{N}_{i = 1} \mathbf {1}_{k}(i) \mid k \in [1, \lvert \mathcal {Z} \rvert] }\right \}, \\ \mathrm {with} \\ \mathbf {1}_{k}(i)=&\begin{cases} 1 & \mathrm {if}~z(i) = k \\ 0 & \mathrm {else} \end{cases}.\tag{5}\end{align*}
A resulting set of prototypes using the approach described in this section and the following subsections, for the dataset shown in figure 1, is depicted in figure 2. It can be seen that the prototypes cover a range of motion patterns: constant velocity, curvilinear motion, acceleration and deceleration.
Possible set of prototypes (red) for a given aligned dataset (top row). The prototypes represent different motion patterns (from left to right): Constant velocity, curvilinear motion, acceleration and deceleration.
A. Initialization
The main objective of the initialization step is two-fold. On the one hand, the number of out-of-distribution prototypes in the initial set of prototypes $\mathcal{Z}$ should be reduced.
Taking this into account, the alignment prototype
Left: Aligned dataset
B. Regularization
While the initialization helps in increasing the average support for each prototype, some out-of-distribution samples might be assigned to individual prototypes, resulting in prototypes with little support from other samples.
To ensure optimization of all prototypes, a regularization term $L_{reg}$ is added, pulling all prototypes towards the most supported prototype $Z_*$: \begin{equation*} L_{reg} = \frac {1}{K - 1} \sum _{Z_{k} \in \mathcal {Z} \setminus Z_{*}} d_{tr}(Z_{k}, Z_{*}).\tag{6}\end{equation*}
Under ideal conditions, i.e. when $Z_*$ is representative of the aligned data, this is approximately equivalent to \begin{equation*} L_{reg} = \frac {1}{N \cdot K} \sum _{i=1}^{N}\sum _{k=1}^{K} d_{tr}(X_{i}, Z_{k}).\tag{7}\end{equation*}
Intuitively, by choosing an appropriate value for the regularization weight $\gamma$, unsupported prototypes are kept close to the data distribution, allowing them to gain support in later stages of training.
As a side-note, very imbalanced prototype sets, in terms of many low-support prototypes, can also be detected by the perplexity score \begin{align*} \mathcal {P}_{\mathcal {Z}}=&\exp \left \{{ -\sum _{k=1}^{K} \pi _{\mathrm {norm}}(\mathcal {Z})_{k} \cdot \log \{ \pi _{\mathrm {norm}}(\mathcal {Z})_{k} \} }\right \}, \\ \mathrm {with} \\ \pi _{\mathrm {norm}}(\mathcal {Z})_{k}=&\frac {\pi (\mathcal {Z})_{k}}{\sum _{k=1}^{K} \pi (\mathcal {Z})_{k}}. \tag{8}\end{align*}
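The perplexity of eq. (8) depends only on the support counts; a minimal sketch:

```python
import numpy as np

def perplexity(support):
    """Perplexity of a prototype set from its support counts, eq. (8).
    It equals the number of prototypes for perfectly balanced support
    and approaches 1 when a single prototype dominates."""
    p = np.asarray(support, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # convention: 0 * log 0 := 0
    return float(np.exp(-(p * np.log(p)).sum()))
```

Two equally supported prototypes yield a perplexity of exactly 2, while a completely one-sided support distribution yields 1.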
Due to
C. Refinement
Finally, a heuristic refinement scheme, building on the expected results of the regularization approach presented in section III-B, is employed in order to remove unnecessary prototypes when $K$ was chosen too large. In a first phase, all prototypes with insufficient support are removed: \begin{equation*} \mathcal {Z}' = \mathcal {Z} \setminus \left \{{ Z_{k} \mid \pi (\mathcal {Z})_{k} < \tau _{\mathrm {phase1}} }\right \}.\tag{9}\end{equation*}
The second phase revolves around removing prototypes similar to the most supported prototype $Z_*$, where similarity is assessed point-wise: \begin{align*} \mathrm {sim}(Z'_{k}, Z_{*})=&\begin{cases} 1 & \mathrm {if}~|\mathcal {S}(Z'_{k}, Z_{*})| \geq \epsilon _{\mathrm {phase2}} \cdot \lvert Z_{*} \rvert \\ 0 & \mathrm {else} \end{cases}, \\ \mathrm {with} \\ \mathcal {S}(Z'_{k}, Z_{*})=&\left \{{ \mathbf {z}^{k}_{j} \mid \| \mathbf {z}^{k}_{j} - \mathbf {z}^{*}_{j} \|_{2} < \tau (Z_{*})_{j}, j \in [1, M] }\right \} \\ \tau (Z_{*})_{j}=&Q_{0.99}\left ({\left \{{ \| \mathbf {z}^{*}_{j} - \mathbf {x}^{i}_{j} \|_{2} \mid \mathbf {x}^{i}_{j} \in X_{i},~X_{i} \in \mathcal {X}^\phi }\right \} }\right). \\\tag{10}\end{align*}
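Both refinement phases can be sketched from eqs. (9) and (10); the threshold values and function names below are illustrative, not the paper's:

```python
import numpy as np

def prune_low_support(prototypes, support, tau):
    """Phase 1 (eq. (9)): drop every prototype whose support falls
    below the threshold tau_phase1."""
    return [Z for Z, s in zip(prototypes, support) if s >= tau]

def similar(Zk, Zstar, thresholds, eps=0.9):
    """Phase 2 similarity test (eq. (10)): Zk counts as similar to the
    most supported prototype Zstar if at least eps * M of its points
    lie within the per-point thresholds tau(Zstar)_j.  The default eps
    is an assumed value."""
    dist = np.linalg.norm(np.asarray(Zk) - np.asarray(Zstar), axis=1)
    return (dist < np.asarray(thresholds)).sum() >= eps * len(Zk)
```

A prototype identical to $Z_*$ trivially passes the similarity test and would be removed, while a clearly offset prototype is kept.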
Example of assessing prototype similarity in the second phase of the refinement scheme. The first row shows the alignment of a prototype
Estimating Trajectory Dataset Entropy
This section discusses an attempt at moving towards a thorough complexity analysis of human trajectory prediction benchmarking datasets. While previous work (e.g. [3], [15], [17]) focuses on statistics directly derived from the datasets, like histograms or deviations from linear prediction, the approach proposed in the following relies on a dataset decomposition into prototypical motion patterns, as generated by the alignment and vector quantization steps described in sections II and III.
As an initial proof of concept, a learned linear transformation is employed as a simple prediction model, mapping a flattened observed trajectory $X_f$ onto a prediction: \begin{equation*} \mathcal {M}(X_{f}) = WX_{f} + b,\tag{11}\end{equation*}
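The linear model of eq. (11) can be fitted by ordinary least squares; a minimal sketch, under the assumption that flattened observation and target trajectories are given as matrix rows:

```python
import numpy as np

def fit_linear_model(X_obs, Y_target):
    """Fit M(x) = W x + b (eq. (11)) by least squares.
    X_obs: (N, d_in) flattened observed trajectories,
    Y_target: (N, d_out) flattened target trajectories."""
    A = np.hstack([X_obs, np.ones((len(X_obs), 1))])   # append bias column
    Wb, *_ = np.linalg.lstsq(A, Y_target, rcond=None)
    W, b = Wb[:-1].T, Wb[-1]
    return W, b

def predict(W, b, x):
    """Apply the learned linear transformation to one flattened input."""
    return W @ x + b
```

On data following a simple shift, e.g. targets equal to observations plus one, the fit is exact.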
Algorithm 1: Estimate Dataset Pseudo-Entropy
Input: decomposed, sorted datasets
for …
    if …
        if …
            PseudoEntropy++
        end if
    end if
end for
return PseudoEntropy
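Since the branch conditions of Algorithm 1 did not survive extraction, the following sketch only illustrates one plausible reading of the overall scheme: iterate over clusters sorted by support and increment the pseudo-entropy whenever the newly added cluster is not yet explained by the clusters seen so far. The criterion (`train_and_eval`, `tol`) is an assumption, not the paper's exact condition:

```python
def pseudo_entropy(clusters, train_and_eval, tol):
    """Illustrative sketch of Algorithm 1 (assumed conditions).
    clusters: sample sets sorted by support.
    train_and_eval(train, test): prediction error of the simple model
    trained on `train` and evaluated on `test`.
    A cluster counts towards the pseudo-entropy when its error exceeds
    `tol`, i.e. it carries information not present in earlier clusters."""
    entropy, train = 0, []
    for cluster in clusters:
        if not train:
            entropy += 1            # first (dominant) cluster always counts
        elif train_and_eval(train, cluster) > tol:
            entropy += 1
        train = train + list(cluster)
    return entropy
```

With a toy error function based on cluster means, a dataset of two identical clusters and one outlier cluster yields a pseudo-entropy of 2.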
Evaluation
This section starts with a setup common to all experiments, followed by a qualitative evaluation of the simple prediction model described in section IV and the LVQ model for dataset decomposition. Next, a coarse ranking of standard benchmarking datasets for long-term human trajectory prediction based on pseudo-entropy is given. The section closes with a discussion on the approach and methodology itself, other possible factors contributing to dataset complexity and interesting findings.
Evaluations are conducted on scenes taken from the following frequently used benchmarking datasets: BIWI Walking Pedestrians ([33], abbrev.: biwi), Crowds by example (also known as the UCY dataset, [34], abbrev.: crowds) and the Stanford Drone Dataset ([15], abbrev.: sdd). Besides being typically used for evaluating human trajectory prediction models in the literature, the original TrajNet challenge was built around these datasets. The scenes in the datasets are denoted as Dataset: Scene Recording, e.g. recording 01 of the zara scene in the crowds dataset is denoted as crowds:zara01. Note that for sdd, different recordings of the same scene do not necessarily capture the same campus area (but there might be some overlap). An overview of statistical details of the datasets is given in appendix A.
For trajectory prediction tasks targeted in this section, trajectories are split in half in order to obtain observation and target sequences. The prediction error is reported in terms of the average displacement error.
A. Setup
First, the datasets are augmented to have a common sample frequency. The biwi and crowds scenes already have the same sample frequency of 2.5 samples per second, thus the sample frequency of all the sdd scenes is adjusted accordingly.
Next, as the prototype-based representation only works with trajectories of the same length, an appropriate sequence length has to be chosen for each dataset in the evaluation. The most commonly used sequence length in recent benchmarks is 20 points, i.e. 8 observed and 12 predicted points.
Then, trajectories not exceeding a dataset-dependent minimum speed $s_{\mathrm{min}}$ are removed, where \begin{align*} s_{\mathrm {min}}=&\frac {\max _{i} \mathrm {m}_{\mathrm {speed}}(i) - \min _{i} \mathrm {m}_{\mathrm {speed}}(i)}{M} \\ \mathrm {with} \\ \mathrm {m}_{\mathrm {speed}}(i)=&\frac {1}{M - 1} \sum _{j=2}^{M} \| \mathbf {x}^{i}_{j} - \mathbf {x}^{i}_{j - 1} \|_{2}\tag{12}\end{align*}
Here,
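Eq. (12) can be implemented directly; a minimal sketch with illustrative names:

```python
import numpy as np

def mean_speed(X):
    """m_speed: mean step length of a trajectory, cf. eq. (12)."""
    X = np.asarray(X, dtype=float)
    return float(np.linalg.norm(np.diff(X, axis=0), axis=1).mean())

def filter_slow(trajectories):
    """Drop trajectories not exceeding the dataset-dependent minimum
    speed s_min from eq. (12)."""
    speeds = [mean_speed(X) for X in trajectories]
    M = len(trajectories[0])
    s_min = (max(speeds) - min(speeds)) / M
    return [X for X, s in zip(trajectories, speeds) if s > s_min]
```

Given one stationary and one moving trajectory, only the moving one survives the filter.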
Finally, for each dataset and sequence length, the alignment and LVQ networks are trained 10 times. The number of initial prototypes is set to
B. Simple Prediction Model Capabilities
The pseudo-entropy estimation approach assumes that the learned linear transformation prediction model $\mathcal{M}$ is capable of capturing basic motion patterns, such as constant, accelerated and curvilinear motion.
Prediction results of a learned linear transformation (blue), given the first 9 points of a smooth prototypical trajectory (red). Each example resembles a basic motion pattern: constant (a.), accelerated (b.) and curvilinear (c.) motion.
C. LVQ Dataset Decomposition: Quality, Consistency and Sensitivity
In this section, the viability of the LVQ model given an aligned dataset is evaluated, as it forms the basis of the proposed approach for estimating the information content. This evaluation employs three exemplary datasets of varying assumed complexity: biwi:eth, crowds:zara01 and sdd:hyang04. The evaluation targets the quality and consistency of the resulting dataset decompositions, as well as the approach's sensitivity to its refinement parameters. Due to the similarities of the datasets used throughout the evaluation, it is assumed that the findings of this section carry over to the other datasets.
Starting with consistency, the 10 available training iterations are examined. The first row of figure 6 depicts the number of components identified by the LVQ model before (blue) and after (orange) refinement. Although there are some small fluctuations, there are no strong deviations which cannot be compensated by averaging.
Number of resulting clusters after multiple training runs before (blue) and after (orange) refinement (1st row) and the influence of different values for the parameters
The influence of the heuristic refinement when varying its parameters
Finally, analyzing the quality of the resulting decomposition, two things have to be considered: the motion patterns represented by each prototype and the significance of each cluster. As the first point is hard to verify quantitatively, a visual inspection is employed. Looking at, for example, figures 2 and 7, the learned prototypes and identified motion patterns appear reasonable for bird's eye view datasets. This is also discussed briefly in section V-E. For evaluating the quality of the decomposition itself, an approach using a simple prediction model similar to the one described in algorithm 1 can be employed. While models are still trained on increasingly complex combinations of clusters, the test set errors are now compared to the prediction errors on the remaining clusters. Ideally, a significant difference between these errors justifies the existence of the remaining clusters next to the ones combined in the training dataset. Table 1 lists all mean test set and prediction errors (including standard deviations) for the clusters generated for sdd:hyang04. Here, all pairwise differences in each row are significant, thus verifying the learned decomposition. Significance is determined by a t-test for independent samples with a significance level
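The significance check above is a t-test for independent samples. The sketch below implements Welch's variant with a normal approximation for the p-value, chosen for self-containedness; in practice a library routine such as SciPy's `ttest_ind` would normally be used:

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Two-sample t-test for independent samples (Welch's variant).
    Returns the t statistic and a two-sided p-value.  The p-value uses
    a normal approximation to the t distribution, which is adequate for
    moderate sample sizes but slightly optimistic for tiny ones."""
    na, nb = len(a), len(b)
    se = math.sqrt(variance(a) / na + variance(b) / nb)
    t = (mean(a) - mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided, normal approx.
    return t, p
```

For two clearly separated error samples the p-value drops far below 0.05, while near-identical samples yield a p-value close to 1.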
Data and aligned data (a. & b.: biwi:eth, c. & d.: sdd:nexus01) with similar sequence lengths (12 & 15), as well as learned prototypes for biwi:eth (e. & f.) and sdd:nexus01 (g. – l.). Having a higher complexity score, sdd:nexus01 contains a higher variety of motion patterns, including constant velocity (g.), curvilinear motion (h. & i.), acceleration (j.), deceleration (k.) and a mixed pattern (l.). The fractions of samples assigned to each prototype are 93% (e.) and 7% (f.) for biwi:eth and 43.5% (g.), 19.4% (h.), 13.7% (i.), 8.6% (j.), 5.6% (k.) and 9.2% (l.) for sdd:nexus01.
D. Coarse Dataset Ranking
Table 2 lists commonly used datasets, grouped by their average rounded pseudo-entropy values, calculated according to algorithm 1 for each trained pair of alignment and LVQ models. Again, a t-test for independent samples with a significance level of
E. Discussion
To conclude, selected aspects of the proposed approach and employed methodology are discussed, followed by potential factors for creating a more fine-grained ranking and some insights gained from the analyses.
1) Approach and Methodology
In the context of dataset complexity assessment, normalizing the velocities using the alignment model should be discussed. Given a prototypical motion pattern, variations in velocity can be generated by scaling it accordingly, thus it is assumed that the pattern itself is the main contributor to a higher dataset entropy. In case the original motion patterns are required, recall that the clustering pipeline corresponds to the encoding part of a well-established auto-encoding architecture, thus the original velocities can be recovered from the dataset representation when employing the full auto-encoding architecture.
Further, a dataset-dependent trajectory length
The pseudo-entropy aims to reveal the average amount of information contained in a given dataset. Looking at the coarse ranking in section V-D, the results appear reasonable from an experience point of view. For verifying this coarse ranking in an experimental setup using state-of-the-art trajectory prediction models, it would be necessary to put all datasets in a common reference frame and re-sample trajectories to achieve a matching ground resolution, i.e. the distance between subsequent trajectory points of objects moving at the same real-world speed must be equal, for all datasets. This can be an interesting experiment for future work on the topic of dataset complexity analysis.
Lastly, the presented approach only allows for a coarse complexity ranking of given datasets. For achieving a more fine-grained ranking, additional factors need to be considered. Possible factors are discussed next.
2) Potential Factors Affecting Complexity
Multiple factors contributing to dataset complexity could be derived from a learned dataset decomposition. First, the diversity between motion patterns, covered implicitly in the pseudo-entropy, could be considered explicitly. In the case of distinguishable patterns, statistical models need to be capable of capturing multiple modes in the data, requiring a higher modeling capacity. Thus, a higher pattern diversity is expected to correspond to a higher dataset complexity. The second factor considers occurring variations of the same pattern. This factor looks promising, as a higher variation implies a higher uncertainty when modeling specific motion patterns, making them harder to capture with statistical models. Lastly, the relevance distribution of identified motion patterns could be considered. This mainly focuses on biases in the data, and thus answers the question whether there is a prevalent motion pattern or whether all patterns occur with equal likelihood. Less biased datasets can then be considered more complex, as less biased data enables statistical models to capture different patterns in the first place.
Beyond that, agent-agent interaction, as well as environmental cues, can play an integral role in assessing trajectory dataset complexity. Looking at interaction, its influence on the shape of single trajectories can be significant, though this heavily depends on the chosen sample frequency as well as the ground resolution of a given dataset. More specifically, the influence of agent-agent interaction becomes less relevant the sparser a trajectory is sampled, as interactions mainly occur on short time scales. The same applies to the ground resolution, where interactions become less visible as the spatial distance between subsequent trajectory points increases. As a final note, sensor noise must be considered, as there is a risk of interactions being indistinguishable from noise. For environmental cues, positional biases caused for example by junctions can heavily impact the occurrence of specific motion patterns, especially curvilinear motion. This leads to more diverse, and thus potentially more complex, datasets.
3) Interesting Findings
All datasets in this comparison are recorded from a bird's eye view. Inherent to this perspective is the expectation that there are common motion patterns in all datasets, independent of the time horizon. This expectation has been confirmed with few exceptions, in that almost all scenes contain slight variations of at least one basic motion pattern, including constant, accelerated, decelerated and curvilinear motion. Some datasets contain multiple variations of the same basic motion pattern or even mixed motion patterns, enabled by the partially high sequence lengths.
Another aspect related to the motion patterns found in the data is that, in all datasets, the constant velocity pattern is the dominant, i.e. most supported, motion pattern, covering a large fraction of the entire dataset (see figure 8 for exemplary fractions for low- to high-complexity scene datasets). This has multiple implications for common evaluation methodology in current state-of-the-art publications. On the one hand, it explains the difficulties in beating a simple linear extrapolation model in the task of human trajectory prediction. This phenomenon could, for example, be observed during the TrajNet challenge [14], where several early submissions failed to beat the linear model. On the other hand, it indicates that an arbitrarily assembled benchmarking data basis poses the risk of rendering corner cases, i.e. motion patterns different from the constant velocity pattern, statistically irrelevant. This leads to statistical models that are incapable of modeling more complex motion and also struggle to beat a linear model.
Fraction of samples assigned to the constant velocity prototype (sample ratio) for a range of low- to high-complexity scene datasets. The fractions are mean sample ratios, taking multiple iterations and sequence lengths into account. Even with a high number of distinct motion patterns in the data (e.g. sdd:nexus01), constant velocity samples make up 52% of all samples in the dataset.
4) Comparison With OpenTraj
Dataset complexity estimation in the context of human trajectory prediction still poses an unexplored topic in the literature. Because of that, there is no quantitative approach for measuring and comparing the performance of different approaches for dataset complexity estimation. As a result, this section resorts to a qualitative comparison of the pseudo-entropy approach presented in this paper and the OpenTraj approach, and aims to serve as a verification of results. Using the results provided in [21], a superficial comparison can be made by comparing the coarse dataset ranking based on pseudo-entropy with the clustering and entropy analyses in OpenTraj (figure 3 in [21]). It has to be noted that in this paper every scene of each dataset has been treated independently (e.g. sdd:deathcircle - scene 1), while OpenTraj pooled the results for each individual dataset. Being common to both evaluation sections, the eth hotel, zara and sdd:deathcircle datasets can be used as a sample for the comparison. In the pseudo-entropy based coarse ranking, these represent a dataset of low, medium and high entropy, respectively. This relative ranking is consistent with the findings provided by OpenTraj, confirming the plausibility of both approaches.
Concluding Remarks
In the context of statistical learning, dataset complexity is closely related to the entropy of a given dataset. Thus, an approach for estimating the amount of information contained in trajectory datasets was proposed. The approach relies on a velocity-agnostic dataset representation generated by an alignment followed by vector quantization. Using this approach, a coarse complexity ranking of commonly used benchmarking datasets has been generated. A following discussion addressed the results and methods used, as well as interesting findings based on the analyses, stressing the importance of a well-rounded data basis.
5) Implications for the State of Benchmarking
The approach, methods and results presented in this paper can be valuable in the context of dataset and prediction model analysis, as well as benchmarking in general. First of all, the spatial sequence alignment combined with the LVQ approach can be used for analyzing datasets on different timescales, e.g. for extracting underlying motion patterns. This analysis especially benefits the selection of observation and prediction sequence lengths in benchmarking, as well as the selection of an appropriate prediction model in terms of model capacity. The latter is motivated by the fact that low-capacity models usually suffice for less complex data, which might in turn reduce cases of over-fitting and unnecessary computational effort. Further, the resulting dataset decomposition can be used to enhance qualitative analyses of prediction model capabilities in cases where the model might struggle with specific subsets of the data. Lastly, a hierarchy of tasks with increasing difficulty within a benchmark could be built on the dataset decomposition in combination with the presented coarse dataset ranking.
6) Future Research Direction
This paper aims to constitute a step towards thorough dataset complexity analysis. The following paragraphs try to give some open research directions in order to expand on the approach and findings of this paper.
Consider Model Uncertainty. Currently, model uncertainty is only considered implicitly by averaging multiple instances of the presented pipeline when estimating the dataset pseudo-entropy. However, the variance of the ensemble is disregarded in this proof of concept. Thus, it could be interesting to examine whether explicitly incorporating the model's uncertainty about its output could benefit the entropy estimation.
Birds-eye View. So far, all compared datasets are birds-eye view datasets. While this view is the common case for long-term human trajectory prediction datasets, trajectory complexity analysis is also relevant for other views (e.g. frontal view). Considering the structure of the presented approach, the entropy estimation should work as long as the spatial sequence alignment is applicable to the scenario of interest. In the current state, the alignment model expects complete tracklets as input and thus does not have to cope with missing observations, for example arising through occlusions when using a frontal view. Following this, the alignment model should be extended accordingly and be evaluated on a range of different datasets with varying views and object types.
Appendix A: Dataset Details
In order to give more details on the datasets used throughout this paper, table 3 lists the number of samples included in each dataset, the recording conditions (location and acquisition) as well as the average trajectory length (with standard deviation). In accordance with the evaluation in section V, the annotation rate has been aligned for all datasets to a fixed rate of 2.5 annotations per second.