Introduction
Urban areas worldwide grapple with an enduring challenge: persistent traffic congestion. In 2018, the average American spent 97 hours annually stuck in traffic, resulting in a cumulative nationwide cost in the tens of billions of dollars.
Traditionally, estimating departure times for home-to-work trips has primarily relied on conventional methods within the context of traffic forecasting. One common approach is the use of parametric models, such as the four-step traffic assignment model, which includes trip generation, trip distribution, mode choice, and traffic assignment steps [4]. These models often make assumptions about the distribution of departure times, typically assuming fixed peak hours. However, these assumptions may not accurately capture the evolving nature of departure time patterns, particularly in the aftermath of the COVID-19 pandemic. Moreover, conventional models tend to rely on a limited set of covariates, such as time-of-day and historical traffic data, while overlooking factors like the behavior of similar commuters or hybrid work patterns. These limitations underscore the need for more sophisticated and adaptable approaches, which our model addresses by integrating deep learning [5], survival analysis [6], and collaborative filtering [7] to provide more accurate and flexible predictions of departure times.
A. Our Contributions
We develop a hierarchical modeling framework to integrate travel behavior insights derived from survival analysis as layers of deep neural networks (DNNs). This methodology combines the robustness of time-to-event statistical analysis with the flexibility and learning capabilities of neural networks, offering a promising tool for analyzing and predicting commuter departure times.
We employ collaborative filtering through K-means clustering to segment the commuter population into distinct behavioral classes. This segmentation allows our models to address individual variations in commuting patterns effectively. By identifying these behavioral classes, our approach can tailor predictions to specific commuter profiles, enhancing the model’s relevance and applicability to diverse urban settings.
We highlight the promise of the proposed approach via empirical experiments in which the SA-informed neural architectures outperform the vanilla models across different training scenarios.
Related Work
A. Analyses on Commuter Departure Time
In the realm of travel behavior, various factors have been identified as pivotal in influencing the commuter departure time choice. These include work schedule flexibility, mode of transportation, demographic variables such as income and age, and the marginal utilities of activities [8]. The complexity of these decisions is further nuanced by gender differences, with distinct commuting patterns observed between male and female commuters [9]. Additionally, congestion levels, travel time, and scheduling costs, particularly concerning activities at both the origin and destination, play significant roles [10]. The methodologies used to explore these factors are diverse, encompassing logit model formulations [8], hazard-based duration modeling [9], Bayesian estimates of continuous departure time models [11], and utility maximization models [10].
Sociodemographic variables have been intricately linked with travel behavior, with two studies analyzing this relationship with respect to the morning commute. Abkowitz [8] developed a commuter departure time choice model using data from 991 commuters in the San Francisco Bay Area under 1972 conditions. His results highlighted the importance of work schedule flexibility, mode of transportation, occupation, income, and age in influencing departure time choices. His logit model formulation expressed the probability of choosing a particular departure time as a function of these exogenous factors. Dissanayake [9], on the other hand, analyzed commuting patterns in the U.K. using the national travel survey, employing a hazard-based duration modeling method. Her results emphasized the role of gender and transport mode in commuting patterns: she found that male commuters were more likely to use cars, while female commuters were more inclined towards public transport for shorter trips.
Other works took a more econometrics-based approach. Ettema and Timmermans [10] used travel diaries from the Netherlands to fit a model based on the maximization of utility derived from trips and activities. Their approach, which emphasizes the continuous nature of time, allowed a clearer interpretation of how individuals allocate time to various activities and travel, considering their interdependencies. They found that scheduling costs and the trade-off between travel time and activity participation had a significant impact on the departure time choice. Methodologically, Gadda et al. [11] introduced a Bayesian approach to estimate continuous departure time models for various trip purposes, incorporating distributional specifications such as the log-normal, the Weibull, and mixtures of normals. Their approach allowed for a more nuanced understanding of unimodal and multimodal departure time data while accounting for unobserved heterogeneity.
B. Machine Learning in Transportation and Travel Behavior
Machine learning methods have become integral to transportation research over the past decade, providing state-of-the-art accuracy and flexibility across various data types, including textual, numeric, and categorical data. Many DNN-based models work by replacing unknown functions with a neural network and minimizing a loss function with respect to the network’s parameters. While these models have shown empirical success in solving complex problems, their performance may falter on simpler tasks if the architecture is not appropriately aligned with the problem domain [12], [13].
On the demand side, neural network architectures have been developed to explicitly account for nonlinear effects from social and behavioral factors. For instance, socially-aware models integrate sociodemographic attributes to improve demand prediction accuracy, addressing fairness concerns in transportation modeling [14]. Other work has developed censorship-aware models, which adjust for demand that is unobserved due to supply limitations, improving our understanding of latent demand patterns [15]. Autoregressive neural networks have advanced the state-of-the-art in probabilistic time-series forecasting, with applications to traffic forecasting [16] and food demand for grocery chains [17]. Similarly, physics-informed neural networks, which incorporate physical constraints into the model architecture, have shown great potential in accurately simulating traffic flows [18].
On the supply side, machine learning techniques have been used to model transportation network phenomena, particularly in the context of Transportation Network Companies (TNCs) and other shared mobility services. Graph neural networks have been used to represent complex transportation networks, capturing the nonlinear relationships between nodes (e.g., neighborhoods) and edges (e.g., routes) to facilitate traffic forecasting [19]. Other studies have employed deep reinforcement learning to model driver allocation in TNC frameworks, improving the efficiency of supply matching and reducing rider wait times [20]. These models help optimize transportation networks by learning from dynamic and large-scale data streams.
In addition to traditional neural networks, recent advances in transformer-based models, specifically Large Language Models (LLMs), have shown promise in transportation research. These models, which excel in processing sequential data, have been applied to human mobility modeling with significant success [21]. LLMs’ ability to model complex temporal dependencies makes them particularly well-suited for capturing the intricacies of travel behavior over time. A number of previous efforts explore the potential applications of LLMs in and around the tourism industry [22], [23].
Bayesian machine learning techniques have also made significant contributions to the field, particularly in large-scale human mobility analysis [24], [25]. Bayesian models offer a probabilistic framework that can incorporate uncertainty and prior knowledge, making them valuable tools for handling noisy or incomplete transportation data. To this end, novel temporal factorization frameworks promise advances in probabilistic time-series forecasting [26], while Bayesian dynamic regression methods demonstrate benefits for car-following models [27].
For a more comprehensive review of recent developments in the application of machine learning to transportation, we refer the reader to Chen [28], who provides an in-depth discussion of current trends and future directions in this rapidly evolving field.
C. Survival Analysis in Deep Learning
Survival models are widely acknowledged for their efficacy in predicting the occurrence time of events across diverse domains, including medical patient prognosis, user churn projection, and machinery maintenance scheduling. Commonly used methods in survival analysis include non-parametric techniques like the Kaplan-Meier estimator, semi-parametric models such as Cox regression, and parametric approaches like linear regression.
The integration of machine learning methodologies with survival models presents an opportunity to harness the robust predictive capabilities inherent in machine learning while preserving the structural framework and characteristic outcomes of survival models, such as the temporal evolution of survival probabilities or hazard curves. Despite the advancements, conventional machine learning-based survival models necessitate meticulous feature engineering. Leveraging neural networks’ adeptness in extracting pertinent insights efficiently from extensive data sets, the fusion of deep learning models with survival analysis offers promising avenues.
For instance, the DeepSurv [29] model amalgamates the Cox proportional hazards model with a multilayer perceptron (MLP), leveraging various hyperparameters of feed-forward neural networks while adhering to the proportional hazards assumption. LADS [30] (LSTM Attention Deep Survival) capitalizes on an LSTM for feature extraction and time-series data pre-processing, augmented by an attention mechanism that bolsters model interpretability. RNN-Surv [31] employs Recurrent Neural Networks (RNNs) to sequentially forecast a distribution over time in order to anticipate the subsequent event. DeepHit [32] incorporates a shared sub-network alongside a family of cause-specific sub-networks, facilitating the accommodation of multiple competing risks.
Methodology
We first provide some background on survival analysis in the context of our problem, namely the morning departure behavior of individuals who commute. Then, we introduce the proposed framework to estimate a solution.
A. Preliminaries
1) Survival Function
Let $D$ represent the time of departure for an individual from a specific location, a random variable that can take any non-negative value. The survival function of $D$ is defined as \begin{equation*} S(t) = \Pr(D > t)\end{equation*}
This function provides the probability that an individual's departure time is later than a specific time $t$; in other words, it is the likelihood that a person has not yet departed by time $t$. The survival function has the following properties:
$0 \leq S(t) \leq 1$;
$F_{D}(t) = 1 - S(t)$ is the Cumulative Distribution Function (CDF) of $D$;
$S(t)$ is a non-increasing function of $t$.
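As a concrete illustration, the survival function of departure times can be estimated non-parametrically, for instance with the Kaplan-Meier estimator mentioned earlier. The following minimal sketch (not the implementation used in this study) does so on toy data with hypothetical column names.

```python
# Illustrative sketch: estimating S(t) for morning departures with a
# Kaplan-Meier estimator. Data and column names are purely illustrative.
import pandas as pd
from lifelines import KaplanMeierFitter

trips = pd.DataFrame({
    # minutes elapsed after 3:00 a.m. until the observed departure
    "minutes_to_departure": [95, 120, 150, 180, 210, 240, 300],
    # 1 = departure observed, 0 = censored (no commute observed that day)
    "departed": [1, 1, 1, 1, 1, 0, 1],
})

kmf = KaplanMeierFitter()
kmf.fit(trips["minutes_to_departure"], event_observed=trips["departed"])

# S(t): probability that a commuter has not yet departed by time t
print(kmf.survival_function_.head())
print("Pr(D > 180 min past 3 a.m.):", float(kmf.predict(180)))
```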
Furthermore, the hazard function at time $t$, denoted $\lambda(t)$, is defined as \begin{equation*} \lambda(t) = \lim_{\delta t \to 0} \frac{\Pr\left( t \leq D \leq t + \delta t \mid D > t \right)}{\delta t} \tag{1}\end{equation*}
This function measures the instantaneous likelihood of a commuter departing at time $t$, conditional on not having departed up to that point. A key relationship that can be shown is that \begin{equation*} \lambda(t) = \frac{-S^{\prime}(t)}{S(t)} \tag{2}\end{equation*} which, upon integrating both sides from $0$ to $t$ and exponentiating, yields \begin{equation*} S(t) = \exp\left( -\int_{0}^{t} \lambda(z)\, dz \right) = \exp(-H(t)) \tag{3}\end{equation*} where $H(t) = \int_{0}^{t} \lambda(z)\, dz$ is the cumulative hazard.
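The relationship in (2)-(3) can be verified numerically for any hazard with a known closed form; the short sketch below does so for an arbitrary Weibull hazard, purely as a sanity check.

```python
# Numerical check of S(t) = exp(-∫ λ(z) dz) for a Weibull hazard (illustrative).
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import weibull_min

k, lam = 1.5, 2.0                                  # arbitrary shape/scale
t = np.linspace(0, 8, 2001)

hazard = (k / lam) * (t / lam) ** (k - 1)          # λ(t) for Weibull(k, λ)
H = cumulative_trapezoid(hazard, t, initial=0.0)   # cumulative hazard H(t)
S_from_hazard = np.exp(-H)                         # Eq. (3)
S_exact = weibull_min.sf(t, c=k, scale=lam)        # closed-form survival

print("max |exp(-H(t)) - S(t)| =", np.max(np.abs(S_from_hazard - S_exact)))
```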
In the context of departure times, this framework allows us to understand the dynamics of when individuals are most likely to depart, considering the time elapsed and the probability of not having departed until that point.
2) Cox Proportional Hazards Model
The Cox proportional hazards model uses the hazard function to capture interactions between a set of predictors and the risk of the event occurring. The idea behind this model is that the log-hazard of an individual is the sum of a population-level baseline log-hazard, which changes over time, and a linear function of their covariates. Given a vector of covariates $x = (x_{1}, \ldots, x_{d})$ for an individual, the hazard is specified as \begin{equation*} \lambda(t \mid x) = \lambda_{0}(t) \exp\left( \sum_{i=1}^{d} \beta_{i} x_{i} \right) \tag{4}\end{equation*} where $\lambda_{0}(t)$ is the baseline hazard and $\beta_{1}, \ldots, \beta_{d}$ are the regression coefficients.
When the Cox model is used for inference, it must satisfy the proportional hazards assumption, which states that the hazard ratio between any two samples remains constant over time [33]. This assumption can be evaluated by examining the p-values and testing the proportional hazards condition. However, in our study, the primary objective is to maximize prediction accuracy rather than to infer statistical relationships or establish significance. Therefore, strict adherence to the proportional hazards assumption is not required.
While p-values play an important role in assessing the statistical significance of individual covariates in traditional survival analysis, our focus is on model performance in predicting commuter departure times. If including certain variables improves accuracy on unseen data, we consider their contribution valuable regardless of statistical significance. Nonetheless, examining p-values can still offer insights into the interpretability of the model, revealing stronger or weaker associations that could inform future analysis or guide domain-specific research.
3) Model Estimation
The coefficients $\beta$ are estimated by maximizing Cox's partial likelihood, which does not require specifying the baseline hazard $\lambda_{0}(t)$.
To formulate the partial likelihood, the $f$ unique failure (departure) times are ordered increasingly as $t_{1} < t_{2} < \cdots < t_{f}$. Let $I(i)$ denote the set of individuals departing at $t_{i}$, $e_{i} = |I(i)|$ the number of such departures, and $R_{i}$ the risk set of individuals who have not yet departed immediately prior to $t_{i}$. Using Breslow's approximation for tied departure times, the log partial likelihood is \begin{align*} \ell(\beta) &= \log\left( \prod_{i=1}^{f} \frac{\exp\left( \sum_{s \in I(i)} \beta^{\top} x_{s} \right)}{\left( \sum_{j \in R_{i}} \exp\left( \beta^{\top} x_{j} \right) \right)^{e_{i}}} \right) \tag{5}\\ &= \sum_{i=1}^{f} \left[ \left( \sum_{s \in I(i)} \beta^{\top} x_{s} \right) - e_{i} \log\left( \sum_{j \in R_{i}} \exp\left( \beta^{\top} x_{j} \right) \right) \right] \tag{6}\end{align*}
Estimation with an $L_{1}$ (lasso) penalty can then be cast as the regularized optimization problem \begin{equation*} \min_{\beta} \; -\ell(\beta) + \Lambda \|\beta\|_{1} \tag{7}\end{equation*} where $\Lambda \geq 0$ controls the strength of the penalty.
Given the estimated coefficients $\hat{\beta}$, the baseline cumulative hazard $H_{0}(t) = \int_{0}^{t} \lambda_{0}(z)\, dz$ can be recovered with Breslow's estimator, \begin{equation*} \hat{H}_{0}(t) = \sum_{i: T_{i} \leq t} \frac{e_{i}}{\sum_{j \in R_{i}} \exp\left( \hat{\beta}^{\top} x_{j} \right)} \tag{8}\end{equation*} where $T_{i}$ denotes the $i$-th ordered departure time.
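In practice, this estimation can be carried out with off-the-shelf software. The sketch below uses the lifelines library with illustrative data, column names, and an arbitrary penalty strength $\Lambda$; it is a minimal example rather than the exact procedure used for our datasets.

```python
# Sketch: penalized Cox estimation (Eq. 7) and the Breslow baseline cumulative
# hazard (Eq. 8) via lifelines. Data, column names, and penalty are illustrative.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "minutes_to_departure": [95, 120, 150, 180, 210, 240, 300, 330],
    "departed":             [1,   1,   1,   1,   1,   0,   1,   1],
    "trip_distance_km":     [12.0, 8.5, 20.1, 5.2, 15.3, 9.9, 30.4, 3.1],
    "avg_speed_kmh":        [45.0, 30.2, 55.7, 25.1, 40.8, 33.3, 60.2, 20.5],
})

cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)   # pure L1 (lasso) penalty, Λ = 0.1
cph.fit(df, duration_col="minutes_to_departure", event_col="departed")

print(cph.params_)                              # fitted coefficients β
print(cph.hazard_ratios_)                       # exp(β), the hazard ratios
print(cph.baseline_cumulative_hazard_.tail())   # Breslow estimate of H0(t)
```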
B. Survival Analysis-Informed DNNs With Collaborative Filtering
We explore integrating survival analysis within DNNs from a holistic perspective. Fully integrated models such as DeepSurv and DeepHit, while powerful in their capabilities, often obscure the interpretability inherent in traditional survival analysis. The intricate, non-linear interactions within these “opaque box” models make the direct influence of specific features on predictions less discernible. Though techniques like Shapley Additive Explanations (SHAP) [37] and feature importance analyses offer some interpretative insights, they fall short compared to the direct interpretability of conventional survival models.
In response, our methodology adopts a hierarchical modeling approach complemented by data augmentation in the hidden layers of DNNs. While survival analysis does not explicitly target our primary objective—predicting the departure time of morning commuters—it remains closely related. The coefficients derived from survival analysis, indicative of the relationship with the departure time, are concatenated to the rest of the inputs via separate hidden layers, thereby enhancing the overall model’s predictive power.
1) Collaborative Filtering Through K-Means Clustering
Central to our approach is the hypothesis that the morning commuter population can be segmented into distinct behavioral classes, capturing significant individual variations. We employ unsupervised learning, specifically K-means clustering, to identify these classes. K-means aims to partition the $n$ observations into $k$ clusters $\mathbf{S} = \{S_{1}, \ldots, S_{k}\}$ so as to minimize the within-cluster sum of squares, \begin{equation*} \text{argmin}_{\mathbf{S}} \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_{i}} \| \mathbf{x} - \pmb{\mu}_{i} \|^{2} \tag{9}\end{equation*} where $\pmb{\mu}_{i}$ is the centroid (mean) of the observations in cluster $S_{i}$.
2) Derivation of Coefficients
Intuitively, each day carries exogenous characteristics that may affect the departure time choice. These are often unobserved, such as particular weather conditions, road closures, or the proportion of the population unable to commute that day (e.g., due to illness). To relate the unique population-level departure characteristics of a particular date to an individual's probability of departing, we derive date-specific coefficients. Specifically, we use the Cox proportional hazards model to derive partial hazards, linking observed trips' characteristics on that date, such as average speeds, trip distances, and trip durations, to their departure times. This captures how these variables affected travel behavior at the population level on that particular day. For example, if the hazard ratio of trip duration on a given date is less than one, then the model will associate longer trips with later departures (and vice versa); if the hazard ratio of average speed is greater than one, then the model will associate faster trips with earlier departures, and so on.
On the other hand, we also derived partial hazard coefficients based on various temporal dimensions like the day of the week, week of the month, whether or not the day is a holiday, and the cluster label associated with user u. We refer to these as the global coefficients because these hazard ratios are irrespective of the date in consideration and derived using every observation in the dataset. Table 1 presents our estimation of these coefficients with respect to one split of our training data.
The convenient functional form of the partial hazards term in the Cox model allows us to calculate the combined hazard ratio for a set of categorical variables by simply multiplying the hazard ratios that are active in a given observation. For example, for a user $u$ commuting on date $t$, the combined global hazard ratio is obtained by multiplying the hazard ratios corresponding to that date's day of the week, week of the month, and holiday indicator, together with the hazard ratio of the cluster label of $u$.
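The sketch below illustrates this multiplicative composition; the hazard-ratio values and category levels are invented for the example and do not correspond to Table 1.

```python
# Sketch: composing a combined hazard ratio for user u on date t by multiplying
# the hazard ratios of the active categorical levels. Values are made up.
global_hazard_ratios = {
    ("DoW", "Tuesday"): 1.08,
    ("WoM", 2): 0.97,
    ("Holiday", False): 1.00,
    ("Cluster", 1): 0.91,
}

def combined_hazard_ratio(active_levels, hazard_ratios):
    """Product of hazard ratios for the categorical levels active on this trip."""
    hr = 1.0
    for key in active_levels:
        hr *= hazard_ratios[key]
    return hr

# User u commuting on a non-holiday Tuesday in the 2nd week, belonging to cluster 1
active = [("DoW", "Tuesday"), ("WoM", 2), ("Holiday", False), ("Cluster", 1)]
print(combined_hazard_ratio(active, global_hazard_ratios))  # ≈ 0.953
```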
3) DNN Model Architectures
To study how survival analysis-informed data interacts with various DNNs, we implemented our method across a range of model architectures. These include:
Fully Connected (FC) Network: The FC network comprises several dense layers, where each layer is defined as $\text{fc}_{i} = \text{Linear}(n_{i-1}, n_{i})$, with $n_{i}$ representing the number of neurons in layer $i$. The network processes concatenated categorical and continuous variables, passing them through the layers with ReLU activation functions and dropout for regularization.
LSTM Network: The LSTM model is designed to capture temporal dependencies in the data, essential for modeling departure times. It consists of an LSTM layer followed by a linear output layer. The LSTM layer processes the input sequence, capturing the temporal dynamics, and the linear layer maps the LSTM's output to the departure time prediction.
Multi-Layer Perceptron (MLP) Network: We employed an MLP network with two hidden layers, each with 7 neurons. The network uses the ReLU activation function and is trained with the Adam optimizer. The input data is scaled using a MinMaxScaler to ensure all features contribute equally to the model’s learning process.
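For reference, the MLP configuration described above can be reproduced with scikit-learn as sketched below; the synthetic data and the target encoding (departure time in minutes past midnight) are assumptions made for illustration, not our actual training setup.

```python
# Sketch of the MLP described above: two hidden layers of 7 neurons, ReLU, Adam,
# and MinMax-scaled inputs. Data here is synthetic and for illustration only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                             # stand-in trip/SA features
y = 420 + 60 * X[:, 0] + rng.normal(scale=15, size=500)    # departure minutes

model = make_pipeline(
    MinMaxScaler(),
    MLPRegressor(hidden_layer_sizes=(7, 7), activation="relu",
                 solver="adam", max_iter=2000, random_state=0),
)
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```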
Figure 1 shows our implementation of survival analysis (SA-) informed DNNs. We separate SA coefficients into distinct hidden layers for several reasons. First, these coefficients capture different effects compared to trip-related features, as they are derived from time-to-event data, which introduces unique information about the temporal dynamics of commuter behavior. Additionally, this separation enables the model to capture interactions between SA-related and trip-related features in later layers, thereby improving the representation of joint effects without conflating disparate input types. Finally, incorporating SA inputs in this manner promotes interpretability by delineating the contribution of survival features to the model’s predictions.
Model architectures. (Top Left) SA-informed FC-DNN; (Top Right) SA-informed LSTM; (Bottom) SA-informed MLP. B denotes batch size.
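As a concrete illustration of the FC variant in Figure 1, the following PyTorch sketch routes the SA coefficients through a dedicated hidden layer before concatenating them with the trip-feature representation; the layer widths, dropout rate, and input dimensions shown are illustrative defaults rather than the exact configuration used in our experiments.

```python
# Minimal sketch (assumed layer sizes) of an SA-informed fully connected network:
# survival-analysis coefficients get a dedicated hidden layer, then are
# concatenated with the trip-feature representation in later layers.
import torch
import torch.nn as nn

class SAInformedFC(nn.Module):
    def __init__(self, n_trip_features, n_sa_features, hidden=64, sa_hidden=16):
        super().__init__()
        self.trip_branch = nn.Sequential(
            nn.Linear(n_trip_features, hidden), nn.ReLU(), nn.Dropout(0.2),
        )
        self.sa_branch = nn.Sequential(          # separate layer for SA coefficients
            nn.Linear(n_sa_features, sa_hidden), nn.ReLU(),
        )
        self.head = nn.Sequential(               # joint layers capture interactions
            nn.Linear(hidden + sa_hidden, hidden), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(hidden, 1),                # predicted departure time
        )

    def forward(self, trip_x, sa_x):
        z = torch.cat([self.trip_branch(trip_x), self.sa_branch(sa_x)], dim=-1)
        return self.head(z)

model = SAInformedFC(n_trip_features=12, n_sa_features=4)
trip_x = torch.randn(32, 12)      # batch of trip-related inputs (B = 32)
sa_x = torch.randn(32, 4)         # batch of SA-derived coefficients
print(model(trip_x, sa_x).shape)  # torch.Size([32, 1])
```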
Model Evaluation
A. Dataset and Data Preprocessing
We use privacy-protected GPS traces collected by an American geospatial data provider, covering the period from January 1st, 2020, to August 1st, 2020, and capturing individuals in the Greater Seattle Area. We preprocess these traces to compress trip-related information, including start and end times, start and end location clusters (represented by a unique identifier rather than coordinates), user ID, and other relevant information. Table 2 shows our preprocessed data variables.
To further eliminate noisy observations, we applied several additional filters. First, we dropped all rows containing any NaN values in the variables listed in Table 2. Outliers were then removed based on trip characteristics: trips longer than 6 hours, covering distances greater than 60,000 meters, or averaging speeds faster than 100 m/s were excluded.
Additionally, we applied time-based and location-based filtering criteria to capture only home-to-work commuting trips observed in the mobile data. Specifically, we restricted the analysis to trips occurring between 3am and 4pm on weekdays, excluding weekends. We further filtered trips based on their departure and arrival locations, requiring them to start at an individual's home (defined as the most frequently observed “startCluster” between 10pm and 4am) and end at their workplace (the most frequently observed “endCluster” between 10am and 6pm). These criteria aimed to ensure that the identified trips accurately represent typical commuting behavior while accommodating a variety of commuting schedules.
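A sketch of this preprocessing logic is shown below; the quoted column names follow Table 2, while the timestamp and user-identifier columns, and the unit of trip duration (seconds), are assumptions made for illustration.

```python
# Sketch of the preprocessing filters described above. "startCluster" and
# "endCluster" follow Table 2; other column names and units are assumptions.
import pandas as pd

def filter_commute_trips(trips: pd.DataFrame) -> pd.DataFrame:
    df = trips.dropna()

    # Outlier removal: drop trips longer than 6 h, farther than 60,000 m,
    # or faster than 100 m/s on average (duration assumed to be in seconds)
    df = df[(df["tripDuration"] <= 6 * 3600)
            & (df["tripDistance"] <= 60_000)
            & (df["tripDistance"] / df["tripDuration"] <= 100)]

    # Keep weekday trips departing between 3 a.m. and 4 p.m.
    start = pd.to_datetime(df["startTime"])
    df = df[(start.dt.dayofweek < 5) & start.dt.hour.between(3, 15)]

    # Home = modal "startCluster" between 10 p.m. and 4 a.m.;
    # work = modal "endCluster" between 10 a.m. and 6 p.m. (per user)
    night = trips[pd.to_datetime(trips["startTime"]).dt.hour.isin([22, 23, 0, 1, 2, 3])]
    day = trips[pd.to_datetime(trips["endTime"]).dt.hour.between(10, 17)]
    home = night.groupby("userId")["startCluster"].agg(lambda s: s.mode().iloc[0])
    work = day.groupby("userId")["endCluster"].agg(lambda s: s.mode().iloc[0])

    df = df[(df["startCluster"] == df["userId"].map(home))
            & (df["endCluster"] == df["userId"].map(work))]
    return df
```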
To evaluate the accuracy of a given model, we aggregate the commute departure time into 30-minute bins. This lets us use a more straightforward percent accuracy metric rather than classical regression metrics like RMSE, which are less interpretable. Additionally, it allows us to compare architectures like MLP to regression-based models like LSTMs and the FC DNN.
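The binning and accuracy computation are straightforward; the sketch below assumes departure times are expressed in minutes past midnight, which is an illustrative encoding rather than a detail specified above.

```python
# Sketch: the 30-minute-bin accuracy metric. Departure times are assumed to be
# encoded as minutes past midnight for this illustration.
import numpy as np

def bin_accuracy(y_true_minutes, y_pred_minutes, bin_width=30):
    true_bins = np.floor(np.asarray(y_true_minutes) / bin_width)
    pred_bins = np.floor(np.asarray(y_pred_minutes) / bin_width)
    return float(np.mean(true_bins == pred_bins))

y_true = [455, 470, 500, 515]   # 7:35, 7:50, 8:20, 8:35
y_pred = [448, 462, 489, 531]   # 7:28, 7:42, 8:09, 8:51
print(bin_accuracy(y_true, y_pred))  # 0.75: only the first prediction misses its bin
```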
B. Results
1) K-Means Clustering
We cluster individuals based on both temporal and trip-specific variables associated with their observed home-to-work trips in order to capture differences in when people commute and the modes of transportation they use. Specifically, we utilize the following variables from Table 2: DoW, Day, WoM, Holiday, tripDuration, tripDistance. Note that we do clustering only using training data, as including the test set would risk data leakage, potentially leading to overly optimistic performance estimates by allowing information from the test set to influence the model during training.
Using the elbow method on the within-cluster sum of squares, we select the number of clusters at the point beyond which additional clusters yield only marginal improvements in fit.
KDE plots of cluster metrics, scaled by the number of observations (all densities sum to 1): (Left) Departure time in training data; (Center) Trip distance; (Right) Day of the month.
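The clustering step can be illustrated with scikit-learn as sketched below, using the variables listed above. Aggregating each commuter's trips by their mean, assuming numerically encoded temporal variables, and the placeholder choice of k are all assumptions for illustration rather than the exact procedure used here.

```python
# Sketch: elbow-method inspection and K-means segmentation of commuters.
# Feature names mirror Table 2; aggregation and k are illustrative choices.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["DoW", "Day", "WoM", "Holiday", "tripDuration", "tripDistance"]

def elbow_and_cluster(train_trips: pd.DataFrame, k_candidates=range(2, 11)):
    # One row per commuter: average each user's (numerically encoded) trip variables
    per_user = train_trips.groupby("userId")[FEATURES].mean()
    X = StandardScaler().fit_transform(per_user)

    # Elbow method: watch how the within-cluster sum of squares (inertia)
    # decreases as k grows, and pick the point of diminishing returns
    for k in k_candidates:
        inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        print(k, round(inertia, 1))

    k_star = 3  # placeholder; chosen from the elbow plot of the actual data
    labels = KMeans(n_clusters=k_star, n_init=10, random_state=0).fit_predict(X)
    return pd.Series(labels, index=per_user.index, name="cluster")
```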
2) Survival Analysis
In addition to the coefficients shown in Table 1, analyzing survival curves with respect to particular covariates reveals interesting insights. The curves shown in Figure 3, for example, tell us that:
On weekdays except Monday, users in Cluster 1 tend to leave for work later than the average day
On the other hand, users not in Cluster 1 tend to leave for work earlier than average on Mondays
Users who embark on distant trips tend to leave earlier
Users who embark on short trips (in duration) tend to leave earlier
Partial Effects on Outcome: Survival curves as functions of categorical variables. (Left) Global Temporal Variables (Middle and Right) Trip-specific Variables.
While the last two observations may appear contradictory, we hypothesize that the difference can be explained by variations in the mode of transportation. Specifically, in the GPS data collected in the City of Seattle, it is possible that people traveling longer distances are more likely to use faster modes of transportation, such as cars, which allow them to reach their destinations more quickly compared to those using public transit or walking. However, further analysis would be needed to confirm whether this pattern consistently holds across different contexts and datasets.
3) Benchmarking
In our experimental design, we evaluated three distinct variations of each DNN architecture: one utilizing only the coefficients from survival analysis (SA coefficients only), another leveraging solely the original dataset features (Original inputs only), and a third integrating both sets of predictors through our novel data augmentation approach (SA-informed). The purpose of this setup was to assess the extent to which the augmented model architectures improve predictive power. We use the best-performing model parameters found in our sensitivity analysis (Section IV-B.4). We computed the average prediction accuracy across three randomly chosen training/testing splits. The results, as displayed in Table 3, provide insights into the efficacy of our data augmentation method in comparison to traditional input specifications.
Our analysis reveals a consistent pattern across all DNN architectures: the SA-informed models demonstrate superior performance compared to the other two specifications. This superiority, marked by higher average accuracy, suggests that survival analysis coefficients provide crucial insights into the temporal dynamics of commuter behavior, which are not entirely captured by the original input features alone. These coefficients likely encapsulate subtle patterns and relationships in the data that are otherwise difficult to discern. By integrating these coefficients into our DNN models, we effectively bridge the gap between deep learning’s predictive power and the nuanced understanding offered by survival analysis.
We also note that across the three architectures, the MLP, though the simplest model, provides the best empirical results. This highlights the potential for overfitting to the training data when working with more complex models like the LSTM and FC-DNN networks. To mitigate overfitting, we suggest using early stopping criteria on a validation set during training. Nevertheless, the LSTM and FC-DNN, known for their ability to capture sequential dependencies and complex non-linear relationships, still benefit significantly from this integration, as evidenced by the results. The improved performance of the SA-informed specification highlights the potential of combining traditional statistical methods with modern machine learning techniques to enhance predictive accuracy in complex real-world scenarios like commuter behavior analysis.
4) Sensitivity Analysis
We perform an exhaustive grid-search to explore the parameter space before benchmarking the SA-informed architectures with baselines. For all models, we vary the optimizer learning rate across three orders of magnitude (0.0001, 0.001, 0.01). For the FC-DNN and LSTM networks, we try variants with batch sizes of 10, 50, and 100, while for the MLP, we test the weight decay parameter across three orders of magnitude (0.0001, 0.001, 0.01).
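The grid search itself is a simple enumeration, as sketched below; train_and_evaluate is a placeholder for training one model under a hyperparameter combination and returning its test accuracy, not a function from our codebase.

```python
# Sketch of the exhaustive grid search described above. `train_and_evaluate`
# is a hypothetical callable returning test accuracy for one configuration.
from itertools import product

learning_rates = [1e-4, 1e-3, 1e-2]
batch_sizes = [10, 50, 100]         # varied for the FC-DNN and LSTM
weight_decays = [1e-4, 1e-3, 1e-2]  # varied for the MLP

def grid_search(model_name, train_and_evaluate):
    if model_name in ("fc_dnn", "lstm"):
        grid, keys = product(learning_rates, batch_sizes), ("lr", "batch_size")
    else:  # "mlp"
        grid, keys = product(learning_rates, weight_decays), ("lr", "weight_decay")

    results = {}
    for combo in grid:
        params = dict(zip(keys, combo))
        results[tuple(params.items())] = train_and_evaluate(model_name, **params)
    return max(results, key=results.get)  # best hyperparameter combination
```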
Figure 4 shows the result of the sensitivity analysis. First, we find that a higher batch size during training leads to better average test accuracy for the FC-DNN and LSTM networks. We also observe that across all models, a higher learning rate (i.e., 0.01) tends to result in higher accuracy. Finally, the MLP architecture benefits from a larger weight decay parameter.
Future Work and Discussion
In contrast to conventional amalgamations of deep learning with survival analysis, which predominantly emphasize binary prediction objectives, our research diverges to explore a domain less ventured. Our focus extends beyond the dichotomous outcomes characteristic of many survival analysis applications. We aim to predict the departure times of users, adopting a more granular and continuous perspective. This approach is particularly pertinent in the context of transportation systems, where our primary interest is not in pinpointing the exact departure time of individual users, but rather in estimating the distribution of departures within specific time windows across a population.
The nuanced nature of our objective necessitates a departure from traditional survival analysis metrics. While the survival probability output of our models provides valuable insights, it requires additional processing to translate into actionable predictions. Specifically, we need to develop methodologies for effectively using these probabilistic outputs to demarcate distinct departure time windows for users. This approach will enable us to capture the dynamic nature of commuter behavior within the broader structure of urban transportation systems.
Moreover, the concordance index (c-index), widely regarded as the benchmark evaluation metric in survival analysis, does not perfectly align with the intricacies of our specific objectives. The c-index, while adept at measuring the model’s ability to correctly rank event times, does not directly address the accuracy of predicting continuous departure times within specified windows. Future work will involve exploring or developing more appropriate metrics that can more accurately assess the performance of our models in the context of continuous time prediction.
In conclusion, our findings not only validate the proposed data augmentation method but also open avenues for further exploration into hybrid modeling techniques that synergize the strengths of diverse analytical approaches. This integration promises to bring forth more sophisticated and accurate predictive models, especially in fields where understanding temporal dynamics is crucial.
A key avenue for future research lies in integrating sociodemographic variables into our framework. By incorporating factors such as age, gender, income level, and employment status, we can enhance the predictive capacity of our model. Such integration could provide deeper insights into how different demographic segments exhibit varying patterns in departure times, potentially revealing underlying trends influenced by socio-economic status or lifestyle choices. This expansion would not only refine the model’s accuracy but also contribute to a more comprehensive understanding of urban mobility patterns, aiding in the development of more targeted and effective transportation policies and infrastructure improvements.
Data Access Statement
The authors note that only Ekin Ugurel had access to the privacy-protected GPS dataset, from which trip-related variables used in the training data were derived. Data supporting this study cannot be made available due to privacy-preserving research agreements with the data provider.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewer for their valuable feedback, which significantly improved the quality of this manuscript.