
Dynamic Graph Convolutional Networks with Spatiotemporal Missing Pattern Awareness



Abstract:

Missing data is a ubiquitous phenomenon in the time series community, significantly challenging forecasting due to incomplete ground truth and sparse data. Most previous Multivariate Time Series Forecasting with Missing Values (MTSFMV) approaches assume static missing patterns, neglecting dynamic changes over time and space and leading to suboptimal forecasting results. To tackle these challenges, we propose the novel STMPANets, which perceive time-varying spatiotemporal missing patterns to refine the forecast sequences. Specifically, we decompose the series into seasonal and trend components, allowing STMPANets to highlight inherent sequence properties and adapt to missing patterns. We then propose a Multi-granularity Conditional Partial TCN (MGCPT) to regulate the imputation rate of missing values over time, modeling temporal correlation. Additionally, we design an Adaptive Dynamic GCN (ADGCN) to capture spatial dependencies by perceiving dynamic missing patterns. Extensive experiments demonstrate that STMPANets outperform state-of-the-art models.
Date of Conference: 06-11 April 2025
Conference Location: Hyderabad, India

SECTION I.

Introduction

The ongoing rapid deployment of IoT and edge device systems has generated large amounts of time series data [1]. Due to incomplete sensor monitoring, these time series often contain many missing values, making them difficult to use directly [2]. Effectively modeling time series with missing values for prediction is a challenging problem [3]. In this context, Multivariate Time Series Forecasting (MTSF) offers a more comprehensive approach by simultaneously considering the interdependencies among multiple time series. However, when missing values are prevalent, traditional MTSF models struggle to effectively capture temporal and spatial dependencies from past to future [4]. To this end, modeling MTSFMV involves two steps: (i) performing imputation around missing values to alleviate data sparsity, and (ii) extracting temporal and spatial dependencies using known and imputed values. Ignoring either step degrades performance [5].

Given the limited attention paid to MTSFMV, existing methods can be roughly divided into two categories: imputation models and forecasting models. The former focus on reconstructing missing values using various techniques, such as bidirectional recurrent units [6], spatial attention [7], and graph neural networks [8]–[10]. However, they lack secondary modeling of the relationship between known and imputed values, resulting in poor performance when applied directly to MTSFMV. The latter are designed for abundant data and tend to collapse when faced with a large number of missing values [11]. For example, GCN-M [12] proposes a memory network that considers local and global spatiotemporal features and focuses on traffic flow prediction with missing values. However, it does not consider the variation of missing patterns when modeling spatiotemporal correlations, and it relies on a predefined graph structure, which may lead to suboptimal solutions and limited generalization. BiTGraph [13] proposes a temporal convolutional network with a bias term to incorporate missing patterns into spatiotemporal relationship modeling. However, missing patterns that change over time are an uncertain factor, and simply incorporating partial convolution [14] into a 1D-CNN leads to unstable and non-robust imputation. Additionally, the design of its Biased GCN ignores spatial relationships that change over time, which can be limiting in scenes with complex spatiotemporal dynamics.

Inspired by the above observations, we propose Spatiotemporal Missing Pattern Awareness Networks (STMPANets), which explicitly consider time-varying missing patterns when capturing temporal correlations and spatial dependencies. We first decompose the series into seasonal and trend components to highlight its inherent properties. We then carefully design two key modules: MGCPT and ADGCN. MGCPT performs partial convolution with conditional restrictions to mine temporal correlations at different granularities by adjusting the imputation rate of missing values. ADGCN captures dynamic spatial dependencies by constructing dynamic graphs and incorporating enhanced missing patterns. We integrate these modules into a multi-branch hierarchical framework. The main contributions can be summarized as follows:

  • We propose STMPANets, which can simultaneously capture temporal correlation and dynamic spatial dependence of time series forecasting with missing values.

  • We design MGCPT to perceive missing patterns from different time granularities and propose ADGCN to incorporate missing patterns for dynamic feature interaction.

  • Experimental results on three real-world datasets demonstrate the significant improvement of STMPANets over other baselines.

Fig. 1: The overall framework of STMPANets and the detail of the Multi-granularity Conditional Partial TCN.

SECTION II.

Preliminaries

Multivariate Time Series Forecasting. Given a historical observation sequence {\mathcal{X}} \in {\mathbb{R}^{N \times C \times P}} of P time steps, the model predicts the values {\mathcal{Y}} \in {\mathbb{R}^{N \times Q}} of the future Q time steps, where N is the number of variables and C is the feature dimension. The goal of MTSF is to establish a mapping from {\mathcal{X}} to {\mathcal{Y}}.

Multivariate Time Series Forecasting with Missing Values. Not all historical variables are fully observed. We define the mask matrix {\mathcal{M}} = \left( m^{(1)}, m^{(2)}, \ldots, m^{(N)} \right) \in {\mathbb{R}^{N \times P}} to represent the missing status of the multivariate time series {\mathcal{X}} \in {\mathbb{R}^{N \times C \times P}} with partial observations, where m^{(n)} \in \{0, 1\}^P refers to the missing state of the n-th variable: m^{(n)} = 1 indicates that the value is observed, while m^{(n)} = 0 indicates that the value is missing. The goal of MTSFMV is to construct a mapping function between the input \left\{ {\mathcal{X}}, {\mathcal{M}} \right\} and the output {\mathcal{Y}} \in {\mathbb{R}^{N \times Q}}.
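For concreteness, a minimal PyTorch sketch of the inputs under these definitions; the shapes and tensor names are chosen purely for illustration:

    import torch

    # Hypothetical shapes: N variables, C features, P history steps.
    N, C, P = 3, 1, 24
    X = torch.randn(N, C, P)                   # partially observed series
    M = (torch.rand(N, P) > 0.3).float()       # mask: 1 = observed, 0 = missing
    X_observed = X * M.unsqueeze(1)            # missing entries zeroed out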

SECTION III.

Methodology

The overall STMPANets framework is shown in Fig. 1(a). Details of each model component are described below.

A. Sequence Decomposition

Inspired by traditional time series decomposition algorithms [15], we introduce a sequence decomposition module that separates the complex patterns of the input sequence, decoupling the temporal patterns and highlighting the inherent properties of the sequence [16]. Specifically, the input sequence {\mathcal{X}} \in {\mathbb{R}^{N \times C \times P}} is decomposed into seasonal and trend terms as follows: \begin{align*} & {\mathcal{X}_t} = \operatorname{AvgPool}(\operatorname{Padding}({\mathcal{X}}))_{kernel}\tag{1} \\ & {\mathcal{X}_s} = {\mathcal{X}} - {\mathcal{X}_t}\tag{2}\end{align*}

where the seasonal term {\mathcal{X}_s} captures seasonal information and the trend term {\mathcal{X}_t} captures trend-cycle information. \operatorname{AvgPool}(\cdot) denotes average pooling, with padding applied to keep the sequence length unchanged.
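A minimal sketch of Eqs. (1)-(2) in PyTorch, assuming a moving-average implementation with replicate padding at both ends (the padding scheme is our assumption; the paper only states that the length is preserved):

    import torch
    import torch.nn as nn

    def series_decompose(x: torch.Tensor, kernel: int = 25):
        """Moving-average decomposition of x (N, C, P) into seasonal and trend parts."""
        # Replicate-pad both ends so AvgPool1d keeps the sequence length P.
        pad = (kernel - 1) // 2
        front = x[..., :1].repeat(1, 1, pad)
        back = x[..., -1:].repeat(1, 1, kernel - 1 - pad)
        x_pad = torch.cat([front, x, back], dim=-1)
        trend = nn.AvgPool1d(kernel_size=kernel, stride=1)(x_pad)   # X_t, Eq. (1)
        seasonal = x - trend                                        # X_s, Eq. (2)
        return seasonal, trend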

B. Multi-granularity Conditional Partial TCN

Since TCNs exhibit better sequence modeling capability than RNNs across a variety of time series tasks, we adopt an improved TCN as the primary backbone to capture temporal correlation. As illustrated in Fig. 1(b), MGCPT has two main layers: the dilated inception layer and the conditional partial layer. To simplify the description, we omit the superscripts in what follows.

Dilated Inception Layer. We adopt the dilated inception layer structure proposed in [17], which obtains a wide receptive field at low computational cost, thereby extracting high-level temporal features. Given an input {\mathcal{X}} \in {\mathbb{R}^{N \times C \times P_i}} and a one-dimensional convolution kernel {\mathcal{F}} \in {\mathbb{R}^{C \times 1 \times K}} of size K, the dilated convolution is defined as: \begin{equation*}{\mathcal{X}^o}(n,c,t) = \sum\limits_{s = 0}^{K - 1} {\mathcal{F}}(c,s) \cdot {\mathcal{X}}(n,c,t - d \cdot s)\tag{3}\end{equation*}

where {\mathcal{X}^o} \in {\mathbb{R}^{N \times C \times P_i^o}} is the hidden feature map and d is the dilation factor. With stride 1 and no padding, the output temporal length is P_i^o = P_i - d \cdot (K - 1).
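As a concrete illustration of Eq. (3), one dilated-convolution branch in PyTorch; the shapes and channel counts are placeholders, not the paper's configuration:

    import torch
    import torch.nn as nn

    # A dilated Conv1d with stride 1 and no padding shrinks the temporal
    # length from P_i to P_i - d*(K - 1), matching the formula above.
    N, C, P_i, K, d = 3, 16, 24, 3, 2
    branch = nn.Conv1d(in_channels=C, out_channels=C, kernel_size=K, dilation=d)
    x = torch.randn(N, C, P_i)
    x_out = branch(x)   # shape: (3, 16, 24 - 2*(3 - 1)) = (3, 16, 20)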

Conditional Partial Layer. Motivated by the successful application of partial convolution in computer vision tasks [18]–[20], we introduce conditional partial convolution to model temporal correlation with partial observations. Unlike ordinary partial convolution [14], we propose aggregating observable information only from a certain conditional proportion of neighbors for each time series variable with missing values. Specifically, for a time series variable {x^{(n)}} \in {\mathbb{R}^{C \times P_i^o}}, we impute the missing values as \begin{equation*}x' = \begin{cases} {\mathbf{W}^T}(x \odot {\mathbf{m}})\frac{K}{\sum({\mathbf{m}})} + {\mathbf{b}}, & \text{if } \frac{\sum({\mathbf{m}})}{\sum({\mathbf{1}})} \geq \tau \\ {\mathbf{0}}, & \text{otherwise} \end{cases}\tag{4}\end{equation*}

where ⊙ denotes the Hadamard product, x is the feature map within the current convolution window, {\mathbf{W}^T} and {\mathbf{b}} are learnable convolution parameters, and {\mathbf{1}} denotes a constant tensor of ones. τ is a threshold on the visibility of the mask area at this stage. As the convolution proceeds, the missing pattern {\mathcal{M}} is gradually filled. Specifically, we update the missing pattern as follows: \begin{equation*}{\mathcal{M}'}(x) = \begin{cases} 1, & \text{if } \frac{\sum({\mathbf{m}})}{\sum({\mathbf{1}})} \geq \tau \\ 0, & \text{otherwise} \end{cases}\tag{5}\end{equation*}
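A hedged sketch of how Eqs. (4)-(5) might be realized for one convolution pass; the tensor names, shapes, and single-branch setup are our assumptions, not the authors' released code:

    import torch
    import torch.nn.functional as F

    def conditional_partial_conv1d(x, m, weight, bias, tau=0.5):
        """Sketch of Eqs. (4)-(5): partial convolution gated by mask visibility tau.
        x: (N, C, P) features; m: (N, 1, P) mask (1 = observed);
        weight: (C_out, C, K); bias: (C_out,)."""
        K = weight.shape[-1]
        out = F.conv1d(x * m, weight, bias=None)           # W^T (x ⊙ m)
        ones = torch.ones(1, 1, K)
        m_sum = F.conv1d(m, ones)                          # sum(m) per window
        cond = (m_sum / K >= tau).float()                  # sum(m)/sum(1) >= tau
        # Re-normalize by K / sum(m), add bias, zero windows below tau (Eq. 4).
        out = (out * (K / m_sum.clamp(min=1.0)) + bias.view(1, -1, 1)) * cond
        return out, cond                                   # cond is the updated mask (Eq. 5)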

To capture the temporal correlations at multiple granularities hidden in the input sequence, we aggregate convolutions at multiple scales within the Dilated Inception Layer and the Conditional Partial Layer. Specifically, we use kernel sizes of 1 × 2, 1 × 3, 1 × 5, and 1 × 7 to aggregate temporal information of different granularities through max pooling. Subsequently, two gate functions, tanh(•) and sigmoid(•), control the amount of information passed to the next module.
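A simplified sketch of this aggregation step; aligning branch lengths by truncation and deriving both gates from the same fused tensor are shortcuts we assume for brevity, whereas the two gates typically come from separate convolution branches:

    import torch

    def aggregate_branches(branch_outputs):
        # Truncate branch outputs (kernel sizes 2, 3, 5, 7) to a common length.
        L = min(o.shape[-1] for o in branch_outputs)
        fused = torch.stack([o[..., -L:] for o in branch_outputs]).max(dim=0).values
        return torch.tanh(fused) * torch.sigmoid(fused)   # gated information flow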

C. Adaptive Dynamic GCN

The MGCPT module captures temporal features but ignores dynamic spatial relations between sequences. Unlike existing methods [21]–[23], we construct an adaptive dynamic graph to learn spatial node relations at each timestamp and incorporate the enhanced missing pattern into the graph structure learning process to account for missing values.

Adaptive Dynamic Graph Construction. We first initialize two learnable random node embeddings E1, E2 ∈ ℝN×d to represent static node features, as shown in Fig. 2(a). In the absence of a predefined graph structure, we use a Gaussian kernel and a dynamic node filter to learn dynamic node relations: \begin{equation*}{\mathcal{G}} = \exp\left( -\frac{\left\| E_1 - E_2 \right\|^2}{2\sigma^2} \right)\tag{6}\end{equation*}

where {\mathcal{G}} \in {\mathbb{R}^{N \times N}} is the Gaussian kernel output and {\mathcal{P}} = {\mathcal{G}}/\operatorname{rowsum}({\mathcal{G}}) \in {\mathbb{R}^{N \times N}} is the transition matrix. Next, we convolve the transition matrix with the temporal fusion features at each time step through a dynamic filter: {\mathcal{F}_t} = \sum\nolimits_{h = 0}^H {\mathcal{P}_h} x_t {\mathbf{W}_h}, where H is the number of diffusion steps, \mathbf{W}_h is a parameter matrix, and x_t is the sequence feature map at time t.
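A sketch of Eq. (6) and the dynamic filter, assuming {\mathcal{P}_h} denotes the h-th power of the transition matrix (as in diffusion convolution); the shapes and σ are illustrative:

    import torch

    N, d_emb, C, H = 10, 8, 16, 2
    E1, E2 = torch.randn(N, d_emb), torch.randn(N, d_emb)
    sigma = 1.0

    dist2 = ((E1.unsqueeze(1) - E2.unsqueeze(0)) ** 2).sum(-1)  # pairwise ||E1 - E2||^2
    G = torch.exp(-dist2 / (2 * sigma ** 2))                    # Gaussian kernel, Eq. (6)
    P_mat = G / G.sum(dim=1, keepdim=True)                      # row-normalized transition matrix

    x_t = torch.randn(N, C)                                     # sequence features at time t
    W = [torch.randn(C, C) for _ in range(H + 1)]               # one parameter matrix per step
    F_t, P_h = torch.zeros(N, C), torch.eye(N)
    for h in range(H + 1):
        F_t = F_t + P_h @ x_t @ W[h]                            # F_t = sum_h P^h x_t W_h
        P_h = P_h @ P_mat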

In particular, we aim for dynamic changes in one node to affect another, with the learned graph structure being unidirectional. Thus, we fuse random node embeddings and construct the dynamic adjacency matrix {\mathcal{A}_t} as follows: \begin{align*} & \hat E_1^t = \tanh\left( \alpha \left( {\mathcal{F}_t} E_1 \right) \right), \quad \hat E_2^t = \tanh\left( \alpha \left( {\mathcal{F}_t} E_2 \right) \right)\tag{7} \\ & {\mathcal{A}_t} = \operatorname{ReLU}\left( \tanh\left( \alpha \left( \hat E_1^t \hat E_2^{tT} - \hat E_2^t \hat E_1^{tT} \right) \right) \right)\tag{8}\end{align*}

where \hat E_1^t, \hat E_2^t represent fused static and dynamic node embeddings, and {\mathcal{A}_t} captures the dynamic node relationships at time t. Note that the adaptive dynamic graph is constructed before the input sequence enters the hierarchy and is continuously updated during spatial structure learning in each layer.
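A sketch of Eqs. (7)-(8). The paper does not spell out how {\mathcal{F}_t} E_1 is made dimensionally consistent, so the projection W_p below is our assumption:

    import torch

    def dynamic_adjacency(F_t, E1, E2, W_p, alpha=3.0):
        """F_t: (N, C) diffusion features; E1, E2: (N, d) node embeddings;
        W_p: (C, N) assumed projection so (F_t W_p) E1 is well-defined."""
        F_proj = F_t @ W_p                                  # (N, N), assumed alignment
        E1_t = torch.tanh(alpha * (F_proj @ E1))            # fused embedding, Eq. (7)
        E2_t = torch.tanh(alpha * (F_proj @ E2))
        # The antisymmetric score keeps learned relations unidirectional (Eq. 8).
        return torch.relu(torch.tanh(alpha * (E1_t @ E2_t.T - E2_t @ E1_t.T)))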

Missing Pattern Enhancement. To enhance the mask matrix, we introduce a self-attention mechanism to better capture missing patterns, as shown in Fig. 2(b). We first compute the self-attention matrix {\mathcal{W}_{att}}: \begin{equation*}{\mathcal{W}_{att}} = \operatorname{softmax}\left( \frac{{\mathbf{Q}}{\mathbf{K}^T}}{\sqrt{d_k}} \right){\mathbf{V}}\tag{9}\end{equation*}

where {\mathbf{Q}} = {\mathcal{X}^{(l)}}{\mathbf{W}_Q}, {\mathbf{K}} = {\mathcal{X}^{(l)}}{\mathbf{W}_K}, {\mathbf{V}} = {\mathcal{X}^{(l)}}{\mathbf{W}_V}; \mathbf{W}_Q, \mathbf{W}_K, \mathbf{W}_V are trainable matrices, and d_k is the feature dimension. The missing pattern is then updated as: \begin{equation*}{\mathcal{M}^{\prime\prime(l)}} = \operatorname{sigmoid}\left( {\mathcal{M}^{\prime(l)}} \oplus {\mathcal{W}_{att}} \odot {\mathcal{M}^{\prime(l)}} \right)\tag{10}\end{equation*}

where ⊕ is element-wise addition. Intuitively, the extraction of dynamic spatial node relationships should not only consider changes in the missing pattern but also incorporate the temporal features of the input sequence. We therefore use the inner product to quantify the temporal feature map: {\mathcal{S}^{(l)}} = \operatorname{softmax}\left( {\mathcal{X}^{(l)}}{\mathcal{X}^{(l)T}} \right). Finally, we learn the corrected dynamic adjacency matrix as: \begin{equation*}{\mathcal{A}_t^{\prime(l)}} = {\mathcal{A}_t^{(l)}} \oplus \beta \operatorname{sigmoid}\left( {\mathcal{S}^{(l)}} \odot {\mathcal{M}^{\prime\prime(l)}}{\mathcal{M}^{\prime\prime(l)T}} \right)\tag{11}\end{equation*}

where {\mathcal{A}_t^{(l)}} \in {\mathbb{R}^{N \times N}} is the dynamic node relationship at layer l and time t, and β is a global learnable parameter controlling the correction strength when incorporating missing patterns and temporal dynamics during message passing.
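A sketch of Eqs. (9)-(11) for a single layer, assuming d_k equals the feature dimension so that {\mathcal{W}_{att}} \odot {\mathcal{M}^{\prime(l)}} is shape-compatible, and that the mask is broadcast to the feature shape; both are our reading of the notation:

    import torch

    def enhance_and_correct(X_l, M_l, A_t, W_Q, W_K, W_V, beta):
        """X_l: (N, d) layer-l features; M_l: (N, d) mask broadcast to feature
        shape (assumed); A_t: (N, N) dynamic adjacency; beta: learnable scalar."""
        d_k = W_K.shape[1]
        Q, K, V = X_l @ W_Q, X_l @ W_K, X_l @ W_V
        W_att = torch.softmax(Q @ K.T / d_k ** 0.5, dim=-1) @ V       # Eq. (9)
        M_enh = torch.sigmoid(M_l + W_att * M_l)                      # Eq. (10)
        S = torch.softmax(X_l @ X_l.T, dim=-1)                        # inner-product similarity
        A_corr = A_t + beta * torch.sigmoid(S * (M_enh @ M_enh.T))    # Eq. (11)
        return A_corr, M_enh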

Fig. 2: The detail of the Adaptive Dynamic GCN.

Dynamic Feature Interaction. Combined with the updated dynamic adjacency matrix, we apply graph convolution to perform dynamic feature interaction: \begin{equation*}{\mathcal{X}^{(l + 1)}} = \left( I + {\mathcal{D}_o^{-1}}{\mathcal{A}^{\prime(l)}} + {\mathcal{D}_i^{-1}}{\mathcal{A}^{\prime(l)T}} \right){\mathcal{X}^{\prime(l)}}{\mathbf{W}^{(l)}} + b^{(l)}\tag{12}\end{equation*}

where I is the identity matrix and {\mathcal{D}_o^{-1}} and {\mathcal{D}_i^{-1}} are the inverse out-degree and in-degree matrices.
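A sketch of Eq. (12); the degree clamping is added here to avoid division by zero and is our assumption:

    import torch

    def dynamic_graph_conv(X_l, A, W, b):
        """X_l: (N, d_in); A: (N, N) corrected dynamic adjacency;
        W: (d_in, d_out); b: (d_out,)."""
        I = torch.eye(A.shape[0])
        D_o_inv = torch.diag(1.0 / A.sum(dim=1).clamp(min=1e-6))  # inverse out-degree
        D_i_inv = torch.diag(1.0 / A.sum(dim=0).clamp(min=1e-6))  # inverse in-degree
        prop = I + D_o_inv @ A + D_i_inv @ A.T                    # Eq. (12) propagation
        return prop @ X_l @ W + b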

D. Loss and Training

To train our model, we select Mean Absolute Error (MAE) as the training objective, and the loss function is defined as: \begin{equation*}{\mathcal{L}}\left( {\mathcal{Y}}, \widehat{\mathcal{Y}}, {\mathcal{M}_{t:t+Q-1}} \right) = \frac{\sum\nolimits_{h = t}^{t + Q - 1} \sum\nolimits_{n = 1}^N m_h^{(n)} \odot \left| y_h^{(n)} - \hat y_h^{(n)} \right|}{\sum\nolimits_{h = t}^{t + Q - 1} \sum\nolimits_{n = 1}^N m_h^{(n)}}\tag{13}\end{equation*}

which measures the error between predicted values and ground truth only over observed entries.
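A sketch of Eq. (13) as a masked MAE; the small epsilon guarding against an all-missing window is our addition:

    import torch

    def masked_mae(y_true, y_pred, mask, eps=1e-8):
        """y_true, y_pred, mask: (N, Q); mask is 1 where ground truth is observed."""
        abs_err = mask * (y_true - y_pred).abs()   # errors only where observed
        return abs_err.sum() / (mask.sum() + eps)  # normalize by observed count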

SECTION IV.

Experiment

A. Experimental Setting

Datasets. To evaluate the proposed STMPANets, we conduct extensive experiments on three popular real-world datasets from different domains: PEMS-BAY [24], Weather [15], and BeijingAir [6]. We randomly discard data according to the missing rate r, ranging from 0.2 to 0.8.
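A sketch of this corruption protocol; dropping each entry independently with probability r is our assumption of the exact scheme:

    import torch

    def random_discard(X, r=0.2):
        """X: (N, C, P). Keep each (variable, time step) entry with probability 1 - r."""
        M = (torch.rand(X.shape[0], X.shape[-1]) > r).float()   # (N, P) mask
        return X * M.unsqueeze(1), M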

Baselines. We compare STMPANets with several classical imputation-forecasting methods and the latest state-of-the-art baselines, including BRITS [6], SPIN [7], GRIN [8], GCN-M [12], AGCRN [25], MTGNN [17], Autoformer [15], GinAR [26], and BiTGraph [13]. For models requiring complete input, we fill the missing parts with zeros.

TABLE I: Performance comparison of different methods.
Fig. 3: The ablation study of STMPANets.

Configurations and Evaluation Metrics. The datasets are split into training, validation, and test sets in a 6:2:2 ratio. All methods use a historical window P = 24 and a forecasting horizon Q = 24. The learning rate is set to 0.001, and the batch size is 32. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used as evaluation metrics, with lower values indicating better performance.

B. Experimental Results

TABLE I shows the forecasting performance of STMPANets and other baselines across three datasets at missing rates of 0.2 and 0.8. The results indicate that: (i) SPIN and GRIN excel at time series imputation but are limited by insufficient modeling of the relationship between observed and missing data. (ii) Among multivariate time series models that explicitly consider missing values, GinAR may suffer error accumulation due to missing variables, while BiTGraph, despite its bias correction for missing patterns, may overlook time-varying spatial relationships. (iii) Our proposed STMPANets outperforms all other models, achieving an improvement of nearly 8.92% over the best baseline, thanks to its ability to integrate spatiotemporal missing patterns and capture temporal and spatial dependencies. Notably, its performance gain is more pronounced at a missing rate of 0.8.

Fig. 4: The hyperparameter sensitivity of STMPANets.

C. Ablation Study

To validate the effectiveness of the key components in STMPANets, we conduct ablation experiments with four model variants: (i) w/o DIL: removing the Dilated Inception Layer; (ii) w/o CPL: removing the Conditional Partial Layer; (iii) w/o MPE: replacing the enhanced mask with the original mask; and (iv) w/o DFI: replacing the updated dynamic adjacency matrix with the original matrix. As shown in Fig. 3, all key components of STMPANets play crucial roles. The most impactful components are the Conditional Partial Layer and Missing Pattern Enhancement, suggesting that incorporating missing patterns into spatiotemporal modeling effectively captures sequence dependencies. Meanwhile, the introduction of a dynamic graph effectively captures dynamic spatial relationships.

D. Hyperparameter Sensitivity

We investigate the hyperparameter sensitivity of STMPANets on the PEMS-BAY dataset, as shown in Fig. 4. The results indicate that optimal performance is achieved with 3 layers: too many layers may lead to overfitting, while too few may fail to capture complex spatiotemporal relationships. In particular, model performance is highly sensitive to τ, which controls the size of the observable masked area; setting τ = 0.5 yields the best imputation results. The smaller τ is, the larger the missing-value area observed by the partial convolution, i.e., the more likely the model is to focus on the local area of the predicted part at this stage. A larger τ means more masked areas are deferred to the next layer for filling. The similar values of the learnable β across different missing rates suggest that β is independent of the missing rate and functions to control the correction strength.

SECTION V.

Conclusion

In this paper, we propose a novel spatiotemporal missing pattern awareness network that simultaneously captures temporal correlations and dynamic spatial dependencies for time series forecasting with missing values. The STMPANets framework starts from the sequence decomposition perspective and combines the carefully designed MGCPT and ADGCN modules to perceive missing patterns along the time dimension and spatial dimension respectively. Extensive experiments on three real-world datasets demonstrate its superior performance across various missing value scenarios. In the future, we will improve the applicability of STMPANets to identify more complex dynamic missing value patterns in time series data.
