Introduction
The first step in resolving the PM$_{2.5}$ pollution problem is the accurate prediction of its concentration.
When dealing with multidimensional time series data, machine learning (ML) methods often perform better, particularly for the PM$_{2.5}$ prediction task.
Recently, deep learning (DL) models have gained popularity and demonstrated impressive results in various applications, such as pattern recognition [23], [24], remote sensing image analysis [25], and built-up land expansion mapping [26]. Advanced variants of recurrent neural networks (RNNs), or their combinations with convolutional neural networks (CNNs), are frequently used for PM$_{2.5}$ prediction.
A. Related Works
In recent decades, researchers have predominantly utilized ML models for PM$_{2.5}$ prediction.
Recently, numerous DL models have become popular in time-series prediction applications. Zhang et al. [33] presented a DL system for forecasting PM$_{2.5}$ concentrations.
Chen et al. [41] proposed a physical model for estimating PM$_{2.5}$ concentrations.
A summary comparison of recently developed DL models is presented in Table I. The table provides an overview of these models in terms of their year of development, three comparison metrics, and the PM$_{2.5}$ prediction task addressed.
B. Contributions of the Article
The previously described models are unable to extract deep features from PM$_{2.5}$ data. To address this limitation, we propose the CombineDeepNet framework, whose main components are described below.
The BiLSTM module is used to mitigate vanishing gradient problems and to integrate the relationship between pollutant and meteorological data during the initial feature extraction. The vanishing gradient problem refers to the difficulty RNNs have in propagating information over long sequences because gradients diminish during backpropagation; this hampers the network's ability to capture long-term dependencies in the data. A standard LSTM counters this with three types of gates: the forget gate decides whether to keep or discard information from the previous time step, the input gate decides which new information is added to the cell state, and the output gate decides which information is passed to the next step. In a BiLSTM, there are separate sets of gates for the forward LSTM and the backward LSTM, so the gradients computed during backpropagation involve both directions. If one direction experiences vanishing gradients for a particular gate, the other direction may still carry nonvanishing gradients, which helps prevent complete information loss. The bidirectional nature of a BiLSTM therefore makes it less likely that both directions simultaneously suffer from severe vanishing gradients, and the bidirectional gating allows information to flow from both directions, reducing the impact of vanishing gradients.
BiLSTM can also effectively integrate the relationship between pollutant and meteorological data by considering temporal dependencies and contextual information. By inputting both the pollutant and meteorological data into the BiLSTM model, the network can learn the complex relationships and patterns between them. The sequential nature of BiLSTM allows it to capture dependencies over time, enabling the model to understand how meteorological factors affect pollutant levels at different time points.
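As an illustration, the sketch below shows how such a bidirectional recurrent layer could consume the combined pollutant and meteorological sequence. This is a minimal PyTorch sketch: the hidden size, batch size, and window length are illustrative assumptions rather than the paper's exact configuration; only the count of 12 input variables follows the dataset description.

```python
import torch
import torch.nn as nn

# Minimal sketch of the first module: a bidirectional LSTM reading the
# combined pollutant + meteorological sequence. The 12 features match the
# dataset description; hidden size and window length are assumptions.
bilstm = nn.LSTM(input_size=12, hidden_size=64,
                 batch_first=True, bidirectional=True)

x = torch.randn(32, 4, 12)   # (batch, 4-h window, 12 variables)
prelim, _ = bilstm(x)        # (32, 4, 128): forward + backward hidden states
```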
The bidirectional gated recurrent units (BiGRU) module not only mitigates the vanishing gradient problem but also extracts deeper features from the preliminary features. In a standard gated recurrent unit (GRU), there are reset and update gates that regulate the information flow within the network. In a BiGRU, there are two sets of these gates: one for the forward GRU and one for the backward GRU. When computing gradients during backpropagation in a GRU, the gradients are influenced by both directions. If one direction suffers from vanishing gradients for a particular gate, the other direction may exhibit a nonvanishing gradient for that gate, helping to prevent complete information loss. This bidirectional aspect can make it less likely for both directions to simultaneously encounter severe vanishing gradients.
“Preliminary features” refer to the initial set of features that serve as the starting point for the BiGRU module. These preliminary features are derived from the output of the BiLSTM modules and capture the hidden patterns found in the historical pollution data, that is, the information and characteristics deemed important for further analysis. Subsequently, the BiGRU module processes these preliminary features and extracts deep features from them. Deep features represent higher level representations of the data, capturing more complex patterns and relationships.
The FC layer is utilized to create an accurate long-term forecast of PM$_{2.5}$ concentrations across six monitoring locations, with a lead time of 180 h. Accuracy is measured in terms of RMSE, MAE, and R$^{2}$. The FC layer is a type of ANN layer in which each neuron is linked to all neurons in the preceding layer; it is commonly used to map the high-level features extracted by the preceding layers to the desired output. Here, the FC layer takes the extracted features as input and performs computations to generate predictions of PM$_{2.5}$ concentrations. By utilizing the FC layer, the model can capture complex relationships and patterns in the input data to create accurate long-term forecasts. The layer learns to combine the features extracted by earlier layers and map them to the desired output: the predicted PM$_{2.5}$ concentrations at the monitoring locations. In addition, the experimental outcomes demonstrate the superior performance of our proposed model compared with several other well-known DL models. In the spirit of transparent research, we make the code used to produce the findings of this work available at https://github.com/Prasanjit-Dey/CombineDeepNet.
In this work, the name “CombineDeepNet” signifies the combining aspect of the model. The model incorporates three modules: BiLSTM, BiGRU, and FC layers. The BiLSTM module takes the pollutants and meteorological data as input and extracts initial features. The BiGRU module then takes these initial features as input and further extracts deep features. Finally, the FC layers utilize the deep features to make predictions. The combining aspect of our framework lies in the integration of these three architectural components for long-term prediction. By combining BiLSTM, BiGRU, and FC layers, we leverage the capabilities of these modules to enhance predictive performance. Therefore, we named our framework “CombineDeepNet” to reflect its combining nature.
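To make this combining aspect concrete, the following is a minimal, hypothetical sketch of how the three modules could be chained. Layer sizes, the dropout rate, and the single-output-head design are assumptions; only the 12 input variables and the 180-h horizon are taken from the text.

```python
import torch
import torch.nn as nn

class CombineDeepNetSketch(nn.Module):
    """Illustrative BiLSTM -> BiGRU -> FC chain; sizes are assumptions."""

    def __init__(self, n_features=12, hidden=64, horizon=180):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True,
                              bidirectional=True)
        self.drop1 = nn.Dropout(0.2)              # rate is an assumption
        self.bigru = nn.GRU(2 * hidden, hidden, batch_first=True,
                            bidirectional=True)
        self.drop2 = nn.Dropout(0.2)
        self.fc = nn.Linear(2 * hidden, horizon)  # 180-h lead time

    def forward(self, x):
        prelim, _ = self.bilstm(x)                # preliminary features
        deep, _ = self.bigru(self.drop1(prelim))  # deep features
        return self.fc(self.drop2(deep[:, -1, :]))  # predict from last step

model = CombineDeepNetSketch()
y_hat = model(torch.randn(8, 4, 12))              # -> shape (8, 180)
```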
The rest of this article is organized as follows. Section II covers the study area and dataset analysis. Section III describes the proposed method, including the CombineDeepNet framework. Section IV presents the results of the proposed model and five other popular DL models. Section V discusses the CombineDeepNet model in comparison with the other five models. Finally, Section VI concludes this article.
Study Area and Dataset Analysis
A. Study Area
Six urban and suburban monitoring stations in China, namely, Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidan Wanliu, and Wanshou Xigong, are selected as study areas. Table II gives a detailed description of the six monitoring stations along with their latitude and longitude. Fig. 1 depicts the physical location of the research region. These stations lie in highly urbanized and industrialized regions, which is why we chose them as our main study focus. Compared with international standards, there is still significant room for improvement in the overall air quality of these areas. As a consequence, precise prediction of PM$_{2.5}$ concentrations in these regions is of particular importance.
Distribution of the PM$_{2.5}$ monitoring stations across the study area.
B. Dataset Analysis
1) Data Description
In this research, our dataset contains past pollution levels and meteorological parameters collected from six monitoring stations (Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidan Wanliu, and Wanshou Xigong) in China between March 1, 2013 and February 28, 2017, comprising 35 064 h of data for each monitoring site. The dataset contains 12 pollutant and meteorological variables: PM$_{2.5}$, PM$_{10}$, SO$_{2}$, NO$_{2}$, CO, O$_{3}$, temperature, pressure, dew point, rainfall, wind direction, and wind speed.
2) Correlation of PM$_{2.5}$ With Other Parameters
We calculated the correlation coefficient between pollutants and meteorological parameters for the six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong, as shown in Fig. 2. It shows that PM$_{2.5}$ is strongly correlated with several of the other pollutants.
Correlation coefficient between pollutants and meteorological parameters for six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong.
In addition, PM$_{2.5}$ is also correlated with the meteorological parameters.
3) Distribution Characteristic of Data
We identified six monitoring sites, Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidan Wanliu, and Wanshou Xigong, as the study targets. These locations were selected to investigate the properties of air pollution dispersion and the meteorological data. Fig. 3 shows the numerical variations in PM$_{2.5}$ concentration at these sites.
Distribution of PM$_{2.5}$ concentrations at the six monitoring stations.
After statistical analysis, an average of 63.20% of the PM$_{2.5}$
Method
A. Overview of Proposed Framework
Fig. 4 depicts the proposed CombineDeepNet framework. The main structure is made up of the three stages listed below.
Framework of the CombineDeepNet for PM$_{2.5}$ concentration prediction.
1) Data Analysis and Preprocessing
First, some of the data from the six monitoring stations may be lost because of equipment failure, routine maintenance, transmission errors, or other uncontrollable factors. Such data loss significantly degrades the performance of prediction models, so replacing empty or “nan” values is critical to ensuring the model's performance [51]. Using “nan” values during DL model training poses three main challenges, illustrated by the short example after the following list:
Data incompatibility: Many DL frameworks and libraries expect numeric data as input. The “nan” is a nonnumeric value, which can lead to compatibility issues when training DL models. It may result in errors or inconsistencies during computations, affecting the training process.
Inconsistent data shape: DL models typically require consistent data shapes for proper training. If “nan” values are present in the dataset, they can introduce inconsistencies in the data shape. This can cause difficulties in batching, vectorization, or applying certain operations, leading to training errors or unexpected behavior.
Gradient computation issues: DL models rely on gradient computations during backpropagation for updating model parameters. The “nan” values can disrupt these computations because gradients for nonnumeric values are undefined. This can introduce errors or inaccuracies during gradient descent optimization, hindering the training process.
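The gradient issue above can be demonstrated in a few lines. This toy PyTorch snippet is not from the paper; it simply shows how a single “nan” in the input poisons both the loss and the gradients.

```python
import torch

# One nan in the input makes both the loss and the gradient nan,
# illustrating the gradient-computation issue described above.
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0, float("nan"), 3.0])

loss = ((w * x) ** 2).mean()
loss.backward()
print(loss.item(), w.grad)   # nan, tensor([nan])
```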
To address these challenges, previous research has employed common strategies for handling empty or “nan” values, such as: 1) replacing a “nan” value with the mean of the previous and next values; and 2) k-nearest neighbor imputation. In this research, we replace each missing value with the next existing value in the series. Next, the Pearson correlation coefficient is applied to the six monitoring stations to assess the degree of correlation between pollutants and meteorological data, which can be utilized for geographical analysis [52]. Fig. 2 displays the correlation coefficients between variables for the six monitoring stations. In addition, Table IV presents the grading table for the Spearman correlation coefficient, which assesses the strength of the correlation between variables. Both the figure and the table indicate a strong correlation between PM$_{2.5}$ and several of the other variables. Finally, each variable is scaled to the range [0, 1] using min-max normalization
\begin{equation*}
x_{t}= \frac{x_{t}-\text{min}(x_{t})}{\text{max}(x_{t})-\text{min}(x_{t})}. \tag{1}
\end{equation*}
In this case, $x_{t}$ denotes the value of a variable at time step $t$, and $\text{min}(x_{t})$ and $\text{max}(x_{t})$ denote the minimum and maximum values of that variable over the entire series.
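As a sketch of these two preprocessing steps, one could write the following; the file name and DataFrame layout are hypothetical, and all columns are assumed numeric.

```python
import pandas as pd

# Hypothetical hourly station file with 12 numeric variable columns.
df = pd.read_csv("station.csv")

# 1) Replace each missing value with the next existing value in the series.
df = df.bfill()

# 2) Min-max normalization per Eq. (1), applied column-wise to [0, 1].
df_norm = (df - df.min()) / (df.max() - df.min())
```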
2) Prediction Model
CombineDeepNet is a powerful DL framework that combines BiLSTM, BiGRU, and FC layers to address the PM$_{2.5}$ prediction problem.
To enhance accuracy, CombineDeepNet combines multiple pollutants and meteorological factors to generate more accurate predictions and to identify meaningful relationships between the different variables. The training process employs pairs of input data and target outputs: the input data comprise the air pollutants and meteorological parameters collected from the six monitoring stations over a window of past hours, and the targets are the future PM$_{2.5}$ concentrations to be predicted.
The highest concentration of PM$_{2.5}$
While BiLSTMs are well regarded for their ability to capture long-range dependencies in sequential data, they impose a higher computational and memory demand than BiGRUs [53]. BiGRUs, being a simpler variant, offer computational efficiency. In our experiment, we tackled the challenge of making accurate long-term predictions of PM$_{2.5}$ by combining the two module types, balancing representational power against computational cost.
To regularize the training process, dropout layers are added after each of the BiLSTM and BiGRU layers. Dropout prevents overfitting by randomly removing a proportion of neurons during training; it has proven to be a valuable regularization technique in DL, helping networks generalize better [54], [55]. The third module is an FC layer that takes the output of the BiGRU layers and produces the final PM$_{2.5}$ prediction.
3) Model Evaluation
The last step of the proposed CombineDeepNet framework is model evaluation. The CombineDeepNet model, along with other popular DL models, is evaluated by comparing predicted values to actual values using various metrics. These metrics are then compared to determine whether the proposed model is better at predicting PM$_{2.5}$ concentrations.
Overall, the combined neural network model considers both air pollutants and meteorological parameters, employing a three-module architecture to extract initial and deep features and make predictions. This approach was developed primarily to forecast PM$_{2.5}$ concentrations over long lead times.
B. CombineDeepNet
Initially, air pollutants and meteorological parameters from six individual monitoring sites are fed into both the forward and backward directions of an LSTM model in chronological order using the time-series notation $x_{1}, x_{2}, \ldots, x_{t}$. The input gate $i_{t}$, forget gate $f_{t}$, and output gate $O_{t}$ at time step $t$ are computed as
\begin{align*}
i_{t} =& \sigma \left(W_{i} \left[ h_{t-1}, x_{t} \right] + b_{i} \right) \tag{2}\\
f_{t} =& \sigma \left(W_{f} \left[ h_{t-1}, x_{t} \right] + b_{f} \right) \tag{3}\\
O_{t} =& \sigma \left(W_{o} \left[ h_{t-1}, x_{t} \right] + b_{o} \right) \tag{4}
\end{align*}
The candidate cell state, cell state, and final hidden output are given by
\begin{align*}
\tilde{C}_{t} =& \tanh \left(W_{c} \left[ h_{t-1}, x_{t} \right] + b_{c} \right) \tag{5}\\
C_{t} =& f_{t}\odot C_{t-1}+i_{t}\odot \tilde{C}_{t} \tag{6}\\
h_{t}=& O_{t}\odot \tanh (C_{t}) \tag{7}
\end{align*}
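Equations (2)–(7) can be transcribed almost line for line. The toy NumPy sketch below makes the recursion explicit; the dimensions in the usage example are arbitrary, not the model's actual sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step per Eqs. (2)-(7); W/b hold the four gate parameters."""
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    i = sigmoid(W["i"] @ z + b["i"])          # input gate, Eq. (2)
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (3)
    o = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (4)
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate state, Eq. (5)
    c_t = f * c_prev + i * c_tilde            # cell state, Eq. (6)
    h_t = o * np.tanh(c_t)                    # hidden state, Eq. (7)
    return h_t, c_t

# Toy usage: 12 input variables, hidden size 8 (both arbitrary).
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((8, 8 + 12)) for k in "ifoc"}
b = {k: np.zeros(8) for k in "ifoc"}
h, c = lstm_step(rng.standard_normal(12), np.zeros(8), np.zeros(8), W, b)
```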
Finally, the forward and backward LSTMs combine their features to create the preliminary features in time-series order:
\begin{align*}
h^{\xrightarrow {f}}_{t}=& \tanh \left(W^{\xrightarrow {f}}_{h}x_{t} +W^{\xrightarrow {f}}_{h} h^{\xrightarrow {f}}_{t-1}+ b^{\xrightarrow {f}}_{h}\right) \tag{8}\\
h^{\xleftarrow {b}}_{t}=& \tanh \left(W^{\xleftarrow {b}}_{h}x_{t} +W^{\xleftarrow {b}}_{h} h^{\xleftarrow {b}}_{t-1}+ b^{\xleftarrow {b}}_{h}\right) \tag{9}\\
O_{t}=& W^{\xrightarrow {f}} h^{\xrightarrow {f}}_{t}+W^{\xleftarrow {b}} h^{\xleftarrow {b}}_{t}+b_{o} \tag{10}
\end{align*}
Subsequently, the output features $O_{t}$ of the BiLSTM are passed to the GRU module, whose update gate $z_{t}$, reset gate $r_{t}$, candidate state $\tilde{h}_{t}$, and hidden state $h_{t}$ are computed as
\begin{align*}
z_{t} =& \sigma \left(W_{z} \left[ h_{t-1}, O_{t} \right] + b_{z} \right) \tag{11}\\
r_{t} =& \sigma \left(W_{r} \left[ h_{t-1}, O_{t} \right] + b_{r} \right) \tag{12}\\
\tilde{h}_{t} =& \tanh \left(W_{h}O_{t} + W_{h}\left(r_{t} \odot h_{t-1} \right) + b_{h} \right) \tag{13}\\
h_{t} =& z_{t} \odot h_{t-1} + (1-z_{t}) \odot \tilde{h}_{t} \tag{14}
\end{align*}
Next, the forward and backward GRUs merge their features and create the deep features in temporal sequence:
\begin{align*}
h^{\xrightarrow {f}}_{t}=& \tanh \left(W^{\xrightarrow {f}}_{h}O_{t} +W^{\xrightarrow {f}}_{h} h^{\xrightarrow {f}}_{t-1}+ b^{\xrightarrow {f}}_{h}\right) \tag{15}\\
h^{\xleftarrow {b}}_{t}=& \tanh \left(W^{\xleftarrow {b}}_{h}O_{t} +W^{\xleftarrow {b}}_{h} h^{\xleftarrow {b}}_{t-1}+ b^{\xleftarrow {b}}_{h}\right) \tag{16}\\
y_{t}=& W^{\xrightarrow {f}} h^{\xrightarrow {f}}_{t}+W^{\xleftarrow {b}} h^{\xleftarrow {b}}_{t}+b_{y}. \tag{17}
\end{align*}
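Analogously, a single GRU step over a BiLSTM output $O_{t}$ can be sketched as follows. Note that, for clarity, the sketch uses two separate weight matrices for the two terms of Eq. (13), whereas the text writes both as $W_{h}$; this split is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(o_t, h_prev, W, b):
    """One GRU step over a BiLSTM output o_t, per Eqs. (11)-(14)."""
    z_in = np.concatenate([h_prev, o_t])               # [h_{t-1}, O_t]
    z = sigmoid(W["z"] @ z_in + b["z"])                # update gate, Eq. (11)
    r = sigmoid(W["r"] @ z_in + b["r"])                # reset gate, Eq. (12)
    h_tilde = np.tanh(W["ho"] @ o_t                    # candidate, Eq. (13)
                      + W["hh"] @ (r * h_prev) + b["h"])
    return z * h_prev + (1.0 - z) * h_tilde            # hidden state, Eq. (14)
```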
Lastly, the FC layers use the extracted deep features $y_{t}$ to forecast the PM$_{2.5}$ concentrations.
C. Evaluation Metrics
In this study, the CombineDeepNet model was compared against other prediction models using the same dataset. The forecasting accuracy was evaluated using the RMSE, MAE, and R$^{2}$ metrics, defined as follows, where $x_{i}$ and $y_{i}$ denote the actual and predicted values, $\overline{x}$ and $\overline{y}$ their means, and $n$ the number of samples:
\begin{align*}
\text{RMSE} =& \sqrt{\frac{1}{n}\sum _{i=1}^{n}{({x_{i} -y_{i}})^{2}}} \tag{18}\\
\text{MAE}=& \frac{1}{n}\sum _{i=1}^{n}|x_{i}-y_{i}| \tag{19}\\
\text{ R}^{2} =& \frac{\sum _{i=1}^{n}(x_{i}-\overline{x})(y_{i}-\overline{y})}{\sqrt{\sum _{i=1}^{n}(x_{i}-\overline{x})^{2}}\sqrt{\sum _{i=1}^{n}(y_{i}-\overline{y})^{2}}} \tag{20}
\end{align*}
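These metrics are straightforward to implement. The sketch below follows Eqs. (18)–(20) literally; in particular, the R$^{2}$ of Eq. (20) is computed in its correlation form, exactly as written above.

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error, Eq. (18)."""
    return np.sqrt(np.mean((x - y) ** 2))

def mae(x, y):
    """Mean absolute error, Eq. (19)."""
    return np.mean(np.abs(x - y))

def r2(x, y):
    """R^2 as written in Eq. (20): a correlation-based measure."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / (np.sqrt(np.sum(xc ** 2)) * np.sqrt(np.sum(yc ** 2)))
```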
Results
A set of tests was carried out to assess the efficacy of the proposed model. The test dataset from six monitoring stations was employed in the CombineDeepNet model to observe the forecasting accuracy. The same dataset was then applied to five popular DL models (CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU), and their prediction accuracy was recorded. Lastly, the CombineDeepNet model's prediction results were compared with those of the five popular DL models, confirming the model's efficacy. The Supplementary Material includes the results from comparisons with other models.
A. Parameter Setting
In this article, we employed 26 300 h (75%) of data for training the models and 8760 h (25%) for testing. A number of experiments were run to identify the hyperparameter configuration for this research, resulting in the selection of the ideal hyperparameter set. To assess the prediction effectiveness of the algorithm, a validation set was used to calculate the mean squared error (MSE) after each epoch, and the optimal model was selected based on the validation error. We used the following methodology for model training: we set the number of epochs to 100 and tested the trained model against the validation set after each epoch, updating and saving the model parameters whenever the forecasting model's MSE on the validation set improved. Training ended, after several parameter adjustments and experiments, when the model's performance on the validation set was at its best. Lastly, we employed dropout to prevent model overfitting. A sketch of this checkpointing procedure is given below.
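The following is a minimal sketch of that checkpointing loop; the model and data loaders are placeholders, and the optimizer choice is an assumption not stated in the text.

```python
import copy
import torch

def train_with_checkpointing(model, train_loader, val_loader, epochs=100):
    """Keep the parameters with the lowest validation MSE (Sec. IV-A)."""
    loss_fn = torch.nn.MSELoss()
    opt = torch.optim.Adam(model.parameters())   # optimizer is an assumption
    best_mse, best_state = float("inf"), None
    for _ in range(epochs):                      # 100 epochs, as in the text
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_mse = sum(loss_fn(model(xb), yb).item()
                          for xb, yb in val_loader) / len(val_loader)
        if val_mse < best_mse:                   # validation MSE improved:
            best_mse = val_mse                   # save the parameters
            best_state = copy.deepcopy(model.state_dict())
    return best_state, best_mse
```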
In addition, we carefully addressed the selection of the time lag parameter for the prediction task. Instead of specifying a fixed time lag, we use a window size of 4 during the training phase: when making predictions, we consider the previous four data points ($x_{t-3}, x_{t-2}, x_{t-1}, x_{t}$) as the input window. A sketch of this windowing follows.
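A possible implementation of this windowing, with hypothetical helper and variable names, is:

```python
import numpy as np

def make_windows(series, window=4):
    """Build (samples, window) inputs and next-step targets; window = 4 h."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

X, y = make_windows(np.arange(10.0))
# X[0] = [0, 1, 2, 3] -> y[0] = 4: the previous four points predict the next.
```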
The general hyperparameters used in this experiment are listed in Table V. The layer-by-layer parameters are provided in the Supplementary Material. For further reference, the code is available in the GitHub repository.
B. Loss Function of the Proposed Model
Fig. 5 shows the training loss versus validation loss during training of the CombineDeepNet model over the six monitoring stations, namely, (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong. The blue line in Fig. 5 shows the training loss, and the orange line shows the validation loss. The loss function is an important component of a DL model, since it evaluates the performance of the model on the training samples; its purpose is to minimize the disparity between the predicted and real output, and model performance improves as the loss value decreases. However, when a model is fitted too closely to the training dataset, overfitting occurs: the model learns the noise in the data instead of the underlying patterns, resulting in poor generalization to new data. To counter overfitting, the model's parameters are adjusted to balance minimizing the training loss while keeping the validation loss low. During training, the training loss is calculated at each epoch to monitor performance on the training data, and the validation loss is measured after each epoch to evaluate generalization to new data. If the validation loss starts increasing while the training loss continues to decrease, it is a sign of overfitting, and the model needs to be adjusted to improve its generalization ability.
Train loss versus validation loss of the proposed CombineDeepNet model over six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong.
C. Multistep Prediction of PM$_{2.5}$ Concentration Over Six Monitoring Stations
This section summarizes the findings and results of the study. Our main focus in this article is the long-term, multistep forecasting of PM$_{2.5}$ concentrations.
Plot of actual PM$_{2.5}$ values and the values predicted by the proposed model and (a) CNN, (b) LSTM, (c) GRU, (d) CNN-LSTM, and (e) CNN-GRU at the Aoti Zhongxin station.
Plot of actual PM$_{2.5}$ values and the values predicted by the proposed model and (a) CNN, (b) LSTM, (c) GRU, (d) CNN-LSTM, and (e) CNN-GRU at the Wanshou Xigong station.
Figs. 6 and 7 present a comparison of the proposed model's efficiency with five popular DL models: (a) CNN, (b) LSTM, (c) GRU, (d) CNN-LSTM, and (e) CNN-GRU. The results show that the CombineDeepNet model excels in generalization at longer lead times compared with the other models. Specifically, the green curve, which represents the forecasts of the proposed model, closely aligns with the actual test data represented by the red curve, demonstrating a high degree of accuracy. The comparison in the figures highlights the CombineDeepNet model's ability to generalize better than other popular DL models. Generalization plays a pivotal role in evaluating prediction models, as it assesses a model's ability to handle new, previously unobserved data; a model that generalizes well provides more accurate and reliable predictions, making it better suited for forecasting PM$_{2.5}$ concentrations in practice.
D. Comparison With Statistical Results for Long-Term PM$_{2.5}$ Concentration Prediction
To further validate the CombineDeepNet model, we incorporate three statistical analysis techniques: RMSE, MAE, and R$^{2}$.
Fig. 8 presents a comparison of the CombineDeepNet model's effectiveness in terms of RMSE with the other models for the prediction of PM$_{2.5}$ at the six monitoring stations.
Comparison of the performance of the proposed model with other models in terms of RMSE for the prediction of PM$_{2.5}$ at the six monitoring stations.
Similarly, Fig. 9 presents the MAE values of the proposed model and the other five models over the same six monitoring stations. The findings demonstrate that the CombineDeepNet model has lower MAE values than the other five models, indicating its better performance in accurately predicting PM$_{2.5}$ concentrations.
Comparison of the performance of the proposed model with other models in terms of MAE for the prediction of PM$_{2.5}$ at the six monitoring stations.
To further assess the effectiveness of CombineDeepNet, Table VI provides a numerical analysis comparing the CombineDeepNet model with the five popular DL models, emphasizing RMSE for forecasting PM$_{2.5}$ concentrations.
Similarly, Tables VII and VIII present the numerical values of MAE and R$^{2}$, respectively.
Thus, the statistical results demonstrate that the CombineDeepNet model achieves promising performance in accurately forecasting PM$_{2.5}$ concentrations.
E. Comparison Between the Top Three Models: CombineDeepNet, LSTM, and GRU
In this section, we further compare the prediction accuracy of the top three models, our proposed CombineDeepNet, LSTM, and GRU, using box plots. Fig. 10(a)–(f) illustrates the box plots of the RMSE values obtained for the CombineDeepNet, LSTM, and GRU models across six monitoring stations: Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidian Wanliu, and Wanshou Xigong. These plots show a consistent trend of RMSE values for CombineDeepNet, whereas the RMSE scores of the LSTM and GRU models are more widely spread across most of the stations. In addition, the average RMSE values for CombineDeepNet are consistently lower than those for LSTM and GRU, highlighting the superiority of the CombineDeepNet model. Therefore, this model can be effectively applied in complex environments for long-term PM$_{2.5}$ prediction.
Box plot of RMSE values for CombineDeepNet, LSTM, and GRU across six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidian Wanliu, and (f) Wanshou Xigong.
F. Comparison With Fitting Curve
Further, the CombineDeepNet model's effectiveness was evaluated based on the degree of fit between the actual and predicted PM$_{2.5}$ values.
Degree of fit between the actual and predicted values of the proposed model on the test set in the [0–180 h] task for the six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong.
Furthermore, we conducted a dispersion comparison analysis for the PM$_{2.5}$ predictions.
Therefore, our study presents a promising approach for predicting PM$_{2.5}$ concentrations over long lead times.
Discussion
Making long-term predictions of PM$_{2.5}$ concentrations is a challenging task.
Table VI shows that the proposed model's RMSE values are close to those of the LSTM and GRU models and outperform the CNN, CNN-LSTM, and CNN-GRU models at long lead times for all six monitoring stations. The RMSE values of the proposed model ranged from 6.70 to 23.50 $\mu$g/m$^{3}$.
Similarly, Tables VII and VIII demonstrate that the CombineDeepNet model's MAE and R$^{2}$ values follow the same trend.
The experimental findings indicate that CombineDeepNet is very accurate at forecasting PM$_{2.5}$ concentrations.
The optimal values of the RMSE, MAE, and R$^{2}$ metrics are obtained by the proposed model in most cases.
In this research, we selected a consecutive 180-h test dataset and display it in Figs. 6 and 7 for the Aoti Zhongxin and Wanshou Xigong monitoring stations. Our primary goal was to evaluate the fitting ability of the CombineDeepNet model and to verify our supposition that it performs better on new data. The figures demonstrate that, in the long-term scenario, the CombineDeepNet model fits the new data better than the CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU models. Figs. 6 and 7 also depict the differences in prediction accuracy between the hybrid CNN-LSTM and CNN-GRU models and the other models (CombineDeepNet, CNN, LSTM, and GRU) at short-term and long-term lead times. These differences can be attributed to the architectural characteristics and capabilities of the models. CNN-LSTM and CNN-GRU combine CNN layers for spatial feature extraction with recurrent layers (LSTM or GRU) for temporal sequence modeling. While they have the capacity to capture both short-term and long-term dependencies, the interplay between these layers can bias them toward short-term patterns: in certain cases, the convolutional layers overemphasize short-term fluctuations, causing the models to focus excessively on recent data. This can lead to suboptimal performance compared with models that are specifically designed for the prediction task at hand.
In addition, Fig. 11 displays the fitting trend on the test dataset. Furthermore, we conducted a dispersion comparison analysis for the PM$_{2.5}$ predictions.
In light of the experimental findings, we believe that the proposed CombineDeepNet can better forecast long-term PM$_{2.5}$ concentrations than the compared models.
Conclusion
The CombineDeepNet model, built on a hybrid architecture and data correlation principles, represents an effective approach for forecasting PM$_{2.5}$ concentrations.
The experimental findings show that, by fully extracting the correlated information between pollutants and meteorological data, the CombineDeepNet model outperforms conventional approaches such as CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU. Furthermore, the CombineDeepNet model effectively addresses the challenge of long-term dependencies, which play a crucial role in predicting pollutant hazards. Overall, the CombineDeepNet model provides a promising network for forecasting PM$_{2.5}$ concentrations.
One limitation of this article is that information from neighboring locations is not linked to the intended target area. In future work, the correlation between nearby location data and the target location will be incorporated into the input features to improve the long-term prediction of PM$_{2.5}$ concentrations.