Deep Spatiotemporal Attention Network for Fine Particle Matter 2.5 Concentration Prediction With Causality Analysis

Abstract:

The increasing concentration of air pollutants, caused by industrialization and economic growth, is adversely affecting public health. Therefore, accurately measuring and predicting air pollution has become an important societal issue. With the era of big data and the development of artificial intelligence technologies, air pollution concentration is now measured and recorded in real time using different sensors. Studies have attempted to predict air pollution concentration using deep learning-based spatiotemporal prediction built on distance networks. In these studies, the station network used to predict air pollution reflects only the distance between stations. However, since air pollutants cannot cross high mountain ranges and move according to the wind, the station network should also reflect the effects of terrain and wind direction, which previous studies do not consider. To overcome these limitations, this study proposes a novel station network that combines distance and causality networks based on transfer entropy. To evaluate the performance of the proposed method, out-of-sample experiments are performed with an hourly dataset covering January 2017 to October 2020 from 186 stations in the Republic of Korea. The results suggest that the proposed method achieves state-of-the-art performance compared to existing distance-based algorithms.

Published in: IEEE Access (Volume: 9)

Page(s): 73230 - 73239

Date of Publication: 17 May 2021

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2021.3080828

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

Air pollution is increasing due to rapid industrialization and economic growth [1]. The increase in air pollutant concentration is driven by many causes: the increasing usage of fossil fuel [2], the growth of motor vehicles [3], the growth of global aviation activities [4], shipping [5], [6], and construction [7], [8]. Recently, cryptocurrency mining has been criticized for its carbon dioxide emissions [9].

Higher air pollution concentration adversely affects health [10] and, in turn, increases the risk of brain [11], heart [10], and skin diseases [12]. Air pollutants even perturb regulatory immune responses [13]. Accordingly, the Organization for Economic Cooperation and Development (OECD) countries have implemented environmental regulations to reduce fine dust [14]. For example, the Clean Air Act of the Republic of Korea, which focused on Seoul and its surroundings, significantly reduced particulate matter (PM) by 9% [15]. Similarly, the air quality index fell by 35.9% after the Asia-Pacific Economic Cooperation (APEC) meeting and the corresponding regulations in China [16].

According to the OECD, in 2020, the Republic of Korea recorded the worst year for mean population exposure to fine particulate matter [17]. Therefore, it has become more and more important to monitor air pollution concentration in real-time. However, the government reported that it was experiencing difficulty in obtaining accurate air pollution data owing to a lack of air pollution monitoring stations [18]. Therefore, the importance of accurately predicting air pollution concentrations is increasing.

Many studies have been conducted to predict air quality. With the era of big data, involving the storage and usage of a large amount of data, and the development of artificial intelligence technologies, air pollution concentration is now being measured and recorded in real-time by different sensors. Studies have attempted to predict pollution concentration using deep learning-based time series prediction. Since air pollution prediction has a time series characteristic, numerous studies have predicted air pollution using time series prediction methods such as the Holt-Winters model, artificial neural network [19], ensemble long short-term memory network (LSTM) model [20], non-linear autoregressive models and support vector machine with air pollutant time-series input [21], long short-term memory-convolutional neural network (LSTM-CNN) method considering time series of meteorological factors and air pollutants [22], and time series ensemble method [18]. In addition, air pollution not only has a time series characteristic but also includes spatial characteristics. Air pollution is influenced by winds, and thus, by the surrounding areas [23]. Because air pollution remains at low altitudes, it has difficulty crossing high mountain ranges; as such, the terrain is an important factor for air pollution prediction [24]. Moreover, air pollutants are influenced by meteorological factors (i.e., wind) [25]. According to recent studies, it is vital to use relevant information based on theory in prediction problems [26], [27]. Therefore, it is important to consider the spatial characteristics while predicting air pollution.

Other studies have examined spatiotemporal models for predicting air pollution [28], [29] and have proposed a two-part model: 1) graph convolutional neural network (GCNN)-based spatial modeling, and 2) LSTM-based temporal modeling. These models capture spatial information related to air pollution using a distance graph based on the geographic distance between stations. However, this method is limited because geographic distance does not fully represent the spatial information between stations [24]. In particular, the geographic distance network assumes that two connected stations affect each other. In reality, the stations have an asymmetric relationship due to the influence of the wind and terrain, because air pollutants move with the wind and, owing to their weight, cannot pass over elevated terrain. To overcome this limitation, this study proposes a novel station network based on distance and causality networks. For causality detection, this study uses transfer entropy, which is mainly used in complex system theory [30]. Transfer entropy measures the directed information flow between two time series. It has been used for causality detection in a variety of fields, including the stock market [26], meteorology [31], and neuroscience [32]. A descriptive analysis of the proposed station network provides supporting evidence on the influence of the western wind and terrain.

To evaluate the performance of the proposed method, this study employs an hourly dataset from January 2017 to October 2020 at 186 stations in the Republic of Korea and compares the method with state-of-the-art methods in out-of-sample tests. In addition, to show that the proposed method provides acceptable performance over short-, medium-, and long-term prediction horizons, this study evaluates the performance with prediction horizons of 1 h, 6 h, and 24 h. Moreover, this study predicts a specific air pollutant, fine particulate matter 2.5 (PM2.5), which is the most prevalent prediction target in air pollution research, both for its characteristics and in health studies [11]. The results indicate that in all metrics, including the index of agreement (IA), root mean squared error (RMSE), and mean absolute error (MAE), the proposed method demonstrates superior performance compared to the other state-of-the-art algorithms. In particular, the proposed method outperforms both the distance network model, which is used in the state-of-the-art algorithms in the literature, and the causality network model. In addition, the causality network model outperforms the distance network model. The results suggest that the causality network reflects geographic information better than the distance network, and that the causality network alone may suffer from spurious causality problems.

This study offers the following contributions. First, to address the problems of predicting air pollution, this study uses a theoretical approach to construct a network and achieves state-of-the-art performance by considering causality. This study suggests that while state-of-the-art algorithms mainly develop deep learning methodologies, theory-based approaches such as network reconfiguration are also important. Second, in solving problems important to society, this study successfully combines the theory of complex systems with deep learning and shows excellent performance. Third, to the best of my knowledge, this study is the first to build a spatial network that reflects terrain or wind direction.

The remainder of this paper is organized as follows: Section II reviews the literature on air pollution prediction. Section III describes the datasets, temporal modeling, spatial modeling, and experimental setting. Section IV describes the experimental results. Section V presents the study’s conclusions and outlines future research directions.

SECTION II.

Key Related Papers

Air pollution issues continue to emerge owing to industrialization and economic growth. Air pollutants consist of carbon monoxide, lead, nitrogen dioxide, ozone, fine particles, and sulfur dioxide. Air pollutants, especially fine particulate matter (PM), have adverse effects on health. They increase the risk of stroke [11], cardiovascular diseases in the elderly [10], dermatitis [12], and mortality [33]. Air pollutants also weaken the immune system [13].

PM refers to dust that floats in the air and is invisible. It consists of ionic components such as nitrate (NO₃⁻), ammonium (NH₄⁺), and sulfate (SO₄²⁻), as well as carbon and metal compounds. PMs are classified into PM10 and PM2.5 based on their diameter: PM10 are particles with diameters less than 10 μm, and PM2.5 are particles with diameters less than 2.5 μm. PM causes heart attack, stroke-related diseases, and respiratory symptoms such as asthma [34], [35]. Therefore, air pollution prediction is important as it can warn people and highlight the need for precautions.

Thus, numerous studies have predicted air pollution, and several have been based on deep learning as this technology developed. For example, Qin et al. [22] used a convolutional neural network (CNN) for feature extraction from meteorological factors and air pollution concentration, and LSTM for time series prediction of PM10. The authors achieved the best performance with CNN and LSTM compared to other models. Xayasouk et al. [36] suggested a deep learning model that combined LSTM and a deep autoencoder model with historical meteorological and air pollution data. However, these studies do not consider the geographic factors of air pollution.

Some studies have examined spatiotemporal models for predicting air pollution [28], [29]. Ge et al. [29] suggested a multiscale spatiotemporal graph convolutional network based on multiscale and spatiotemporal blocks. The proposed block is a representation of two graphs: distance adjacency and similarity graphs. A spatiotemporal block consists of a graph CNN and a temporal convolutional network. Qi et al. [28] proposed a dual-stage spatiotemporal prediction architecture based on spectral graph CNNs and LSTM. Their proposed spectral graph CNNs were based on a distance adjacency graph. However, these studies have limitations because distance graphs do not reflect all spatial information [37]. For example, a distance graph simply weights the edges of the graph as a function of the distance between two stations. This assumes that two stations at the same distance influence each other equally. However, PM dispersion is asymmetric due to the influence of the wind (western in the case of the Republic of Korea) and terrain. That is, a causal relationship exists between stations that the distance graph cannot reflect. Therefore, this study proposes a novel station network to overcome this problem and a deep learning prediction method that uses it.

SECTION III.

Data and Method

A. Overview

This study proposes a deep learning model that uses the spatiotemporal feature of air pollution and causality analysis to predict future air pollution.

The model flow is as follows. First, the study constructs a station network that combines distance and causality networks. Then, based on the station network, the $n$ most influential stations are selected for each target station. The study stacks the multiple time series of the target and influential stations, and performs this process for all stations. The resulting spatiotemporal features are fed to the spatiotemporal model to produce the predictions. The complete architecture is described in Figure 1.

FIGURE 1.

Proposed model architecture.

B. Data

The study area is the Republic of Korea, whose characteristics make it an interesting study area. First, as shown by the topographical image in Figure 2, much of the country’s land is mountainous. Second, because the population is densely concentrated in the metropolitan area, air pollutants are concentrated there. Finally, due to the western winds, the Republic of Korea is affected by air pollution generated in China, depending on the season [38]. Given these particularities, the Korean government has also researched air pollution in cooperation with NASA.¹

FIGURE 2.

Topography image of the Republic of Korea.

In the Republic of Korea, air quality observation data from ground monitoring stations are openly available from the Korea Environment Corporation.² This study obtained the following observation data: an hourly dataset of PM2.5 from 186 stations located around the country from January 1, 2017, to October 31, 2020. Each station contains 33,599 records. Because the data set is provided with preprocessing, there were no issues such as missing data or outliers. The 186 stations are distributed as shown in Figure 3.

FIGURE 3.

Station distribution in the Republic of Korea.

Notably, air pollution is intensively distributed in cities with large populations such as Seoul or Busan. Since most cities are on plains, stations are rarely distributed in mountainous areas.

Moreover, this study obtained additional hourly meteorological observation data including wind speed, wind direction, temperature, and relative humidity for the same period from the Korea Meteorological Administration.³ Meteorological observation data are matched to the air quality observation data by finding the nearest station.

Besides air pollutant concentration and meteorological observation data, geographic characteristics, including the longitude and latitude of each station, were added to the data. Table 1 shows the summary statistics of the variables.

TABLE 1 Summary Statistics

C. Station Network Construction

This study constructs a novel station network based on distance and causality. The concentration of air pollution is influenced by spatial information such as depth and terrain [24], [39]. If the distance between stations is small, one station’s air pollution concentration can influence the air pollution concentrations of other stations. Therefore, distance partially reflects the spatial information of air pollution. Many studies predict future air pollution with a distance network whose nodes are stations and whose edge weights decay exponentially with distance [28], [29], [37]. However, if two stations are separated by high mountains, air pollutants cannot cross between them because they stay at low altitudes. Therefore, their measurements do not influence each other [24].

In addition, air pollution is influenced by meteorological information such as wind speed and wind direction [40]. Since the Republic of Korea lies between 30 and 60 degrees latitude, it is affected by western winds. Because PM is carried by the wind, two stations have an asymmetric relationship. That is, the station in the west affects the nearby station to its east, but the inverse relationship is absent. The distance network is limited in that it is symmetric and does not reflect elements such as terrain. Therefore, this study constructs a causality network to reflect these effects.

For the causality network, this study uses transfer entropy. Transfer entropy is based on Shannon’s information theory approach [41], in which entropy is the mean of the information in the possible outcomes. Here, entropy is defined as in equation (1), where $X$ refers to a process.\begin{equation*} H\left(X\right) = -\sum p\left(x\right) \log p\left(x\right) \tag{1}\end{equation*}

Meanwhile, conditional entropy is expressed as equation (2), and mutual information is expressed as equation (3), where $Y$ refers to another process.\begin{align*} H\left(Y \mid X\right) &= H\left(X, Y\right) - H\left(X\right) \tag{2}\\ I\left(X; Y\right) &= H\left(Y\right) - H\left(Y \mid X\right) \tag{3}\end{align*}
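Equations (1)–(3) can be illustrated with a minimal plug-in estimator on discrete samples. This is a sketch under stated assumptions, not the study's implementation: the function names are illustrative, probabilities are estimated by simple relative frequencies, and the natural logarithm is used, so the values are in nats.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in estimate of H(X) = -sum p(x) log p(x), in nats (equation (1))."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def conditional_entropy(xs, ys):
    """H(Y | X) = H(X, Y) - H(X), as in equation (2)."""
    return entropy(list(zip(xs, ys))) - entropy(xs)

def mutual_information(xs, ys):
    """I(X; Y) = H(Y) - H(Y | X), as in equation (3)."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Toy example (made-up symbols): Y copies X, so knowing X removes all
# uncertainty about Y and the mutual information equals H(Y).
xs = [0, 1, 0, 1, 1, 0, 1, 0]
ys = list(xs)
print(entropy(xs), mutual_information(xs, ys))
```

For two independent series, the same estimator returns a mutual information close to zero.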

Transfer entropy is a directed, conditional form of mutual information, as in equation (4), and can also be expressed as equation (5), where $k$ refers to the delay.\begin{align*} \mathrm{TE}_{X\to Y}^{k} &= I\left(Y_{t}; X_{t-k} \mid Y_{t-k}\right) \tag{4}\\ \mathrm{TE}_{X\to Y}^{k} &= \sum p\left(y_{t+1}, y_{t}^{k}, x_{t}^{l}\right) \log \frac{p\left(y_{t+1} \mid y_{t}^{k}, x_{t}^{l}\right)}{p\left(y_{t+1} \mid y_{t}^{k}\right)} \tag{5}\end{align*}
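To make equation (5) concrete, the following sketch estimates transfer entropy between two symbol sequences by counting. Several simplifications are assumptions of this sketch rather than choices stated in the paper: both history lengths are fixed to one, the series are already discrete (a real-valued PM series would first need to be binned), and probabilities are plain relative frequencies. The function name is illustrative.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of TE_{X->Y} in nats, with history length 1 (assumed)."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))          # (y_{t+1}, y_t, x_t)
    n = len(triples)
    c_yyx = Counter(triples)
    c_yx = Counter((yt, xt) for _, yt, xt in triples)   # (y_t, x_t)
    c_yy = Counter((y1, yt) for y1, yt, _ in triples)   # (y_{t+1}, y_t)
    c_y = Counter(yt for _, yt, _ in triples)           # y_t
    te = 0.0
    for (y1, yt, xt), c in c_yyx.items():
        p_joint = c / n                                 # p(y_{t+1}, y_t, x_t)
        p_full = c / c_yx[(yt, xt)]                     # p(y_{t+1} | y_t, x_t)
        p_self = c_yy[(y1, yt)] / c_y[yt]               # p(y_{t+1} | y_t)
        te += p_joint * math.log(p_full / p_self)
    return te

# Synthetic check: y copies x with a one-step delay, so information flows
# from x to y but not from y to x.
random.seed(0)
x = [random.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
print(te_xy, te_yx)
```

In this synthetic case the estimate from x to y is close to log 2, while the reverse direction is close to zero, which is the asymmetry a distance-based edge weight cannot express.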

The hourly air pollution concentration behaves like a random process. However, the series have different scales and are not stationary [42]. Therefore, to calculate the transfer entropy, PM is normalized as in equation (6). The parameter $k$ of TE from equation (5) is set to 128, as in previous studies [26], [43].\begin{equation*} \mathrm{normalizedPM}_{it} = \log\left(\mathrm{PM}_{it} / \mathrm{PM}_{i(t-1)}\right) \tag{6}\end{equation*}
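Equation (6) is a log ratio of consecutive readings, which can be sketched as follows. The function name and the sample values are hypothetical, and the sketch assumes strictly positive readings, since a zero or negative value would make the logarithm undefined.

```python
import math

def normalized_pm(pm):
    """Equation (6): log(PM_t / PM_{t-1}) for one station's hourly series."""
    return [math.log(cur / prev) for prev, cur in zip(pm[:-1], pm[1:])]

pm = [20.0, 25.0, 25.0, 10.0]   # made-up hourly PM2.5 readings
ratios = normalized_pm(pm)
print(ratios)
```

The output is one element shorter than the input, is zero where the concentration is unchanged, and is negative where it falls.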

Since transfer entropy constitutes a causality network from time-series information, spurious causality may occur [44]. To prevent this, this study calculates the statistical significance of each relationship. To calculate the p-value, one has to create a surrogate $s$ that retains the characteristics of process $X$, as in equation (7), but does not have a direct relationship to process $Y$ [30]. The transfer entropy of the surrogate follows the chi-squared-based distribution in equation (8), where $N$ refers to the number of samples, so the p-value can be calculated. This study builds the causality network using only the relationships with a p-value less than a threshold.\begin{align*} p^{s}\left(x_{t} \mid x_{t-1}^{k}\right) &= p\left(x_{t} \mid x_{t-1}^{k}, y_{t-1}^{l}\right) \tag{7}\\ \mathrm{TE}_{X^{s}\to Y} &\sim \frac{\chi^{2}}{2N} \tag{8}\end{align*}
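A p-value can be derived from equation (8) roughly as sketched below. The sketch reads equation (8) as saying that $2N \cdot \mathrm{TE}$ follows a chi-squared distribution under the null hypothesis, and it assumes that the transfer entropy is measured in nats. The degrees of freedom `dof` are not stated in the text and are left as a user-supplied assumption; the closed-form survival function used here is valid only for even `dof`. All function names are illustrative.

```python
import math

def chi2_sf_even(x, dof):
    """Survival function P(chi2_dof > x); closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def te_p_value(te_nats, n_samples, dof):
    """p-value for a transfer entropy value, using 2 * N * TE ~ chi2(dof)."""
    return chi2_sf_even(2.0 * n_samples * te_nats, dof)

# Hypothetical numbers: TE = 0.002 nats from 2000 samples, dof = 2 (assumed).
p = te_p_value(0.002, 2000, dof=2)
print(p, p < 0.1)
```

An edge would be kept in the causality network only when this p-value falls below the chosen threshold.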

This study also utilizes a distance network to prevent spurious causality. That is, the station network presented here is a combination of distance and causality networks.

For the distance network, the closer the distance, the closer the edge weight is to one; the farther the distance, the closer the edge weight is to zero.

Haversine distance was used to measure the distances between geolocation points. The haversine distance is defined in equation (9), where $r$ is the radius of the earth, $\varphi$ is the latitude of the station, and $\lambda$ is the longitude of the station.\begin{equation*} \mathrm{dist}\left(\mathrm{station}_{1}, \mathrm{station}_{2}\right) = 2r\arcsin\left(\sqrt{\sin^{2}\left(\frac{\varphi_{2}-\varphi_{1}}{2}\right)+\cos\left(\varphi_{1}\right)\cos\left(\varphi_{2}\right)\sin^{2}\left(\frac{\lambda_{2}-\lambda_{1}}{2}\right)}\right) \tag{9}\end{equation*}
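The haversine distance of equation (9) can be computed as in the following sketch. The Earth radius of 6371 km is an assumed mean value (the text does not give one), the function name is illustrative, and the coordinates are approximate city-centre positions used only as an example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for r

def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Approximate coordinates of Seoul and Busan (illustrative only).
d = haversine_km(37.5665, 126.9780, 35.1796, 129.0756)
print(round(d, 1), "km")
```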

To construct the distance network, the pairwise distance between stations was computed and used to build the adjacency matrix with a Gaussian kernel, as in equation (10) [45]. $W_{ij}$ represents the edge weight between stations $i$ and $j$, and $\sigma$ is the standard deviation of the distances.\begin{equation*} W_{ij} = \exp\left(-\frac{\mathrm{dist}\left(\mathrm{point}_{i}, \mathrm{point}_{j}\right)^{2}}{\sigma^{2}}\right) \tag{10}\end{equation*}
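A minimal sketch of equation (10) follows. It assumes that $\sigma$ is the population standard deviation taken over the distinct station pairs, which is one possible reading of "the standard deviation of the distances"; the function name and the distance values are hypothetical.

```python
import math
import statistics

def gaussian_adjacency(dist):
    """Equation (10): W_ij = exp(-dist_ij^2 / sigma^2) for a square distance matrix."""
    n = len(dist)
    # Assumption: sigma is the population standard deviation over pairs i < j.
    pair_dists = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    sigma = statistics.pstdev(pair_dists)
    return [[math.exp(-(dist[i][j] ** 2) / sigma ** 2) for j in range(n)]
            for i in range(n)]

# Made-up pairwise distances in km between three stations.
dist = [[0.0, 10.0, 50.0],
        [10.0, 0.0, 45.0],
        [50.0, 45.0, 0.0]]
W = gaussian_adjacency(dist)
for row in W:
    print([round(w, 4) for w in row])
```

The resulting matrix is symmetric by construction, which is exactly the property that motivates adding a directed causality network.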

After constructing both distance and causality networks, the intersection of the networks is calculated. For the distance network, only those with an edge weight of 0.85 or higher are chosen. For the causality network, only those with a p-value less than 0.1 are selected. If there are more than $n$ stations with influential relationships from those conditions, the $n$ most influential stations are selected. In this case, the influence is the value of TE. If there are no more than $n$ stations with an influential relationship, all these stations are selected.

In summary, the proposed station network is an intersection of a causality network built with transfer entropy and a distance network built with haversine distance.
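The selection rule described above can be sketched as follows, using the thresholds given in the text (edge weight of at least 0.85 and p-value below 0.1) and ranking the surviving candidates by transfer entropy. The nested-dictionary layout, the station labels, and all numeric values are hypothetical.

```python
def select_influential(target, dist_weight, te, p_value, n,
                       w_min=0.85, p_max=0.1):
    """Return up to n source stations for `target`, ranked by transfer entropy.

    dist_weight[j][i], te[j][i] and p_value[j][i] describe the edge j -> i
    (hypothetical nested-dict layout).
    """
    candidates = [
        j for j in dist_weight
        if j != target
        and dist_weight[j][target] >= w_min   # kept in the distance network
        and p_value[j][target] < p_max        # kept in the causality network
    ]
    candidates.sort(key=lambda j: te[j][target], reverse=True)
    return candidates[:n]

# Made-up values for the edges pointing at target station "A".
dist_weight = {"A": {"A": 1.00}, "B": {"A": 0.95}, "C": {"A": 0.90},
               "D": {"A": 0.50}, "E": {"A": 0.99}}
te = {"A": {"A": 0.0}, "B": {"A": 0.02}, "C": {"A": 0.05},
      "D": {"A": 0.30}, "E": {"A": 0.10}}
p_value = {"A": {"A": 1.0}, "B": {"A": 0.01}, "C": {"A": 0.03},
           "D": {"A": 0.001}, "E": {"A": 0.40}}

selected = select_influential("A", dist_weight, te, p_value, n=3)
print(selected)
```

Here station D is dropped for being too far away despite its high transfer entropy, and station E is dropped because its causal link is not significant, so fewer than $n$ stations are returned.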

Figures 4–7 compare the distance network and the proposed station network. The red marker is the target station, and the blue markers are the three closest stations. Figures 4 and 5 show the comparison for Seoul, the capital of the Republic of Korea. As shown in Figure 4, the distance network connects the target station to nearby stations, which are evenly distributed around it. In contrast, the causality network also connects the target station to stations at a relatively long distance, and these stations lie to the southwest of the target station. In Seoul, the wind usually blows from the southwest. Thus, one may say that the causality network reflects meteorological factors.

FIGURE 4.

Distance network example in Seoul, the capital city.

FIGURE 5.

The proposed station network example in Seoul, the capital city.

FIGURE 6.

Distance network example in the mountain range.

FIGURE 7.

The proposed station network example in the mountain range.

Figures 6 and 7 show the same comparison for a target station in the mountain range. As shown in Figure 6, the distance network indicates relationships with stations at a distance. However, as shown in Figure 7, the proposed station network indicates no relationship for the target station. In the map in Figure 6, high mountain ranges lie between the target station and the other stations, so the wind cannot carry air pollution between them. That is, the distance network used in previous studies does not appear to reflect this effect, whereas the station network presented here does.

D. The Proposed Approach

The proposed approach is twofold: input tensor processing and spatiotemporal modeling. The input tensor comprises two parts: the five historical time-series features (PM2.5, temperature, wind speed, wind direction, and humidity) for all stations, and the station network. To construct the input tensor, this study first constructs time-series features of $t$ timesteps for each station. Next, each target station concatenates the features of the $n$ influential stations from the proposed station network. Therefore, each station has a spatiotemporal feature $\in \mathbb{R}^{(n+1)\times t\times 5}$. Finally, the spatiotemporal features of all stations are concatenated into an input tensor $\in \mathbb{R}^{(n+1)\times 186\times t\times 5}$. The overall process is described in Figure 8.
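The stacking step can be sketched with plain nested lists as below. The dictionary layout, the function name, and the toy values are hypothetical, and the zero-filling of slots for stations with fewer than $n$ influential neighbours is an assumption of this sketch, since the text does not say how such cases are handled.

```python
def build_input(features, neighbours, n, t):
    """Stack target and neighbour windows into shape (n+1, S, t, 5).

    features[s]   : per-hour rows [pm25, temp, wind_speed, wind_dir, humidity]
    neighbours[s] : influential stations of s (at most n), strongest first
    Slots without a neighbour are zero-filled (assumption of this sketch).
    """
    stations = sorted(features)
    tensor = []
    for slot in range(n + 1):                 # slot 0 is the target itself
        layer = []
        for s in stations:
            sources = [s] + list(neighbours.get(s, []))[:n]
            if slot < len(sources):
                window = features[sources[slot]][-t:]   # last t timesteps
                layer.append([list(row) for row in window])
            else:
                layer.append([[0.0] * 5 for _ in range(t)])
        tensor.append(layer)
    return tensor

# Toy data: 3 stations, 4 hourly rows each, 5 features per row.
features = {s: [[10.0 * i + h] * 5 for h in range(4)]
            for i, s in enumerate("ABC")}
neighbours = {"A": ["B", "C"], "B": ["A"], "C": []}
tensor = build_input(features, neighbours, n=2, t=3)
print(len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0]))
```

With 186 stations, the same construction yields the $(n+1)\times 186\times t\times 5$ layout described above.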

FIGURE 8.

Input tensor preprocessing.

The overall spatiotemporal modeling is described in Figure 9. First, a 3D convolutional network (CONV3D) extracts the spatiotemporal feature from the input tensor [46]. CONV3D calculates spatial and temporal information with 3D convolution and 3D pooling operations. Mathematically, assuming that $x$, $y$, and $z$ are positions on the $j^{\mathrm{th}}$ feature map in the $i^{\mathrm{th}}$ layer, and $p$, $q$, and $r$ are indices of the kernel connected to the $m^{\mathrm{th}}$ feature map of the previous layer, the value is $v_{ij}^{xyz}$, as in equation (11). As in Wen et al. [37], CONV3D has $(n+1, 1, t)$ convolution kernels and treats the five time-series features as channels. The first and third kernel dimensions are $n+1$ and $t$ so that each convolution covers the target station together with all of its influential stations and all timesteps, respectively. The second kernel dimension is one so that features are extracted separately for each of the 186 stations.\begin{equation*} v_{ij}^{xyz} = \max\left(b_{ij} + \sum_{m} \sum_{p=0}^{P_{i}-1} \sum_{q=0}^{Q_{i}-1} \sum_{r=0}^{R_{i}-1} w_{ijm}^{pqr}\, v_{(i-1)m}^{(x+p)(y+q)(z+r)},\, 0\right) \tag{11}\end{equation*}
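Equation (11) can be written out as a naive loop for a single output feature map, summing over the input feature maps $m$. This is an illustrative sketch with valid padding and stride one, not an efficient or framework-specific implementation; the function name and the toy numbers are hypothetical.

```python
def conv3d_relu(inputs, kernels, bias):
    """Equation (11) for one output feature map (valid padding, stride 1).

    inputs[m][x][y][z]  : m-th feature map of the previous layer
    kernels[m][p][q][r] : kernel weights connected to the m-th input map
    """
    P, Q, R = len(kernels[0]), len(kernels[0][0]), len(kernels[0][0][0])
    X, Y, Z = len(inputs[0]), len(inputs[0][0]), len(inputs[0][0][0])
    out = []
    for x in range(X - P + 1):
        plane = []
        for y in range(Y - Q + 1):
            row = []
            for z in range(Z - R + 1):
                acc = bias
                for m in range(len(inputs)):
                    for p in range(P):
                        for q in range(Q):
                            for r in range(R):
                                acc += (kernels[m][p][q][r]
                                        * inputs[m][x + p][y + q][z + r])
                row.append(max(acc, 0.0))     # the max(., 0) in equation (11)
            plane.append(row)
        out.append(plane)
    return out

# Toy input: one channel, (n+1) = 2 station slots, 3 stations, t = 2 timesteps.
inputs = [[[[1, 2], [3, 4], [5, 6]],
           [[1, 1], [1, 1], [-20, 0]]]]
kernels = [[[[1.0, 1.0]], [[1.0, 1.0]]]]      # kernel shape (n+1, 1, t) = (2, 1, 2)
out = conv3d_relu(inputs, kernels, bias=0.0)
print(out)
```

With a kernel of shape $(n+1, 1, t)$, the output keeps one value per station, which mirrors the reasoning given for the kernel dimensions.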

FIGURE 9.

Spatiotemporal modeling.

Next, Convolutional LSTM (CONVLSTM2D) calculates spatiotemporal features. CONVLSTM2D calculates spatial and temporal information together by putting convolution into the LSTM internal operations. The key equations of the Convolutional LSTM are given in equations (12)–(16). Mathematically, $\chi_{t}$, $C_{t}$, and $H_{t}$ stand for the cell input, cell output, and hidden state at time $t$, respectively. $i_{t}$, $f_{t}$, and $o_{t}$ are the input, forget, and output gates, respectively. The $\ast$ and $\circ$ operators denote convolution and element-wise multiplication, respectively. The cell input, cell output, hidden state, input gate, forget gate, and output gate are all 3-dimensional, whereas they are 1-dimensional in the LSTM. In addition, in CONVLSTM2D, all matrix products of the LSTM have been converted into convolution operations.\begin{align*} i_{t} &= \sigma\left(W_{xi} \ast \chi_{t} + W_{hi} \ast H_{t-1} + W_{ci} \circ C_{t-1} + b_{i}\right) \tag{12}\\ f_{t} &= \sigma\left(W_{xf} \ast \chi_{t} + W_{hf} \ast H_{t-1} + W_{cf} \circ C_{t-1} + b_{f}\right) \tag{13}\\ C_{t} &= f_{t} \circ C_{t-1} + i_{t} \circ \tanh\left(W_{xc} \ast \chi_{t} + W_{hc} \ast H_{t-1} + b_{c}\right) \tag{14}\\ o_{t} &= \sigma\left(W_{xo} \ast \chi_{t} + W_{ho} \ast H_{t-1} + W_{co} \circ C_{t-1} + b_{o}\right) \tag{15}\\ H_{t} &= o_{t} \circ \tanh\left(C_{t}\right) \tag{16}\end{align*}

After two layers of CONVLSTM2D and batch normalization, the two fully connected layers generate an output vector $\in \mathbb{R}^{186\times 1}$. This vector is the prediction for 186 stations.

E. Experimental Settings

A historical air pollution dataset from January 2017 to October 2020 for the Republic of Korea was used to evaluate the proposed methods. The training period was 70% of the entire dataset period; the validation and test periods were 10% and 20%, respectively. A Bayesian parameter search was performed on the validation period, and the results reported below are from the test period, as an out-of-sample test [47]. Bayesian optimization keeps track of past validation results to estimate a probabilistic distribution of RMSE mapped to parameters, and adjusts the parameters towards minimizing the RMSE. For Bayesian optimization, this study uses the Python package bayes_opt.⁴ The parameters included the learning rate, CONV3D properties, and CONVLSTM2D properties. The number of neighbor stations is optimized in the range (1, 10), which is consistent with [28], [37]. The number of filters of CONV3D and CONVLSTM2D is searched over (20, 40, 100, 200), which is consistent with [37]. The number of nodes of the fully connected layer is searched over (500, 1000, 1500, 2000). The learning rate is searched over (1e-5, 1e-4, 1e-3), and the window size over (16, 32, 64, 128, 256, 512). Adam was used as the optimizer [48]. The final optimized parameters are listed in Table 2.
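The 70/10/20 split can be sketched as below. The sketch assumes that the three periods are contiguous and taken in chronological order, which the wording suggests but does not state explicitly, and that fractional boundaries are truncated; the function name is illustrative.

```python
def chronological_split(n_records, train=0.7, val=0.1):
    """Return index ranges for train, validation and test, in time order."""
    train_end = int(n_records * train)
    val_end = train_end + int(n_records * val)
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_records))

# 33,599 hourly records per station, as reported in the data description.
train_idx, val_idx, test_idx = chronological_split(33599)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test period strictly after the training and validation periods avoids leaking future observations into the parameter search.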

TABLE 2 Optimized Parameters

For the evaluation, IA, RMSE, and MAE were used. They are described in equations (17)–(19), where $y$ represents the true data, $\hat{y}$ represents the predicted data, and $\bar{y}$ is the mean of the true data. In all metrics, the average of the metric values over the 186 stations was reported.\begin{align*} IA &= 1-\frac{\sum_{i=1}^{n} \left\| \hat{y}_{i}-y_{i} \right\|^{2}}{\sum_{i=1}^{n} \left( \left\| \hat{y}_{i}-\bar{y} \right\| + \left\| y_{i}-\bar{y} \right\| \right)^{2}} \tag{17}\\ RMSE &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( \hat{y}_{i}-y_{i} \right)^{2}} \tag{18}\\ MAE &= \frac{1}{n}\sum_{i=1}^{n} \left\| \hat{y}_{i}-y_{i} \right\| \tag{19}\end{align*}
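The three metrics can be computed for a single station as in the sketch below. For IA, the sketch assumes the standard form of the index of agreement, in which the denominator sums $\left(|\hat{y}_{i}-\bar{y}| + |y_{i}-\bar{y}|\right)^{2}$ with $\bar{y}$ the mean of the true values. The function names and the sample values are hypothetical.

```python
import math

def index_of_agreement(y_true, y_pred):
    """IA: 1 - sum (pred - obs)^2 / sum (|pred - mean| + |obs - mean|)^2."""
    mean = sum(y_true) / len(y_true)
    num = sum((p - o) ** 2 for o, p in zip(y_true, y_pred))
    den = sum((abs(p - mean) + abs(o - mean)) ** 2
              for o, p in zip(y_true, y_pred))
    return 1.0 - num / den

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(y_true, y_pred))
                     / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 20.0, 30.0]     # made-up observed PM2.5 values
y_pred = [12.0, 18.0, 33.0]     # made-up predictions
print(index_of_agreement(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

A perfect prediction gives an IA of one and an RMSE and MAE of zero; the reported scores would then be the averages of these per-station values.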

To show that the proposed algorithm provides strong predictive power in the short, medium, and long term, the prediction horizon was set to 1 h, 6 h, and 24 h, respectively. The comparison includes two baseline algorithms and two network models. Simple LSTM and LSTM-CNN [22] are used as baselines; both use only temporal information, not spatial information. The two network models are a spatiotemporal model with a distance network (distance network model) [37] and a spatiotemporal model with a causality network (causality network model). The distance network model uses the same algorithm as the proposed model but uses only the distance network, as in previous studies [37]. The distance network model was considered the state of the art. The causality network model also uses the same algorithm as the proposed model; however, it uses only the causality network, not a combination of the distance and causality networks.

SECTION IV.

Results

The test set performance of the proposed method and baseline algorithms is reported in Tables 3–5. The best results are highlighted in bold. For the IA metric, a higher value indicates superior performance; for the RMSE and MAE metrics, the lower the value, the better the performance. The results are the average of all 186 stations for the test period.

TABLE 3 IA

TABLE 4 RMSE

TABLE 5 MAE

In Tables 3–5, the distance network model outperforms the two baseline algorithms, LSTM and LSTM-CNN. This is because the distance network model utilizes both spatial and temporal information, while LSTM and LSTM-CNN utilize temporal information only. This is consistent with the literature [37]. Furthermore, the causality model outperforms the distance network. The latter has been considered as a state-of-the-art algorithm. This indicates that the distance network used in previous studies does not accurately reflect spatial information, such as terrain and western wind. Finally, in all metrics, the proposed method demonstrated the best performance. This suggests that there is spatial information that cannot be reflected only by the causality network. Furthermore, there is a concern of spurious causality when only the causality network is used. That is, to deal with the shortcomings of the causality and distance networks, one needs to use the two networks together.

Figures 10 and 11 visualize the predicted values for intuitive comparison. They show the T+1 predictions for the last 28 data points of the test set at two monitoring stations, one in Seoul and the other in the mountainous region. These are the same two stations as in Figures 4–7. The blue line is the true value, the green line is the value predicted by the proposed method, and the red line is the value predicted by the distance network model.

FIGURE 10.

Prediction comparison in Seoul, the capital city.

FIGURE 11.

Prediction comparison in the mountain range.

In Figures 10 and 11, the proposed method and the distance network model both follow the overall trend well. However, the errors of the proposed method remain static, while the errors of the distance network model diverge. The proposed method therefore has a lower overall error than the distance network model and produces stable predictions with a small error deviation.

SECTION V.

Discussion and Conclusion

This study proposes a novel station network based on the causal analysis between stations using transfer entropy and the distance network using haversine distance. The proposed station network enhances prediction performance by going beyond the limits of previous studies that assume symmetrical influence within the distance network. This study suggests that the proposed station network can produce lower errors than other state-of-the-art algorithms. In addition, the descriptive analysis suggests that the proposed network reflects the western wind or terrain.
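The haversine distance mentioned above is symmetric by construction, which is why a network built from it alone assigns the same influence in both directions between two stations. A minimal sketch of the standard formula follows; the mean Earth radius of 6371 km and the function name are assumptions, and the way the study converts distances into network weights is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Illustrative coordinates (approximately Seoul and Busan), not station data.
d_ab = haversine_km(37.5665, 126.9780, 35.1796, 129.0756)
d_ba = haversine_km(35.1796, 129.0756, 37.5665, 126.9780)
```

Here `d_ab` and `d_ba` are equal, so any weight derived from distance alone cannot distinguish an upwind station from a downwind one.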

This study has several implications. Academically, it suggests that two assumptions used in existing air pollution prediction studies are insufficient: 1) that nearby stations exchange information with each other, and 2) that nearby stations are highly correlated. More broadly, this study shows the importance of considering causality in artificial intelligence development. This supports the line of research holding that causality, not correlation, should be considered in prediction problems [26].
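The difference between a symmetric association and a directed one can be illustrated with a toy transfer entropy estimate. The sketch below is a simple plug-in estimator with quantile binning and a history length of one step; these choices, the function name, and the synthetic data are assumptions for illustration and are not the estimator or settings used in this study.

```python
import numpy as np
from collections import Counter

def transfer_entropy(source, target, bins=4):
    """Plug-in transfer entropy from source to target in bits (history length 1)."""
    def discretize(v):
        v = np.asarray(v, dtype=float)
        # Interior quantiles give roughly equally populated bins.
        edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(v, edges).tolist()

    x, y = discretize(source), discretize(target)
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_next, y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_next, y_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_now, x_now)
    singles = Counter(y[:-1])                       # y_now
    te = 0.0
    for (y1, y0, x0), count in triples.items():
        p_given_both = count / pairs_yx[(y0, x0)]          # p(y_next | y_now, x_now)
        p_given_self = pairs_yy[(y1, y0)] / singles[y0]    # p(y_next | y_now)
        te += (count / n) * np.log2(p_given_both / p_given_self)
    return te

# Synthetic pair: y follows x with a one-step delay, but x does not depend on y.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.empty_like(x)
y[0] = 0.0
y[1:] = x[:-1] + 0.1 * rng.normal(size=4999)

te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

In this toy setting `te_xy` is much larger than `te_yx`, whereas a correlation coefficient between the two series would be identical in both directions.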

Practically, governments have been trying to measure air quality accurately, which is difficult because of the lack of monitoring stations [18]. Air pollution is measured in real time and reported as an hourly average. Such reports describe only current conditions, so it is difficult to predict future movements of air pollution and establish countermeasures. This study presents an algorithm that outperforms the existing state-of-the-art algorithms, making it possible to predict future air pollution trends better and allowing people to prepare in advance.

This study also has some limitations. First, the proposed method is tested only in the Republic of Korea. The country is small, and much of its area is mountainous; that is, it has geographic specificity. Other regions may therefore not show improvements as large as those presented here. Further research should extend this study to other countries where air pollution prediction is also important, such as China. In addition, future research should examine how the utility of the network presented here depends on geographic specificity. Second, the proposed method assumes that the network is the same over time. The effect of the terrain does not change with time, but the speed and direction of the wind change with the season and time. Future research should also consider time-varying effects. For example, a system could predict air pollution with a separate causality network for each season. Alternatively, a causality network could be established on a monthly basis, and the time-varying network could be treated as an evolutionary graph. Third, the proposed method does not consider external effects. For example, air pollution from China has a transboundary effect on air quality in the Republic of Korea [38]. Future research should examine this external effect.
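As a starting point for the seasonal variant suggested above, hourly observations could be grouped by season before a separate causality network is estimated from each group. The sketch below assumes Northern Hemisphere meteorological seasons and a hypothetical helper name; it only partitions the data and does not estimate any network.

```python
from datetime import datetime, timedelta

# Assumption: Northern Hemisphere meteorological seasons.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def split_by_season(timestamps, values):
    """Group observations by the season of their timestamp (hypothetical helper)."""
    groups = {}
    for ts, value in zip(timestamps, values):
        groups.setdefault(SEASON_OF_MONTH[ts.month], []).append(value)
    return groups

# Toy data: one non-leap year of hourly values for a single station.
start = datetime(2017, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(365 * 24)]
values = list(range(len(timestamps)))
seasonal = split_by_season(timestamps, values)
```

Because a season such as winter is stitched together from non-adjacent periods, any lagged statistic computed on a group should skip the pairs that straddle a seam.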
