Introduction
The first step in resolving the PM$_{2.5}$ pollution problem is the accurate prediction of its concentration.
When dealing with multidimensional time series data, machine learning (ML) methods often perform better, particularly for the PM$_{2.5}$ prediction task.
Recently, deep learning (DL) models have gained popularity and demonstrated impressive results in various applications, such as pattern recognition [23], [24], remote sensing image analysis [25], and built-up land expansion mapping [26]. Advanced variants of recurrent neural networks (RNNs), or their combinations with convolutional neural networks (CNNs), are frequently used for PM$_{2.5}$ prediction.
A. Related Works
In recent decades, researchers have predominantly utilized ML models for PM$_{2.5}$ prediction.
Recently, numerous DL models have become popular in time-series prediction applications. Zhang et al. [33] presented a DL system for forecasting PM$_{2.5}$ concentrations.
Chen et al. [41] proposed a physical model for estimating PM$_{2.5}$ concentrations.
A summary comparison of recently developed DL models is presented in Table I. The table provides an overview of these models in terms of their year of development, three comparison metrics, and the PM$_{2.5}$ prediction task addressed.
B. Contributions of the Article
The previously described models are unable to extract deep features from PM$_{2.5}$ data. To address this limitation, we propose the CombineDeepNet framework, whose main components are described below.
The BiLSTM module is used to mitigate vanishing gradient problems and to integrate the relationship between pollutant and meteorological data during the initial feature extraction. The vanishing gradient problem refers to the difficulty RNNs have in propagating information over long sequences because gradients diminish during backpropagation; this hampers the network's ability to capture long-term dependencies in the data. A standard LSTM counters this with three types of gates: the forget gate decides whether to keep or discard information from the previous time step, the input gate decides which new information is added to the cell state, and the output gate decides which information is passed to the next step. In a BiLSTM, there are separate sets of gates for the forward LSTM and the backward LSTM, so the gradients computed during backpropagation involve both directions. If one direction experiences vanishing gradients for a particular gate, the other direction may still carry nonvanishing gradients, which helps prevent complete information loss. The bidirectional nature of a BiLSTM therefore makes it less likely that both directions simultaneously suffer from severe vanishing gradients, and the bidirectional gating allows information to flow from both directions, reducing the impact of vanishing gradients.
BiLSTM can also effectively integrate the relationship between pollutant and meteorological data by considering temporal dependencies and contextual information. By inputting both the pollutant and meteorological data into the BiLSTM model, the network can learn the complex relationships and patterns between them. The sequential nature of BiLSTM allows it to capture dependencies over time, enabling the model to understand how meteorological factors affect pollutant levels at different time points.
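As an illustration, the sketch below shows how such a bidirectional recurrent layer could consume the combined pollutant and meteorological sequence. This is a minimal PyTorch sketch: the hidden size, batch size, and window length are illustrative assumptions rather than the paper's exact configuration; only the count of 12 input variables follows the dataset description.

```python
import torch
import torch.nn as nn

# Minimal sketch of the first module: a bidirectional LSTM reading the
# combined pollutant + meteorological sequence. The 12 features match the
# dataset description; hidden size and window length are assumptions.
bilstm = nn.LSTM(input_size=12, hidden_size=64,
                 batch_first=True, bidirectional=True)

x = torch.randn(32, 4, 12)   # (batch, 4-h window, 12 variables)
prelim, _ = bilstm(x)        # (32, 4, 128): forward + backward hidden states
```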
The bidirectional gated recurrent units (BiGRU) module not only mitigates the vanishing gradient problem but also extracts deeper features from the preliminary features. In a standard gated recurrent unit (GRU), there are reset and update gates that regulate the information flow within the network. In a BiGRU, there are two sets of these gates: one for the forward GRU and one for the backward GRU. When computing gradients during backpropagation in a GRU, the gradients are influenced by both directions. If one direction suffers from vanishing gradients for a particular gate, the other direction may exhibit a nonvanishing gradient for that gate, helping to prevent complete information loss. This bidirectional aspect can make it less likely for both directions to simultaneously encounter severe vanishing gradients.
“Preliminary features” refer to the initial set of features that serve as the starting point for the BiGRU module. These preliminary features are derived from the output of the BiLSTM modules and capture the hidden patterns found in the historical pollution data, that is, the information and characteristics deemed important for further analysis. Subsequently, the BiGRU module processes these preliminary features and extracts deep features from them. Deep features represent higher level representations of the data, capturing more complex patterns and relationships.
The FC layer is utilized to create an accurate long-term forecast of PM$_{2.5}$ concentrations across six monitoring locations, with a lead time of 180 h. Accuracy is measured in terms of RMSE, MAE, and R$^{2}$. The FC layer is a type of ANN layer in which each neuron is linked to all neurons in the preceding layer; it is commonly used to map the high-level features extracted by the preceding layers to the desired output. Here, the FC layer takes the extracted features as input and performs computations to generate predictions of PM$_{2.5}$ concentrations. By utilizing the FC layer, the model can capture complex relationships and patterns in the input data to create accurate long-term forecasts. The layer learns to combine the features extracted by earlier layers and map them to the desired output: the predicted PM$_{2.5}$ concentrations at the monitoring locations. In addition, the experimental outcomes demonstrate the superior performance of our proposed model compared with several other well-known DL models. In the spirit of transparent research, we make the code used to produce the findings of this work available at https://github.com/Prasanjit-Dey/CombineDeepNet.
In this work, the name “CombineDeepNet” signifies the combining aspect of the model. The model incorporates three modules: BiLSTM, BiGRU, and FC layers. The BiLSTM module takes the pollutants and meteorological data as input and extracts initial features. The BiGRU module then takes these initial features as input and further extracts deep features. Finally, the FC layers utilize the deep features to make predictions. The combining aspect of our framework lies in the integration of these three architectural components for long-term prediction. By combining BiLSTM, BiGRU, and FC layers, we leverage the capabilities of these modules to enhance predictive performance. Therefore, we named our framework “CombineDeepNet” to reflect its combining nature.
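To make this combining aspect concrete, the following is a minimal, hypothetical sketch of how the three modules could be chained. Layer sizes, the dropout rate, and the single-output-head design are assumptions; only the 12 input variables and the 180-h horizon are taken from the text.

```python
import torch
import torch.nn as nn

class CombineDeepNetSketch(nn.Module):
    """Illustrative BiLSTM -> BiGRU -> FC chain; sizes are assumptions."""

    def __init__(self, n_features=12, hidden=64, horizon=180):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True,
                              bidirectional=True)
        self.drop1 = nn.Dropout(0.2)              # rate is an assumption
        self.bigru = nn.GRU(2 * hidden, hidden, batch_first=True,
                            bidirectional=True)
        self.drop2 = nn.Dropout(0.2)
        self.fc = nn.Linear(2 * hidden, horizon)  # 180-h lead time

    def forward(self, x):
        prelim, _ = self.bilstm(x)                # preliminary features
        deep, _ = self.bigru(self.drop1(prelim))  # deep features
        return self.fc(self.drop2(deep[:, -1, :]))  # predict from last step

model = CombineDeepNetSketch()
y_hat = model(torch.randn(8, 4, 12))              # -> shape (8, 180)
```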
The rest of this article is organized as follows. Section II covers the study area and dataset analysis. Section III describes the proposed method, including the CombineDeepNet framework. Section IV presents the results of the proposed model and five other popular DL models. Section V discusses the CombineDeepNet model in comparison with the other five models. Finally, Section VI concludes this article.
Study Area and Dataset Analysis
A. Study Area
Six urban and suburban monitoring stations in China, namely, Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidan Wanliu, and Wanshou Xigong, are selected as study areas. Table II gives a detailed description of the six monitoring stations along with their latitude and longitude. Fig. 1 depicts the physical location of the research region. These stations lie in highly urbanized and industrialized regions, which is why we chose them as our main study focus. Compared with international standards, there is still significant room for improvement in the overall air quality of these areas. As a consequence, precise prediction of PM$_{2.5}$ concentrations in these regions is of particular importance.
Distribution of the PM$_{2.5}$ monitoring stations across the study area.
B. Dataset Analysis
1) Data Description
In this research, our dataset contains past pollution levels and meteorological parameters collected from six monitoring stations (Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidan Wanliu, and Wanshou Xigong) in China between March 1, 2013 and February 28, 2017, comprising 35 064 h of data for each monitoring site. The dataset contains 12 pollutant and meteorological variables: PM$_{2.5}$, PM$_{10}$, SO$_{2}$, NO$_{2}$, CO, O$_{3}$, temperature, pressure, dew point, rainfall, wind direction, and wind speed.
2) Correlation of PM$_{2.5}$ With Other Parameters
We calculated the correlation coefficient between pollutants and meteorological parameters for the six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong, as shown in Fig. 2. It shows that PM$_{2.5}$ is strongly correlated with several of the other pollutants.
Correlation coefficient between pollutants and meteorological parameters for six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong.
In addition, PM$_{2.5}$ is also correlated with the meteorological parameters.
3) Distribution Characteristic of Data
We identified six monitoring sites, Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidan Wanliu, and Wanshou Xigong, as the study targets. These locations were selected to investigate the properties of air pollution dispersion and the meteorological data. Fig. 3 shows the numerical variations in PM$_{2.5}$ concentration at these sites.
Distribution of PM$_{2.5}$ concentrations at the six monitoring stations.
After statistical analysis, an average of 63.20% of the PM$_{2.5}$
Method
A. Overview of Proposed Framework
Fig. 4 depicts the proposed CombineDeepNet framework. The main structure is made up of the three stages listed below.
Framework of the CombineDeepNet for PM$_{2.5}$ concentration prediction.
1) Data Analysis and Preprocessing
First, some of the data from the six monitoring stations may be lost because of equipment failure, routine maintenance, transmission errors, or other uncontrollable factors. Such data loss significantly degrades the performance of prediction models, so replacing empty or “nan” values is critical to ensuring the model's performance [51]. Using “nan” values during DL model training poses three main challenges, illustrated by the short example after the following list:
Data incompatibility: Many DL frameworks and libraries expect numeric data as input. The “nan” is a nonnumeric value, which can lead to compatibility issues when training DL models. It may result in errors or inconsistencies during computations, affecting the training process.
Inconsistent data shape: DL models typically require consistent data shapes for proper training. If “nan” values are present in the dataset, they can introduce inconsistencies in the data shape. This can cause difficulties in batching, vectorization, or applying certain operations, leading to training errors or unexpected behavior.
Gradient computation issues: DL models rely on gradient computations during backpropagation for updating model parameters. The “nan” values can disrupt these computations because gradients for nonnumeric values are undefined. This can introduce errors or inaccuracies during gradient descent optimization, hindering the training process.
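The gradient issue above can be demonstrated in a few lines. This toy PyTorch snippet is not from the paper; it simply shows how a single “nan” in the input poisons both the loss and the gradients.

```python
import torch

# One nan in the input makes both the loss and the gradient nan,
# illustrating the gradient-computation issue described above.
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0, float("nan"), 3.0])

loss = ((w * x) ** 2).mean()
loss.backward()
print(loss.item(), w.grad)   # nan, tensor([nan])
```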
To address these challenges, previous research has employed common strategies for handling empty or “nan” values, such as: 1) replacing a “nan” value with the mean of the previous and next values; and 2) k-nearest neighbor imputation. In this research, we replace each missing value with the next existing value in the series. Next, the Pearson correlation coefficient is applied to the six monitoring stations to assess the degree of correlation between pollutants and meteorological data, which can be utilized for geographical analysis [52]. Fig. 2 displays the correlation coefficients between variables for the six monitoring stations. In addition, Table IV presents the grading table for the Spearman correlation coefficient, which assesses the strength of the correlation between variables. Both the figure and the table indicate a strong correlation between PM$_{2.5}$ and several of the other variables. Finally, each variable is scaled to the range [0, 1] using min-max normalization
\begin{equation*}
x_{t}= \frac{x_{t}-\text{min}(x_{t})}{\text{max}(x_{t})-\text{min}(x_{t})}. \tag{1}
\end{equation*}
In this case, $x_{t}$ denotes the value of a variable at time step $t$, and $\text{min}(x_{t})$ and $\text{max}(x_{t})$ denote the minimum and maximum values of that variable over the entire series.
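As a sketch of these two preprocessing steps, one could write the following; the file name and DataFrame layout are hypothetical, and all columns are assumed numeric.

```python
import pandas as pd

# Hypothetical hourly station file with 12 numeric variable columns.
df = pd.read_csv("station.csv")

# 1) Replace each missing value with the next existing value in the series.
df = df.bfill()

# 2) Min-max normalization per Eq. (1), applied column-wise to [0, 1].
df_norm = (df - df.min()) / (df.max() - df.min())
```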
2) Prediction Model
CombineDeepNet is a powerful DL framework that combines BiLSTM, BiGRU, and FC layers to address the PM$_{2.5}$ prediction problem.
To enhance accuracy, CombineDeepNet combines multiple pollutants and meteorological factors to generate more accurate predictions and to identify meaningful relationships between the different variables. The training process employs pairs of input data and target outputs: the input data comprise the air pollutants and meteorological parameters collected from the six monitoring stations over a window of past hours, and the targets are the future PM$_{2.5}$ concentrations to be predicted.
The highest concentration of PM$_{2.5}$
While BiLSTMs are well regarded for their ability to capture long-range dependencies in sequential data, they impose a higher computational and memory demand than BiGRUs [53]. BiGRUs, being a simpler variant, offer computational efficiency. In our experiment, we tackled the challenge of making accurate long-term predictions of PM$_{2.5}$ by combining the two module types, balancing representational power against computational cost.
To regularize the training process, dropout layers are added after each of the BiLSTM and BiGRU layers. Dropout prevents overfitting by randomly removing a proportion of neurons during training; it has proven to be a valuable regularization technique in DL, helping networks generalize better [54], [55]. The third module is an FC layer that takes the output of the BiGRU layers and produces the final PM$_{2.5}$ prediction.
3) Model Evaluation
The last step of the proposed CombineDeepNet framework is model evaluation. The CombineDeepNet model, along with other popular DL models, is evaluated by comparing predicted values to actual values using various metrics. These metrics are then compared to determine whether the proposed model is better at predicting PM$_{2.5}$ concentrations.
Overall, the combined neural network model considers both air pollutants and meteorological parameters, employing a three-module architecture to extract initial and deep features and make predictions. This approach was developed primarily to forecast PM$_{2.5}$ concentrations over long lead times.
B. CombineDeepNet
Initially, air pollutants and meteorological parameters from six individual monitoring sites are fed into both the forward and backward directions of an LSTM model in chronological order using the time-series notation $x_{1}, x_{2}, \ldots, x_{t}$. The input gate $i_{t}$, forget gate $f_{t}$, and output gate $O_{t}$ at time step $t$ are computed as
\begin{align*}
i_{t} =& \sigma \left(W_{i} \left[ h_{t-1}, x_{t} \right] + b_{i} \right) \tag{2}\\
f_{t} =& \sigma \left(W_{f} \left[ h_{t-1}, x_{t} \right] + b_{f} \right) \tag{3}\\
O_{t} =& \sigma \left(W_{o} \left[ h_{t-1}, x_{t} \right] + b_{o} \right) \tag{4}
\end{align*}
The candidate cell state, cell state, and final hidden output are given by
\begin{align*}
\tilde{C}_{t} =& \tanh \left(W_{c} \left[ h_{t-1}, x_{t} \right] + b_{c} \right) \tag{5}\\
C_{t} =& f_{t}\odot C_{t-1}+i_{t}\odot \tilde{C}_{t} \tag{6}\\
h_{t}=& O_{t}\odot \tanh (C_{t}) \tag{7}
\end{align*}
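Equations (2)–(7) can be transcribed almost line for line. The toy NumPy sketch below makes the recursion explicit; the dimensions in the usage example are arbitrary, not the model's actual sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step per Eqs. (2)-(7); W/b hold the four gate parameters."""
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    i = sigmoid(W["i"] @ z + b["i"])          # input gate, Eq. (2)
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (3)
    o = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (4)
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate state, Eq. (5)
    c_t = f * c_prev + i * c_tilde            # cell state, Eq. (6)
    h_t = o * np.tanh(c_t)                    # hidden state, Eq. (7)
    return h_t, c_t

# Toy usage: 12 input variables, hidden size 8 (both arbitrary).
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((8, 8 + 12)) for k in "ifoc"}
b = {k: np.zeros(8) for k in "ifoc"}
h, c = lstm_step(rng.standard_normal(12), np.zeros(8), np.zeros(8), W, b)
```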
Finally, the forward and backward LSTMs combine their features to create the preliminary features in time-series order:
\begin{align*}
h^{\xrightarrow {f}}_{t}=& \tanh \left(W^{\xrightarrow {f}}_{h}x_{t} +W^{\xrightarrow {f}}_{h} h^{\xrightarrow {f}}_{t-1}+ b^{\xrightarrow {f}}_{h}\right) \tag{8}\\
h^{\xleftarrow {b}}_{t}=& \tanh \left(W^{\xleftarrow {b}}_{h}x_{t} +W^{\xleftarrow {b}}_{h} h^{\xleftarrow {b}}_{t-1}+ b^{\xleftarrow {b}}_{h}\right) \tag{9}\\
O_{t}=& W^{\xrightarrow {f}} h^{\xrightarrow {f}}_{t}+W^{\xleftarrow {b}} h^{\xleftarrow {b}}_{t}+b_{o} \tag{10}
\end{align*}
Subsequently, the output features $O_{t}$ of the BiLSTM are passed to the GRU module, whose update gate $z_{t}$, reset gate $r_{t}$, candidate state $\tilde{h}_{t}$, and hidden state $h_{t}$ are computed as
\begin{align*}
z_{t} =& \sigma \left(W_{z} \left[ h_{t-1}, O_{t} \right] + b_{z} \right) \tag{11}\\
r_{t} =& \sigma \left(W_{r} \left[ h_{t-1}, O_{t} \right] + b_{r} \right) \tag{12}\\
\tilde{h}_{t} =& \tanh \left(W_{h}O_{t} + W_{h}\left(r_{t} \odot h_{t-1} \right) + b_{h} \right) \tag{13}\\
h_{t} =& z_{t} \odot h_{t-1} + (1-z_{t}) \odot \tilde{h}_{t} \tag{14}
\end{align*}
Next, the forward and backward GRUs merge their features and create the deep features in temporal sequence:
\begin{align*}
h^{\xrightarrow {f}}_{t}=& \tanh \left(W^{\xrightarrow {f}}_{h}O_{t} +W^{\xrightarrow {f}}_{h} h^{\xrightarrow {f}}_{t-1}+ b^{\xrightarrow {f}}_{h}\right) \tag{15}\\
h^{\xleftarrow {b}}_{t}=& \tanh \left(W^{\xleftarrow {b}}_{h}O_{t} +W^{\xleftarrow {b}}_{h} h^{\xleftarrow {b}}_{t-1}+ b^{\xleftarrow {b}}_{h}\right) \tag{16}\\
y_{t}=& W^{\xrightarrow {f}} h^{\xrightarrow {f}}_{t}+W^{\xleftarrow {b}} h^{\xleftarrow {b}}_{t}+b_{y}. \tag{17}
\end{align*}
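Analogously, a single GRU step over a BiLSTM output $O_{t}$ can be sketched as follows. Note that, for clarity, the sketch uses two separate weight matrices for the two terms of Eq. (13), whereas the text writes both as $W_{h}$; this split is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(o_t, h_prev, W, b):
    """One GRU step over a BiLSTM output o_t, per Eqs. (11)-(14)."""
    z_in = np.concatenate([h_prev, o_t])               # [h_{t-1}, O_t]
    z = sigmoid(W["z"] @ z_in + b["z"])                # update gate, Eq. (11)
    r = sigmoid(W["r"] @ z_in + b["r"])                # reset gate, Eq. (12)
    h_tilde = np.tanh(W["ho"] @ o_t                    # candidate, Eq. (13)
                      + W["hh"] @ (r * h_prev) + b["h"])
    return z * h_prev + (1.0 - z) * h_tilde            # hidden state, Eq. (14)
```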
Lastly, the FC layers use the extracted deep features $y_{t}$ to forecast the PM$_{2.5}$ concentrations.
C. Evaluation Metrics
In this study, the CombineDeepNet model was compared against other prediction models using the same dataset. The forecasting accuracy was evaluated using the RMSE, MAE, and R$^{2}$ metrics, defined as follows, where $x_{i}$ and $y_{i}$ denote the actual and predicted values, $\overline{x}$ and $\overline{y}$ their means, and $n$ the number of samples:
\begin{align*}
\text{RMSE} =& \sqrt{\frac{1}{n}\sum _{i=1}^{n}{({x_{i} -y_{i}})^{2}}} \tag{18}\\
\text{MAE}=& \frac{1}{n}\sum _{i=1}^{n}|x_{i}-y_{i}| \tag{19}\\
\text{ R}^{2} =& \frac{\sum _{i=1}^{n}(x_{i}-\overline{x})(y_{i}-\overline{y})}{\sqrt{\sum _{i=1}^{n}(x_{i}-\overline{x})^{2}}\sqrt{\sum _{i=1}^{n}(y_{i}-\overline{y})^{2}}} \tag{20}
\end{align*}
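These metrics are straightforward to implement. The sketch below follows Eqs. (18)–(20) literally; in particular, the R$^{2}$ of Eq. (20) is computed in its correlation form, exactly as written above.

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error, Eq. (18)."""
    return np.sqrt(np.mean((x - y) ** 2))

def mae(x, y):
    """Mean absolute error, Eq. (19)."""
    return np.mean(np.abs(x - y))

def r2(x, y):
    """R^2 as written in Eq. (20): a correlation-based measure."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / (np.sqrt(np.sum(xc ** 2)) * np.sqrt(np.sum(yc ** 2)))
```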
Results
A set of tests was carried out to assess the efficacy of the proposed model. The test dataset from six monitoring stations was employed in the CombineDeepNet model to observe the forecasting accuracy. The same dataset was then applied to five popular DL models (CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU), and their prediction accuracy was recorded. Lastly, the CombineDeepNet model's prediction results were compared with those of the five popular DL models, confirming the model's efficacy. The Supplementary Material includes the results from comparisons with other models.
A. Parameter Setting
In this article, we employed 26 300 h (75%) of data for training the models and 8760 h (25%) for testing. A number of experiments were run to identify the hyperparameter configuration for this research, resulting in the selection of the ideal hyperparameter set. To assess the prediction effectiveness of the algorithm, a validation set was used to calculate the mean squared error (MSE) after each epoch, and the optimal model was selected based on the validation error. We used the following methodology for model training: we set the number of epochs to 100 and tested the trained model against the validation set after each epoch, updating and saving the model parameters whenever the forecasting model's MSE on the validation set improved. Training ended, after several parameter adjustments and experiments, when the model's performance on the validation set was at its best. Lastly, we employed dropout to prevent model overfitting. A sketch of this checkpointing procedure is given below.
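The following is a minimal sketch of that checkpointing loop; the model and data loaders are placeholders, and the optimizer choice is an assumption not stated in the text.

```python
import copy
import torch

def train_with_checkpointing(model, train_loader, val_loader, epochs=100):
    """Keep the parameters with the lowest validation MSE (Sec. IV-A)."""
    loss_fn = torch.nn.MSELoss()
    opt = torch.optim.Adam(model.parameters())   # optimizer is an assumption
    best_mse, best_state = float("inf"), None
    for _ in range(epochs):                      # 100 epochs, as in the text
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_mse = sum(loss_fn(model(xb), yb).item()
                          for xb, yb in val_loader) / len(val_loader)
        if val_mse < best_mse:                   # validation MSE improved:
            best_mse = val_mse                   # save the parameters
            best_state = copy.deepcopy(model.state_dict())
    return best_state, best_mse
```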
In addition, we carefully addressed the selection of the time lag parameter for the prediction task. Instead of specifying a fixed time lag, we use a window size of 4 during the training phase: when making predictions, we consider the previous four data points ($x_{t-3}, x_{t-2}, x_{t-1}, x_{t}$) as the input window. A sketch of this windowing follows.
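A possible implementation of this windowing, with hypothetical helper and variable names, is:

```python
import numpy as np

def make_windows(series, window=4):
    """Build (samples, window) inputs and next-step targets; window = 4 h."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

X, y = make_windows(np.arange(10.0))
# X[0] = [0, 1, 2, 3] -> y[0] = 4: the previous four points predict the next.
```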
The general hyperparameters used in this experiment are listed in Table V. The layer-by-layer parameters are provided in the Supplementary Material. For further reference, the code is available in the GitHub repository.
B. Loss Function of the Proposed Model
Fig. 5 shows the training loss versus validation loss during training of the CombineDeepNet model over the six monitoring stations, namely, (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong. The blue line in Fig. 5 shows the training loss, and the orange line shows the validation loss. The loss function is an important component of a DL model, since it evaluates the performance of the model on the training samples; its purpose is to minimize the disparity between the predicted and real output, and model performance improves as the loss value decreases. However, when a model is fitted too closely to the training dataset, overfitting occurs: the model learns the noise in the data instead of the underlying patterns, resulting in poor generalization to new data. To counter overfitting, the model's parameters are adjusted to balance minimizing the training loss while keeping the validation loss low. During training, the training loss is calculated at each epoch to monitor performance on the training data, and the validation loss is measured after each epoch to evaluate generalization to new data. If the validation loss starts increasing while the training loss continues to decrease, it is a sign of overfitting, and the model needs to be adjusted to improve its generalization ability.
Train loss versus validation loss of the proposed CombineDeepNet model over six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong.
C. Multistep Prediction of PM$_{2.5}$ Concentration Over Six Monitoring Stations
This section summarizes the findings and results of the study. Our main focus in this article is the long-term, multistep forecasting of PM$_{2.5}$ concentrations.
Plot of actual PM$_{2.5}$ values and the values predicted by the proposed model and (a) CNN, (b) LSTM, (c) GRU, (d) CNN-LSTM, and (e) CNN-GRU at the Aoti Zhongxin station.
Plot of actual PM$_{2.5}$ values and the values predicted by the proposed model and (a) CNN, (b) LSTM, (c) GRU, (d) CNN-LSTM, and (e) CNN-GRU at the Wanshou Xigong station.
Figs. 6 and 7 present a comparison of the proposed model's efficiency with five popular DL models: (a) CNN, (b) LSTM, (c) GRU, (d) CNN-LSTM, and (e) CNN-GRU. The results show that the CombineDeepNet model excels in generalization at longer lead times compared with the other models. Specifically, the green curve, which represents the forecasts of the proposed model, closely aligns with the actual test data represented by the red curve, demonstrating a high degree of accuracy. The comparison in the figures highlights the CombineDeepNet model's ability to generalize better than other popular DL models. Generalization plays a pivotal role in evaluating prediction models, as it assesses a model's ability to handle new, previously unobserved data; a model that generalizes well provides more accurate and reliable predictions, making it better suited for forecasting PM$_{2.5}$ concentrations in practice.
D. Comparison With Statistical Results for Long-Term PM$_{2.5}$ Concentration Prediction
To further validate the CombineDeepNet model, we incorporate three statistical analysis techniques: RMSE, MAE, and R$^{2}$.
Fig. 8 presents a comparison of the CombineDeepNet model's effectiveness in terms of RMSE with the other models for the prediction of PM$_{2.5}$ at the six monitoring stations.
Comparison of the performance of the proposed model with other models in terms of RMSE for the prediction of PM$_{2.5}$ at the six monitoring stations.
Similarly, Fig. 9 presents the MAE values of the proposed model and the other five models over the same six monitoring stations. The findings demonstrate that the CombineDeepNet model has lower MAE values than the other five models, indicating its better performance in accurately predicting PM$_{2.5}$ concentrations.
Comparison of the performance of the proposed model with other models in terms of MAE for the prediction of PM$_{2.5}$ at the six monitoring stations.
To further assess the effectiveness of CombineDeepNet, Table VI provides a numerical analysis comparing the CombineDeepNet model with the five popular DL models, emphasizing RMSE for forecasting PM$_{2.5}$ concentrations.
Similarly, Tables VII and VIII present the numerical values of MAE and R$^{2}$, respectively.
Thus, the statistical results demonstrate that the CombineDeepNet model achieves promising performance in accurately forecasting PM$_{2.5}$ concentrations.
E. Comparison Between the Top Three Models: CombineDeepNet, LSTM, and GRU
In this section, we further compare the prediction accuracy of the top three models, our proposed CombineDeepNet, LSTM, and GRU, using box plots. Fig. 10(a)–(f) illustrates the box plots of the RMSE values obtained for the CombineDeepNet, LSTM, and GRU models across six monitoring stations: Aoti Zhongxin, Dongsi, Shunyicheng, Tiantan, Haidian Wanliu, and Wanshou Xigong. These plots show a consistent trend of RMSE values for CombineDeepNet, whereas the RMSE scores of the LSTM and GRU models are more widely spread across most of the stations. In addition, the average RMSE values for CombineDeepNet are consistently lower than those for LSTM and GRU, highlighting the superiority of the CombineDeepNet model. Therefore, this model can be effectively applied in complex environments for long-term PM$_{2.5}$ prediction.
Box plot of RMSE values for CombineDeepNet, LSTM, and GRU across six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidian Wanliu, and (f) Wanshou Xigong.
F. Comparison With Fitting Curve
Further, the CombineDeepNet model's effectiveness was evaluated based on the degree of fit between the actual and predicted PM$_{2.5}$ values.
Degree of fit between the actual and predicted values of the proposed model on the test set in the [0–180 h] task for the six monitoring stations: (a) Aoti Zhongxin, (b) Dongsi, (c) Shunyicheng, (d) Tiantan, (e) Haidan Wanliu, and (f) Wanshou Xigong.
Furthermore, we conducted a dispersion comparison analysis for the PM$_{2.5}$ predictions.
Therefore, our study presents a promising approach for predicting PM$_{2.5}$ concentrations over long lead times.
Discussion
Making long-term predictions of PM$_{2.5}$ concentrations is a challenging task.
Table VI shows that the proposed model's RMSE values are close to those of the LSTM and GRU models and outperform the CNN, CNN-LSTM, and CNN-GRU models at long lead times for all six monitoring stations. The RMSE values of the proposed model ranged from 6.70 to 23.50 $\mu$g/m$^{3}$.
Similarly, Tables VII and VIII demonstrate that the CombineDeepNet model's MAE and R$^{2}$ values follow the same trend.
The experimental findings indicate that CombineDeepNet is very accurate at forecasting PM$_{2.5}$ concentrations.
The optimal values of the RMSE, MAE, and R$^{2}$ metrics are obtained by the proposed model in most cases.
In this research, we selected a consecutive 180-h test dataset and display it in Figs. 6 and 7 for the Aoti Zhongxin and Wanshou Xigong monitoring stations. Our primary goal was to evaluate the fitting ability of the CombineDeepNet model and to verify our supposition that it performs better on new data. The figures demonstrate that, in the long-term scenario, the CombineDeepNet model fits the new data better than the CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU models. Figs. 6 and 7 also depict the differences in prediction accuracy between the hybrid CNN-LSTM and CNN-GRU models and the other models (CombineDeepNet, CNN, LSTM, and GRU) at short-term and long-term lead times. These differences can be attributed to the architectural characteristics and capabilities of the models. CNN-LSTM and CNN-GRU combine CNN layers for spatial feature extraction with recurrent layers (LSTM or GRU) for temporal sequence modeling. While they have the capacity to capture both short-term and long-term dependencies, the interplay between these layers can bias them toward short-term patterns: in certain cases, the convolutional layers overemphasize short-term fluctuations, causing the models to focus excessively on recent data. This can lead to suboptimal performance compared with models that are specifically designed for the prediction task at hand.
In addition, Fig. 11 displays the fitting trend on the test dataset. Furthermore, we conducted a dispersion comparison analysis for the PM$_{2.5}$ predictions.
In light of the experimental findings, we believe that the proposed CombineDeepNet can better forecast long-term PM$_{2.5}$ concentrations than the compared models.
Conclusion
The CombineDeepNet model, built on a hybrid architecture and data correlation principles, represents an effective approach for forecasting PM$_{2.5}$ concentrations.
The experimental findings show that, by fully extracting the correlated information between pollutants and meteorological data, the CombineDeepNet model outperforms conventional approaches such as CNN, LSTM, GRU, CNN-LSTM, and CNN-GRU. Furthermore, the CombineDeepNet model effectively addresses the challenge of long-term dependencies, which play a crucial role in predicting pollutant hazards. Overall, the CombineDeepNet model provides a promising network for forecasting PM$_{2.5}$ concentrations.
One limitation of this article is that information from neighboring locations is not linked to the intended target area. In future work, the correlation between nearby location data and the target location will be incorporated into the input features to improve the long-term prediction of PM$_{2.5}$ concentrations.