Introduction
Load forecasting is of crucial importance for the reliable operation of power systems and plays an essential role in energy management, economic dispatch, and maintenance planning in power grids [1], [2]. In general, electricity load forecasting can be divided into three categories according to the time horizon: short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF) [3]. STLF focuses on predicting the load from the next several minutes up to one week ahead [4]. As restructuring and deregulation deepen in the power industry, STLF has become one of the most significant tasks for electric utilities and power providers in liberalized electricity markets. Short-term electricity load demand varies greatly and is affected by a number of external factors. Therefore, accurate and reliable STLF is beneficial for strategy design, reliability estimation, security analysis, and spot price calculation in future smart grids [5], [6].
In the last decades, various STLF algorithms have been developed. The representative methods used for STLF can be broadly divided into three categories: regressions, similar day methods, and machine-learning approaches [7]. Regression methods mainly include exponential smoothing [8], multiple linear regression [9], autoregressive moving average (ARMA) [10], and autoregressive integrated moving average (ARIMA) [11]. These methods are able to capture the quantitative relationships between the load and its influential factors when dealing with linear load forecasting tasks. However, linear regression methods are inadequate for nonlinear problems, such as analyzing the relationship between the load and the electricity price.
Similar day methods select historical days that resemble the forecasted day based on influential factors for electricity demand [12]. In these methods, the predicted load is either the load of the most similar day or a load obtained by an appropriate combination of the loads of several similar days. Although similar day methods are simple and intuitive, they are not competent at capturing intricate load features when used alone [7]. Therefore, similar day methods are commonly used for selecting the initial input of forecasting models.
Machine-learning methods such as Kalman filtering [13], support vector regression (SVR) [14], [15], regression trees [16], random forest (RF) [17], artificial neural networks (ANNs) [18], [19], and deep neural networks (DNNs) [20], [21] have been applied to STLF. In recent years, constructing STLF models with DNNs has been a hot research topic and has become the mainstream approach to the STLF problem. In contrast to classical neural networks, DNNs are capable of automatic feature extraction and can extract richer features by adding layers and constructing deeper structures. The authors of [22] reviewed several DNNs for STLF, among which recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are the two most widely adopted models. In [23], a variant of the RNN called long short-term memory (LSTM) was used for short-term residential load forecasting and achieved high accuracy. An efficient STLF model combining empirical mode decomposition (EMD) and LSTM was proposed for noise-free data training in [24]. First, the EMD algorithm was applied to decompose the load time series into a group of intrinsic mode functions (IMFs) and a residue. An LSTM model was then trained separately on each EMD component. Finally, the prediction results of all EMD components were added together to obtain an aggregated prediction output.
The CNN was primarily designed for image recognition, but it has recently been employed for STLF [25]. In [26], a time-dependent CNN (TD-CNN) was developed to improve the forecasting accuracy of STLF based only on historical electricity load data. CNNs are well suited for processing load data since they are adept at solving highly nonlinear problems and extracting high-level spatial features from the input data [27], [28]. Nevertheless, CNNs may not provide good accuracy when the load data exhibit high volatility and uncertainty. In contrast, the RNN and its variants are effective in handling time series data, especially in extracting dynamic temporal information [23]. Based on these observations, an emerging model combining CNN and LSTM (CNN-LSTM) to learn both spatial and temporal characteristics of the input data has been presented [29]–[32]. Only a few recent works have applied the CNN-LSTM model to STLF problems. In [33] and [34], a CNN-LSTM model combining a residual network (ResNet, a typical improved CNN) with LSTM and a hybrid CNN-LSTM model based on CNN and LSTM were respectively proposed for STLF. In these two CNN-LSTM models, the CNN was applied to extract features and arrange them into vectors, and the vectors were then fed into the LSTM for load forecasting. Since the LSTM is trained only on the feature vectors extracted by the CNN, STLF based on the above two CNN-LSTM models may lose some important temporal or spatial feature information of the original input data.
Based on the above considerations, this article presents an EMD-DNN exploiting transitional forecasting scheme that integrates EMD, similar day selection, and DNNs to fully exploit the information in the input data. More specifically, the proposed scheme has two major layers: a feature extraction layer and a forecasting layer.
In the feature extraction layer, there are four modules, two of which are based on the CNN-LSTM model. These two modules perform three processing steps on the electricity load/price data: 1) EMD decomposes the load/price time series into a group of IMFs and a residue; 2) based on the idea of VGGNet, a CNN extracts the spatial features in the 2-D matrix composed of the EMD components; 3) an LSTM extracts temporal features, taking the fusion of the extracted feature vector and the original load/price sequence as its input. Consequently, the multimodal spatial-temporal features of the original input data can be extracted. Furthermore, a multilayer fully-connected neural network module and a similar day selection module, both based on day and hour factors, are used to provide extra features for the forecasting layer.
In the forecasting layer, there is only a multilayer fully-connected neural network consisting of two fully-connected (FC) layers. The final forecasting is performed by this network, which takes the outputs of the feature extraction layer as its input.
The main contributions of this article are summarized as follows:
We present an EMD-based CNN-LSTM approach that is able to extract multimodal spatial-temporal features from electricity load/price time series. In particular, the CNN model is built based on the idea of VGGNet, which reduces the number of parameters and provides a stronger feature extraction capability than classical CNNs.
A novel transitional forecasting approach that contains a feature extraction layer and a forecasting layer is proposed. The data of electricity load/price and other influential factors are first used for STLF in the feature extraction layer. The corresponding STLF results are taken as transitional predictions and then fed into the forecasting layer to perform the final forecasting. Compared with other studies that have only one prediction layer and complete STLF directly from the original data, the proposed approach is able to capture more latent information through the transitional predictions.
By taking the loads of several similar days as an input of the forecasting layer, the advantages of the similar day methods are incorporated into the proposed scheme. Notably, the proposed scheme increases the weight given to the similar day loads, thereby significantly improving the forecasting accuracy.
The effectiveness of the proposed scheme is validated with real data collected from the electricity market in Singapore. Compared with TD-CNN, CNN-LSTM model, ResNet/LSTM combined model, and similar day-based wavelet neural networks (SIWNN), the proposed scheme has better performance in various metrics.
The remainder of this article is organized as follows. Section II illustrates the main methodologies involved in the proposed scheme. Section III elaborates on the proposed scheme, and Section IV presents the experimental results of the proposed scheme. Conclusions are drawn in Section V.
Methodology
A. Convolutional Neural Network
As a typical kind of deep artificial neural network, the CNN draws inspiration from the hierarchical processing of information in the visual cortex [29]. Generally, CNNs are employed for pattern recognition and feature extraction in tasks involving visual imagery, video, and text. A CNN replaces the general matrix multiplication with convolution in at least one layer of the network. The operation of 2-dimensional convolution can be represented as:\begin{equation*}s(i,j)=(I*K)(i,j)=\sum \limits _{l}{\sum \limits _{m}{I(l,m)K(i+l,j+m)}} \tag{1}\end{equation*} where $I$ is the input, $K$ is the convolution kernel, and $s(i,j)$ is the resulting feature map.
As illustrated in Fig. 1, a CNN is a multi-layer artificial neural network, usually composed of three crucial layers: 1) the convolutional layer; 2) the pooling layer; 3) the FC layer. The convolutional layer extracts various features of the input by applying the convolution operation. To accelerate the calculation effectively, the pooling layer reduces the size of the feature map obtained from the convolutional layer. Through multiple layers of convolution and pooling operations, the topological features of the input data can be captured. Finally, the FC layer makes use of these features to calculate the final result for classification or regression.
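To make the operation in (1) concrete, the following is a minimal NumPy sketch of a valid 2-D convolution, implemented in the cross-correlation form commonly used by deep-learning frameworks; the array sizes are illustrative only.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Minimal valid 2-D convolution as in Eq. (1): no padding, unit stride."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # s(i, j) = sum over the kernel window of I(i+l, j+m) * K(l, m)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy example: 4x4 input with a 2x2 averaging kernel -> 3x3 feature map
feature_map = conv2d(np.arange(16, dtype=float).reshape(4, 4), np.full((2, 2), 0.25))
```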
B. Long Short-Term Memory
The RNN is a sequence-based model that can effectively capture the temporal correlations in time series data. However, it is always an arduous task for the traditional RNN to learn long-range dependencies due to exploding or vanishing gradients. The LSTM is specially designed to resolve this weakness. Different from the standard RNN, the LSTM uses the concept of gates and replaces the classical hidden nodes with memory cells. LSTM networks have the same chain structure as RNNs, consisting of a series of recurrent cells. Nevertheless, the LSTM is much more complicated than the RNN, as there are four interacting parts in an LSTM block.
Fig. 2 describes the inner architecture of an LSTM block. A typical LSTM block comprises four parts: the memory cell $m_{c}$, the forget gate $f_{g}$, the input gate $i_{g}$, and the output gate $o_{g}$. At time step $k$, with input $v(k)$ and previous hidden state $h(k-1)$, the three gates are computed as:\begin{align*} {f_{g}}(k)=&\sigma ({W_{f}}\cdot h(k-1)+{R_{f}}\cdot v(k)+{b_{f}})\tag{2}\\ {i_{g}}(k)=&\sigma ({W_{i}}\cdot h(k-1)+{R_{i}}\cdot v(k)+{b_{i}})\tag{3}\\ {o_{g}}(k)=&\sigma ({W_{o}}\cdot h(k-1)+{R_{o}}\cdot v(k)+{b_{o}})\tag{4}\end{align*} where $\sigma$ denotes the sigmoid activation function, $W$ and $R$ are weight matrices, and $b$ denotes the bias vectors.
The update of the cell memory is determined by the forget gate and the input gate. More specifically, the state ${m_{c}}(k)$ of the memory cell is updated as:\begin{align*} {m_{c}}(k)=&{f_{g}}(k)\odot {m_{c}}(k-1)+{i_{g}}(k)\odot {g_{c}}(k)\tag{5}\\ {g_{c}}(k)=&\varphi ({W_{c}}\cdot h(k-1)+{R_{c}}\cdot v(k)+{b_{c}})\tag{6}\end{align*} where ${g_{c}}(k)$ is the candidate cell input, $\varphi$ denotes the hyperbolic tangent activation function, and $\odot$ denotes the element-wise product.
The corresponding output of the hidden layer is expressed as:\begin{equation*}h(k)=\varphi ({m_{c}}(k))\odot {o_{g}}(k) \tag{7}\end{equation*}
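To illustrate (2)-(7), the NumPy sketch below performs a single LSTM step; the parameter layout (dictionaries keyed by gate name) is an assumption made for readability and is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(v, h_prev, c_prev, params):
    """One LSTM step following Eqs. (2)-(7); params = (W, R, b), each a dict keyed by gate."""
    W, R, b = params
    f = sigmoid(W['f'] @ h_prev + R['f'] @ v + b['f'])   # forget gate, Eq. (2)
    i = sigmoid(W['i'] @ h_prev + R['i'] @ v + b['i'])   # input gate,  Eq. (3)
    o = sigmoid(W['o'] @ h_prev + R['o'] @ v + b['o'])   # output gate, Eq. (4)
    g = np.tanh(W['c'] @ h_prev + R['c'] @ v + b['c'])   # candidate cell input, Eq. (6)
    c = f * c_prev + i * g                               # cell state update, Eq. (5)
    h = o * np.tanh(c)                                   # hidden output, Eq. (7)
    return h, c

# toy usage with random parameters (input size 3, hidden size 4)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 4)) for k in 'fioc'}
R = {k: rng.standard_normal((4, 3)) for k in 'fioc'}
b = {k: np.zeros(4) for k in 'fioc'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), (W, R, b))
```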
C. Empirical Mode Decomposition
EMD is a nonlinear analysis method that can transform non-stationary and nonlinear data into a set of nearly stationary components. It breaks a time series down into a group of IMFs and a residue. Each IMF has envelopes that are symmetric about zero, and its numbers of zero-crossings and extrema are equal or differ by at most one. The local frequency of each IMF is different from the others, and the residue represents the trend. The steps of the EMD algorithm are given in Algorithm 1.
Algorithm 1 EMD (Empirical Mode Decomposition)
1) Identify all the local maxima and minima in a given time series $y(t)$.
2) Apply cubic-spline interpolation to generate the upper and lower envelopes.
3) Calculate the mean series $m(t)$ of the two envelopes.
4) Compute the difference between the initial data and the mean: $d(t)=y(t)-m(t)$.
5) Examine the characteristic of $d(t)$:
If $d(t)$ is an IMF, calculate the residue $r(t)$ as $r(t)=y(t)-d(t)$.
If $d(t)$ is not an IMF, replace $y(t)$ with $d(t)$ and repeat steps 1 to 4.
6) Treat $r(t)$ as the new data and repeat steps 1 to 5 until the residue becomes a monotonic function or contains no more than one extremum, from which no further IMF can be extracted.
Given a set of $P$ IMFs ${d_{p}}(t)$ and the residue $r(t)$, the original time series can be reconstructed as:\begin{equation*}y(t)=r(t)+\sum \limits _{p=1}^{P}{{d_{p}}(t)} \tag{8}\end{equation*}
EMD has two significant advantages in time series analysis and prediction. One lies in its potent reconstruction property: the IMFs and the residue can reconstruct the original time series without any loss of information. The other lies in its adeptness at capturing the trend of non-stationary data. Therefore, EMD is very helpful for time series analysis and prediction.
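As a brief illustration of the decomposition and the reconstruction property in (8), the sketch below uses the open-source PyEMD package (assumed to be installed as EMD-signal); the synthetic series stands in for an hourly load profile and is not the Singapore data used later.

```python
import numpy as np
from PyEMD import EMD  # assumes the PyEMD package (pip install EMD-signal)

# synthetic stand-in for four weeks of hourly load data
t = np.arange(24 * 28)
load = 5000.0 + 800.0 * np.sin(2 * np.pi * t / 24) + 50.0 * np.random.randn(t.size)

emd = EMD()
emd.emd(load)
imfs, residue = emd.get_imfs_and_residue()   # IMFs d_p(t) and residue r(t)

# reconstruction property of Eq. (8): the components sum back to the signal
reconstruction = imfs.sum(axis=0) + residue
print(f"{imfs.shape[0]} IMFs extracted; max reconstruction error "
      f"{np.abs(reconstruction - load).max():.2e}")
```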
D. Similar Day Selection Algorithm
The similar day selection is an approach to find similar days from the historical data. The selected similar days are usually taken as the input for the prediction task. For load forecasting, selecting appropriate similar days is an effective way to improve the performance of forecasting models. Under different circumstances, the factors related to load variation are different, and only a few of them are dominant. For example, the weekday index is the dominant factor in metropolitan areas where the major consumption of electricity is commercial and residential load. A good similar day selection algorithm should be able to identify the major factors of load change in different situations and thereby ensure an appropriate selection of similar days.
Let the normalized factor vector of a day with $N$ influential factors be denoted as:\begin{equation*}X=[x(1),x(2),\cdots,x(N)]. \tag{9}\end{equation*}
For the day to be predicted and the $j$-th historical day, the factor vectors are denoted as ${X_{0}}$ and ${X_{j}}$, respectively:\begin{align*} {X_{0}}=&[{x_{0}}(1),{x_{0}}(2),\cdots,{x_{0}}(N)] \tag{10}\\ {X_{j}}=&[{x_{j}}(1),{x_{j}}(2),\cdots,{x_{j}}(N)] \tag{11}\end{align*}
The similarity between the forecasted day and the historical day $j$ can be defined as:\begin{align*} {F_{j}}=&\prod \limits _{n=1}^{N}{{\varepsilon _{j}}(n)} \tag{12}\\ {\varepsilon _{j}}(n)=&\frac {\mathop {\min }\limits _{j}\mathop {\min }\limits _{n}\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |+\rho \mathop {\max }\limits _{j}\mathop {\max }\limits _{n}\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |}{\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |+\rho \mathop {\max }\limits _{j}\mathop {\max }\limits _{n}\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |} \tag{13}\end{align*} where $\rho \in (0,1]$ is the resolution coefficient.
With the continuous multiplication in (12), the dominant factors can be identified easily and automatically without the need to assign weights to each factor. The process of the similar day selection algorithm is as follows:
1) Starting from the nearest historical day, the similarity value ${F_{j}}$ of the historical day $j$ is calculated day by day in reverse.
2) The $D$ days with the highest similarity among the nearest $N$ days are selected as the similar days of the day $i$ to be forecasted.
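The selection procedure and the similarity measure in (12)-(13) can be sketched in NumPy as follows; the value of the resolution coefficient and the toy data are assumptions for illustration.

```python
import numpy as np

def similar_days(x0: np.ndarray, X_hist: np.ndarray, n_days: int = 3, rho: float = 0.5):
    """Select the most similar historical days via Eqs. (12)-(13).

    x0     : (N,) normalized factor vector of the day to be forecasted
    X_hist : (J, N) normalized factor vectors of the J nearest historical days
    rho    : resolution coefficient (0.5 is a common choice, assumed here)
    """
    diff = np.abs(x0 - X_hist)                           # |x_0(n) - x_j(n)| for all j, n
    d_min, d_max = diff.min(), diff.max()                # global min and max over j and n
    eps = (d_min + rho * d_max) / (diff + rho * d_max)   # Eq. (13)
    F = eps.prod(axis=1)                                 # Eq. (12): similarity of each day
    return np.argsort(F)[::-1][:n_days]                  # indices of the most similar days

# toy usage: pick 3 similar days out of 30 historical days with 5 factors each
rng = np.random.default_rng(0)
indices = similar_days(rng.random(5), rng.random((30, 5)))
```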
Implementation
This section describes the implementation of the proposed EMD-DNN exploiting transitional forecasting scheme. As shown in Fig. 3, the whole process is divided into three main steps: data preparation, training, and prediction. In step 1, the electricity load data and price data are preprocessed in the order of data cleansing, EMD, and normalization. The normalized electricity load data and price data are then converted into 2-dimensional images. In addition to the electricity load and price, other types of factors, including holidays, the days of the week, and the hours of the day, are also incorporated. All the data are finally divided into a training set and a testing set. In step 2, the proposed scheme is built and trained. In particular, the modules in the feature extraction layer are first trained separately on the training dataset. The results of their predictions are then used to generate a new dataset. Finally, the fully-connected network in the forecasting layer is trained on the new dataset. The mean square error (MSE) is taken as the cost function, the Adam optimizer is employed for training, and the learning rate is set to 0.001. In the final step, the performance of the proposed scheme is evaluated using the testing set.
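For concreteness, a minimal PyTorch training loop matching the stated setup (MSE cost, Adam optimizer, learning rate 0.001) might look like the sketch below; the epoch count and the data loader are placeholders rather than the authors' settings.

```python
import torch
from torch import nn

def train_module(model: nn.Module, loader, epochs: int = 50) -> nn.Module:
    """Generic training loop used for each module: MSE loss, Adam, lr = 0.001."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:              # loader yields (input batch, target load)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```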
A. Data Preparation
First of all, we briefly describe the data used. The data are derived from the electricity market in Singapore and contain the total electricity consumption and electricity price of the country. In our paper, hourly data covering a period of two years (from January 2014 to January 2016) are selected as the datasets. The ranges of the electricity load data and the electricity price data are [3795, 6850] (MW) and [21, 1318] ($/MWh), respectively. Furthermore, other types of factors, such as the holiday index, the days of the week, and the hours of the day, are manually added to the datasets at one-hour resolution. We collect the public holidays in each year and use them as a binary feature: the holiday index of a day is 1 if the day is a holiday and 0 otherwise. The data of 2014 are used as the historical dataset for selecting similar days, while the data of 2015 are used as the forecasting dataset. 80% of the forecasting dataset is separated as the training samples, and the remaining 20% serves as the testing samples. In both the training and testing samples, the hourly load values are obtained by our proposed scheme using a moving window.
Among the collected data, there are some missing and abnormal values, which we replace with highly correlated data. After data cleansing, EMD is applied to the time series. As illustrated in Fig. 4, the electricity load series is decomposed into 11 IMFs and a residue, while the electricity price series is broken down into 15 IMFs and a residue. These components are processed by min-max normalization into the range [0, 1] for better training results. Additionally, the electricity load data and the electricity price data themselves are normalized. The min-max normalization follows\begin{equation*}{{\bar {x}}_{i}}=\frac {{x_{i}}-{x_{\min }}}{{x_{\max }}-{x_{\min }}} \tag{14}\end{equation*} where ${x_{\min }}$ and ${x_{\max }}$ are the minimum and maximum values of the corresponding series.
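A one-line NumPy helper implementing the min-max normalization in (14):

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a series into [0, 1] following Eq. (14)."""
    return (x - x.min()) / (x.max() - x.min())
```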
Example of the decomposition of electricity load and price. (a) the decomposition of electricity load, (b) the decomposition of electricity price.
The crucial part of data preparation is to transform these normalized components into a proper form as the input of our CNN. We treat the processed components of the electricity load/price as pixels of an image and perform the prediction of the next hour using the historical data of the most recent 672 hours (i.e., four weeks). For example, the components of the time series of 672 historical electricity load values are rearranged into the following matrix:\begin{align*}{X_{M}}=\left ({\begin{matrix} im{f_{1}}(1) &\quad im{f_{1}}(2) &\quad \cdots &\quad im{f_{1}}(672) \\ im{f_{2}}(1) &\quad im{f_{2}}(2) &\quad \cdots &\quad im{f_{2}}(672) \\ \vdots &\quad \vdots &\quad \ddots &\quad \vdots \\ im{f_{C}}(1) &\quad im{f_{C}}(2) &\quad \cdots &\quad im{f_{C}}(672) \\ \end{matrix} }\right) \tag{15}\end{align*} where $C$ is the number of EMD components (IMFs plus the residue).
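The rearrangement in (15) amounts to sliding a 672-hour window over the C decomposed component series; a NumPy sketch is given below, where the sample layout is an assumption for illustration.

```python
import numpy as np

def build_input_matrices(components: np.ndarray, window: int = 672) -> np.ndarray:
    """Arrange EMD components into C x 672 matrices as in Eq. (15).

    components : (C, T) array of normalized IMFs plus the residue
    Returns an array of shape (T - window, C, window); sample k covers hours
    k .. k + 671 and is used to predict the load at hour k + 672.
    """
    C, T = components.shape
    return np.stack([components[:, k:k + window] for k in range(T - window)])
```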
B. EMD-DNN Exploiting Transitional Forecasting Scheme
This subsection elaborates on the proposed scheme for STLF in detail. The objective of the proposed scheme is to estimate the energy load for the next one hour with the given historical electricity load, historical electricity price, and day and hour information.
The entire framework of the proposed scheme is presented in Fig. 5. As illustrated in the framework, this scheme is composed of the abovementioned methodologies in Section II and contains two major layers: the feature extraction layer and the forecasting layer. The definition of all variables can be found in Table 1.
Framework of the proposed scheme. In the feature extraction layer, the CNN-LSTM combined model extracts multimodal spatial-temporal features from the historical electricity load and price, respectively. Meanwhile, the fully-connected neural network and similar day selection are employed to extract extra features from the day and hour information. The output predictions from the feature extraction layer are fused as inputs of the multilayer fully-connected neural network to accomplish the forecasting task. This framework ensures the scheme can extract sufficient latent characteristics, which enhances the understanding of the dataset.
In the feature extraction layer, there are four modules that perform load forecasting based on the data of different factors. The results of forecasting are the transitional predictions that represent the extracted features from the original data. For historical electricity load and price, there is an ingenious collaboration between CNN and LSTM in these two modules to take full advantage of the information. This collaboration is performed by the CNN-LSTM combined model depicted in Fig. 6.
There are two steps in the procedure of building the CNN-LSTM combined model. In the first step, we construct the CNN model, based on the idea of VGGNet, to extract spatial features from the abovementioned load/price image matrices in (15). VGGNet repeatedly stacks small $3\times 3$ convolutional kernels and $2\times 2$ max-pooling layers, which deepens the network and strengthens its feature extraction capability while keeping the number of parameters small.
In the second step, a fusion layer concatenates the feature vector with the 672 samples of the normalized load sequence. The concatenation is fed into the LSTM model, which captures the long-term temporal dependencies and estimates the load of the next hour. The size of the hidden state of the LSTM layer is 200. Based on the above two steps, the CNN-LSTM combined model can extract multimodal spatial-temporal features.
In our experiments, the input size of the CNN-LSTM combined model is different for the electricity load and the electricity price: the electricity price matrix has a dimension of $16\times 672$ (15 IMFs plus the residue), while the electricity load matrix has a dimension of $12\times 672$ (11 IMFs plus the residue).
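The PyTorch sketch below captures the structure described above: a VGG-style stack of 3x3 convolutions with 2x2 pooling extracts a spatial feature vector from the component matrix, the vector is concatenated with the 672-sample normalized sequence, and the fusion is fed into an LSTM with a hidden size of 200. The number of convolutional filters and the pooled feature size are assumptions, since the exact layer widths are not reproduced here.

```python
import torch
from torch import nn

class CNNLSTM(nn.Module):
    """Sketch of the CNN-LSTM combined model (layer widths are illustrative)."""

    def __init__(self, hidden: int = 200):
        super().__init__()
        # VGG-style feature extractor: stacked 3x3 convolutions with 2x2 max pooling
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((1, 8)), nn.Flatten(),   # -> 32 * 8 = 256-d feature vector
        )
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                  # next-hour load estimate

    def forward(self, image: torch.Tensor, sequence: torch.Tensor) -> torch.Tensor:
        # image: (B, 1, C, 672) EMD-component matrix; sequence: (B, 672) normalized series
        spatial = self.cnn(image)                         # spatial feature vector
        fused = torch.cat([sequence, spatial], dim=1)     # fusion layer: concatenation
        _, (h_n, _) = self.lstm(fused.unsqueeze(-1))      # LSTM over the fused sequence
        return self.head(h_n[-1])

# toy usage with the 12 x 672 electricity load matrix
model = CNNLSTM()
estimate = model(torch.randn(4, 1, 12, 672), torch.randn(4, 672))   # shape (4, 1)
```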
In addition to the abovementioned electricity load and price modules, a fully-connected neural network module with 10 hidden neurons learns the dynamic relationship between the load and other factors. Such factors include the days of the week, the hours of the day, and the holiday index. Furthermore, a similar day selection module employs the similarity in (12) to select the three most similar days from the past 30 days based on the historical electricity load, the historical electricity price, and the day information. The load affecting factor vector is defined as:\begin{equation*} {V_{s}}=[{L_{H}},{P_{H}},{D_{I}}] \tag{16}\end{equation*} where ${L_{H}}$, ${P_{H}}$, and ${D_{I}}$ denote the historical electricity load, the historical electricity price, and the day information, respectively.
In the forecasting layer, there is only a fully-connected network that consists of two hidden layers with 20 hidden neurons. The outputs of the feature extraction layer are fused and fed to the fully-connected neural network to learn and forecast the ultimate results. With the transitional predictions, the proposed scheme enhances its extraction capability and is capable of capturing more latent features.
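A minimal PyTorch sketch of the forecasting layer is given below: two hidden FC layers of 20 neurons followed by a linear output; the activation function and the exact length of the fused transitional-prediction vector are assumptions.

```python
import torch
from torch import nn

class ForecastingLayer(nn.Module):
    """Two hidden FC layers of 20 neurons mapping the fused transitional
    predictions to the final next-hour load forecast (activations assumed)."""

    def __init__(self, n_transitional_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_transitional_features, 20), nn.ReLU(),
            nn.Linear(20, 20), nn.ReLU(),
            nn.Linear(20, 1),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.net(fused)

# e.g., fusing the outputs of the four feature-extraction modules
layer = ForecastingLayer(n_transitional_features=4)
forecast = layer(torch.randn(8, 4))     # shape (8, 1)
```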
Numerical Experiments
A. Evaluation Metrics
In this study, the forecasting performance of the proposed scheme is evaluated using three metrics: the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). These performance measures are widely applied in STLF tasks, and their definitions are given by:\begin{align*} \text {RMSE}=&\sqrt {\frac {1}{T}\sum \limits _{i=1}^{T}{({L_{i}}-{{\hat {L}}_{i}})^{2}}} \tag{17}\\ \text {MAE}=&\frac {1}{T}\sum \limits _{i=1}^{T}{\left |{ {L_{i}}-{{\hat {L}}_{i}} }\right |} \tag{18}\\ \text {MAPE}=&\frac {1}{T}\sum \limits _{i=1}^{T}{\frac {\left |{ {L_{i}}-{{\hat {L}}_{i}} }\right |}{{L_{i}}}}\tag{19}\end{align*} where ${L_{i}}$ and ${\hat {L}_{i}}$ denote the actual and forecasted loads at the $i$-th time step, and $T$ is the number of test samples.
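The three metrics in (17)-(19) can be computed directly, as in the short NumPy sketch below (the toy values are illustrative only).

```python
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))    # Eq. (17)

def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - predicted)))            # Eq. (18)

def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - predicted) / actual))   # Eq. (19)

# toy usage with three hourly load values (MW)
y_true = np.array([5000.0, 5200.0, 4900.0])
y_pred = np.array([5050.0, 5150.0, 4950.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```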
B. Models for Comparison
In order to demonstrate the performance of the proposed scheme, four existing STLF models are introduced for comparison.
1) Model 1: TD-CNN
TD-CNN is a multilayer neural network composed of convolutional layers and FC layers. According to [26], the convolutional layer of TD-CNN has a special kernel size and can capture the local pattern with similar characteristics. Different from the classic CNN model, TD-CNN removes the pooling layer so as to keep the finer features.
2) Model 2: CNN-LSTM Model
The standard CNN and LSTM are described above. In [35], the CNN-LSTM model sequentially extracts the spatial-temporal features from the input data, and a fully-connected neural network is then used to produce the load forecast.
3) Model 3: ResNet/LSTM Combined Model
The description of this model can be found in [33]. The ResNet extracts features from the 2-D load images using its 12 layers. Then, the LSTM model performs the final forecasting based on the extracted feature vectors.
4) Model 4: SIWNN
The SIWNN discussed in [7] takes the load of the similar day as the input load and employs wavelet decomposition with separate neural networks to capture the low- and high-frequency features of the load.
C. Results
To verify the performance of the proposed scheme, experiments with this scheme and the four comparative models are conducted on the same dataset described in Section III. Figs. 7(a) and 7(b) depict two samples of the predictions made by the proposed scheme on the training and testing sets, respectively. In these figures, the actual load is compared with the estimate at each timestamp. Clearly, our proposed scheme follows the general trend of load variation very well in both training and testing.
Estimation of our proposed scheme. (a) 72 samples of the estimated results on the training set, (b) 72 samples of the estimated results on the testing set.
Table 2 compares the estimation performance of the different schemes on the entire testing set in terms of the RMSE, MAE, and MAPE. Our scheme performs better than the other four models on all three metrics. For example, compared with the ResNet/LSTM combined model, which is also based on DNNs, the proposed scheme reduces the RMSE, MAE, and MAPE by 29.75%, 13.02%, and 13.99%, respectively. In addition, compared with the SIWNN, which also employs the similar day selection method, our scheme reduces the RMSE, MAE, and MAPE by 26.66%, 16.97%, and 17.45%, respectively.
Figs. 8 and 9 show the load forecasting curves of the different models. It is observed that our proposed scheme is able to track the load variation in the next hour more effectively and accurately. It is worth noting that electricity consumption varies among weekends, holidays, and weekdays. Since electricity consumption is highest during weekdays, the load forecasting on weekdays is the most important. The performance of the proposed scheme and the other four comparative models on weekday load forecasting is listed in Table 3. From this table, it is clear that the proposed scheme has obvious advantages over the existing four models in predicting the load variation during weekdays. Therefore, our proposed scheme is more effective for STLF. Furthermore, our scheme outperforms the other models because it takes full advantage of the extraction capability of each constituent method. In brief, our proposed scheme is superior to the other models and is promising for short-term load forecasting tasks.
Load forecasting for the next hour. (a) forecasting results for November, (b) forecasting results for December. Model 3 deviates drastically from the actual load. Model 1 and Model 2 are closer, but our proposed scheme achieves the best accuracy over the dataset.
Conclusion
In this article, we have presented an EMD-DNN exploiting transitional forecasting scheme to address the short-term load forecasting problem. This scheme combines deep neural networks, empirical mode decomposition, and similar day selection for an accurate estimation of the energy load in the next hour. Based on these methods, the proposed scheme can extract different load features in terms of the various influential factors of load. In particular, a novel EMD-based CNN-LSTM approach is proposed to extract multimodal spatial-temporal features from electricity load/price data. A fully-connected network and a similar day selection module are applied to capture the day and hour information. As a result, the proposed scheme enhances the extraction capability and is able to capture more latent information, thereby providing precise results. With actual data from the Singapore electricity market, our scheme is compared with the TD-CNN, the CNN-LSTM model, the ResNet/LSTM combined model, and the SIWNN. The results demonstrate that the proposed EMD-DNN exploiting transitional forecasting scheme achieves higher forecasting accuracy than the other four models.