Introduction
Load forecasting is of crucial importance for the reliable operation of power systems and plays an essential role in energy management, economic dispatch, and maintenance planning in power grids [1], [2]. In general, electricity load forecasting can be divided into three categories according to the time horizon: short-term load forecasting (STLF), medium-term load forecasting (MTLF), and long-term load forecasting (LTLF) [3]. STLF focuses on predicting the load from the next several minutes up to one week ahead [4]. As restructuring and deregulation deepen in the power industry, STLF has become one of the most significant tasks for electric utilities and power providers in liberalized electricity markets. Short-term electricity load demand varies greatly and is affected by a number of external factors. Therefore, accurate and reliable STLF is beneficial for strategy design, reliability estimation, security analysis, and spot price calculation in future smart grids [5], [6].
In the last decades, various STLF algorithms have been developed. The representative methods used for STLF can be broadly divided into three categories: regressions, similar day methods, and machine-learning approaches [7]. Regression methods mainly include exponential smoothing [8], multiple linear regression [9], autoregressive moving average (ARMA) [10], and autoregressive integrated moving average (ARIMA) [11]. These methods are able to capture the quantitative relationships between the load and its influential factors when dealing with linear load forecasting tasks. However, linear regression methods are inadequate for nonlinear problems, such as analyzing the relationship between the load and the electricity price.
Similar day methods select historical days that resemble the forecasted day based on influential factors for electricity demand [12]. In these methods, the predicted load is either the load of the most similar day or a load obtained by an appropriate combination of the loads of several similar days. Although similar day methods are simple and intuitive, they are not competent at capturing intricate load features when used alone [7]. Therefore, similar day methods are commonly used for selecting the initial input of forecasting models.
Machine-learning methods such as Kalman filtering [13], support vector regression (SVR) [14], [15], regression trees [16], random forest (RF) [17], artificial neural networks (ANNs) [18], [19], and deep neural networks (DNNs) [20], [21] have been applied to STLF. In recent years, constructing STLF models with DNNs has been a hot research topic and has become the mainstream approach to the STLF problem. In contrast to classical neural networks, DNNs are capable of automatic feature extraction and can extract richer features by adding layers and constructing deeper structures. The authors of [22] reviewed several DNNs for STLF, among which recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are the two most widely adopted models. In [23], a variant of the RNN called long short-term memory (LSTM) was used for short-term residential load forecasting and achieved high accuracy. An efficient STLF model combining empirical mode decomposition (EMD) and LSTM was proposed for noise-free data training in [24]. First, the EMD algorithm was applied to decompose the load time series into a group of intrinsic mode functions (IMFs) and a residue. An LSTM model was then trained separately on each EMD component. Finally, the prediction results of all EMD components were added together to obtain an aggregated prediction output.
The CNN was primarily designed for image recognition, but it has recently been employed for STLF [25]. In [26], a time-dependent CNN (TD-CNN) was developed to improve the forecasting accuracy of STLF based only on historical electricity load data. CNNs are well suited for processing load data since they are adept at solving highly nonlinear problems and extracting high-level spatial features from the input data [27], [28]. Nevertheless, CNNs may not provide good accuracy when the load data exhibit high volatility and uncertainty. In contrast, the RNN and its variants are effective in handling time series data, especially in extracting dynamic temporal information [23]. Based on these observations, an emerging model combining CNN and LSTM (CNN-LSTM) to learn both spatial and temporal characteristics of the input data has been presented [29]–[32]. Only a few recent works have applied the CNN-LSTM model to STLF problems. In [33] and [34], a CNN-LSTM model combining a residual network (ResNet, a typical improved CNN) with LSTM and a hybrid CNN-LSTM model based on CNN and LSTM were respectively proposed for STLF. In these two CNN-LSTM models, the CNN was applied to extract features and arrange them into vectors, and the vectors were then fed into the LSTM for load forecasting. Since the LSTM is trained only on the feature vectors extracted by the CNN, STLF based on the above two CNN-LSTM models may lose some important temporal or spatial feature information of the original input data.
Based on the above considerations, this article presents an EMD-DNN exploiting transitional forecasting scheme that integrates EMD, similar day selection, and DNNs to fully exploit the information in the input data. More specifically, the proposed scheme has two major layers: a feature extraction layer and a forecasting layer.
In the feature extraction layer, there are four modules, two of which are based on the CNN-LSTM model. These two modules perform three processing steps on the electricity load/price data: 1) EMD decomposes the load/price time series into a group of IMFs and a residue; 2) based on the idea of VGGNet, a CNN extracts the spatial features in the 2-D matrix composed of the EMD components; 3) an LSTM extracts temporal features, taking the fusion of the extracted feature vector and the original load/price sequence as its input. Consequently, the multimodal spatial-temporal features of the original input data can be extracted. Furthermore, a multilayer fully-connected neural network module and a similar day selection module, both based on day and hour factors, are used to provide extra features for the forecasting layer.
In the forecasting layer, there is only a multilayer fully-connected neural network consisting of two fully-connected (FC) layers. The final forecasting is performed by this network, which takes the outputs of the feature extraction layer as its input.
The main contributions of this article are summarized as follows:
We present an EMD-based CNN-LSTM approach that is able to extract multimodal spatial-temporal features from electricity load/price time series. In particular, the CNN model is built based on the idea of VGGNet, which reduces the number of parameters and provides a stronger feature extraction capability than classical CNNs.
A novel transitional forecasting approach that contains a feature extraction layer and a forecasting layer is proposed. The data of electricity load/price and other influential factors are first used for STLF in the feature extraction layer. The corresponding STLF results are taken as transitional predictions and then fed into the forecasting layer to perform the final forecasting. Compared with other studies that have only one prediction layer and complete STLF directly from the original data, the proposed approach is able to capture more latent information through the transitional predictions.
By taking the loads of several similar days as an input of the forecasting layer, the advantages of the similar day methods are incorporated into the proposed scheme. Notably, the proposed scheme increases the weight given to the similar day loads, thereby significantly improving the forecasting accuracy.
The effectiveness of the proposed scheme is validated with real data collected from the electricity market in Singapore. Compared with TD-CNN, CNN-LSTM model, ResNet/LSTM combined model, and similar day-based wavelet neural networks (SIWNN), the proposed scheme has better performance in various metrics.
The remainder of this article is organized as follows. Section II illustrates the main methodologies involved in the proposed scheme. Section III elaborates on the proposed scheme, and Section IV presents the experimental results of the proposed scheme. Conclusions are drawn in Section V.
Methodology
A. Convolutional Neural Network
As a typical kind of deep artificial neural network, the CNN draws inspiration from the hierarchical processing of information in the visual cortex [29]. Generally, CNNs are employed for pattern recognition and feature extraction in tasks involving visual imagery, video, and text. A CNN replaces the general matrix multiplication with convolution in at least one layer of the network. The operation of 2-dimensional convolution can be represented as:\begin{equation*}s(i,j)=(I*K)(i,j)=\sum \limits _{l}{\sum \limits _{m}{I(l,m)K(i+l,j+m)}} \tag{1}\end{equation*} where $I$ is the input, $K$ is the convolution kernel, and $s(i,j)$ is the resulting feature map.
As illustrated in Fig. 1, a CNN is a multi-layer artificial neural network, usually composed of three crucial layers: 1) the convolutional layer; 2) the pooling layer; 3) the FC layer. The convolutional layer extracts various features of the input by applying the convolution operation. To accelerate the calculation effectively, the pooling layer reduces the size of the feature map obtained from the convolutional layer. Through multiple layers of convolution and pooling operations, the topological features of the input data can be captured. Finally, the FC layer makes use of these features to calculate the final result for classification or regression.
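To make the operation in (1) concrete, the following is a minimal NumPy sketch of a valid 2-D convolution, implemented in the cross-correlation form commonly used by deep-learning frameworks; the array sizes are illustrative only.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Minimal valid 2-D convolution as in Eq. (1): no padding, unit stride."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # s(i, j) = sum over the kernel window of I(i+l, j+m) * K(l, m)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy example: 4x4 input with a 2x2 averaging kernel -> 3x3 feature map
feature_map = conv2d(np.arange(16, dtype=float).reshape(4, 4), np.full((2, 2), 0.25))
```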
B. Long Short-Term Memory
The RNN is a sequence-based model that can effectively capture the temporal correlations in time series data. However, it is always an arduous task for the traditional RNN to learn long-range dependencies due to exploding or vanishing gradients. The LSTM is specially designed to resolve this weakness. Different from the standard RNN, the LSTM uses the concept of gates and replaces the classical hidden nodes with memory cells. LSTM networks have the same chain structure as RNNs, consisting of a series of recurrent cells. Nevertheless, the LSTM is much more complicated than the RNN, as there are four interacting parts in an LSTM block.
Fig. 2 describes the inner architecture of an LSTM block. A typical LSTM block comprises four parts: the memory cell $m_{c}$, the forget gate $f_{g}$, the input gate $i_{g}$, and the output gate $o_{g}$. At time step $k$, with input $v(k)$ and previous hidden state $h(k-1)$, the three gates are computed as:\begin{align*} {f_{g}}(k)=&\sigma ({W_{f}}\cdot h(k-1)+{R_{f}}\cdot v(k)+{b_{f}})\tag{2}\\ {i_{g}}(k)=&\sigma ({W_{i}}\cdot h(k-1)+{R_{i}}\cdot v(k)+{b_{i}})\tag{3}\\ {o_{g}}(k)=&\sigma ({W_{o}}\cdot h(k-1)+{R_{o}}\cdot v(k)+{b_{o}})\tag{4}\end{align*} where $\sigma$ denotes the sigmoid activation function, $W$ and $R$ are weight matrices, and $b$ denotes the bias vectors.
The update of the cell memory is determined by the forget gate and the input gate. More specifically, the state ${m_{c}}(k)$ of the memory cell is updated as:\begin{align*} {m_{c}}(k)=&{f_{g}}(k)\odot {m_{c}}(k-1)+{i_{g}}(k)\odot {g_{c}}(k)\tag{5}\\ {g_{c}}(k)=&\varphi ({W_{c}}\cdot h(k-1)+{R_{c}}\cdot v(k)+{b_{c}})\tag{6}\end{align*} where ${g_{c}}(k)$ is the candidate cell input, $\varphi$ denotes the hyperbolic tangent activation function, and $\odot$ denotes the element-wise product.
The corresponding output of the hidden layer is expressed as:\begin{equation*}h(k)=\varphi ({m_{c}}(k))\odot {o_{g}}(k) \tag{7}\end{equation*}
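To illustrate (2)-(7), the NumPy sketch below performs a single LSTM step; the parameter layout (dictionaries keyed by gate name) is an assumption made for readability and is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(v, h_prev, c_prev, params):
    """One LSTM step following Eqs. (2)-(7); params = (W, R, b), each a dict keyed by gate."""
    W, R, b = params
    f = sigmoid(W['f'] @ h_prev + R['f'] @ v + b['f'])   # forget gate, Eq. (2)
    i = sigmoid(W['i'] @ h_prev + R['i'] @ v + b['i'])   # input gate,  Eq. (3)
    o = sigmoid(W['o'] @ h_prev + R['o'] @ v + b['o'])   # output gate, Eq. (4)
    g = np.tanh(W['c'] @ h_prev + R['c'] @ v + b['c'])   # candidate cell input, Eq. (6)
    c = f * c_prev + i * g                               # cell state update, Eq. (5)
    h = o * np.tanh(c)                                   # hidden output, Eq. (7)
    return h, c

# toy usage with random parameters (input size 3, hidden size 4)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 4)) for k in 'fioc'}
R = {k: rng.standard_normal((4, 3)) for k in 'fioc'}
b = {k: np.zeros(4) for k in 'fioc'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), (W, R, b))
```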
C. Empirical Mode Decomposition
EMD is a nonlinear analysis method that can transform non-stationary and nonlinear data into a set of nearly stationary components. It breaks a time series down into a group of IMFs and a residue. Each IMF has envelopes that are symmetric about zero, and its numbers of zero-crossings and extrema are equal or differ by at most one. The local frequency of each IMF is different from the others, and the residue represents the trend. The steps of the EMD algorithm are given in Algorithm 1.
Algorithm 1 EMD (Empirical Mode Decomposition)
1) Identify all the local maxima and minima in a given time series $y(t)$.
2) Apply cubic-spline interpolation to generate the upper and lower envelopes.
3) Calculate the mean series $m(t)$ of the two envelopes.
4) Compute the difference between the initial data and the mean: $d(t)=y(t)-m(t)$.
5) Examine the characteristic of $d(t)$:
If $d(t)$ is an IMF, calculate the residue $r(t)$ as $r(t)=y(t)-d(t)$.
If $d(t)$ is not an IMF, replace $y(t)$ with $d(t)$ and repeat steps 1 to 4.
6) Treat $r(t)$ as the new data and repeat steps 1 to 5 until the residue becomes a monotonic function or contains no more than one extremum, from which no further IMF can be extracted.
Given a set of $P$ IMFs ${d_{p}}(t)$ and the residue $r(t)$, the original time series can be reconstructed as:\begin{equation*}y(t)=r(t)+\sum \limits _{p=1}^{P}{{d_{p}}(t)} \tag{8}\end{equation*}
EMD has two significant advantages in time series analysis and prediction. One lies in its potent reconstruction property: the IMFs and the residue can reconstruct the original time series without any loss of information. The other lies in its adeptness at capturing the trend of non-stationary data. Therefore, EMD is very helpful for time series analysis and prediction.
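As a brief illustration of the decomposition and the reconstruction property in (8), the sketch below uses the open-source PyEMD package (assumed to be installed as EMD-signal); the synthetic series stands in for an hourly load profile and is not the Singapore data used later.

```python
import numpy as np
from PyEMD import EMD  # assumes the PyEMD package (pip install EMD-signal)

# synthetic stand-in for four weeks of hourly load data
t = np.arange(24 * 28)
load = 5000.0 + 800.0 * np.sin(2 * np.pi * t / 24) + 50.0 * np.random.randn(t.size)

emd = EMD()
emd.emd(load)
imfs, residue = emd.get_imfs_and_residue()   # IMFs d_p(t) and residue r(t)

# reconstruction property of Eq. (8): the components sum back to the signal
reconstruction = imfs.sum(axis=0) + residue
print(f"{imfs.shape[0]} IMFs extracted; max reconstruction error "
      f"{np.abs(reconstruction - load).max():.2e}")
```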
D. Similar Day Selection Algorithm
The similar day selection is an approach to find similar days from the historical data. The selected similar days are usually taken as the input for the prediction task. For load forecasting, selecting appropriate similar days is an effective way to improve the performance of forecasting models. Under different circumstances, the factors related to load variation are different, and only a few of them are dominant. For example, the weekday index is the dominant factor in metropolitan areas where the major consumption of electricity is commercial and residential load. A good similar day selection algorithm should be able to identify the major factors of load change in different situations and thereby ensure an appropriate selection of similar days.
Let the normalized factor vector of a day with $N$ influential factors be denoted as:\begin{equation*}X=[x(1),x(2),\cdots,x(N)]. \tag{9}\end{equation*}
For the day to be predicted and the $j$-th historical day, the factor vectors are denoted as ${X_{0}}$ and ${X_{j}}$, respectively:\begin{align*} {X_{0}}=&[{x_{0}}(1),{x_{0}}(2),\cdots,{x_{0}}(N)] \tag{10}\\ {X_{j}}=&[{x_{j}}(1),{x_{j}}(2),\cdots,{x_{j}}(N)] \tag{11}\end{align*}
The similarity between the forecasted day and the historical day $j$ can be defined as:\begin{align*} {F_{j}}=&\prod \limits _{n=1}^{N}{{\varepsilon _{j}}(n)} \tag{12}\\ {\varepsilon _{j}}(n)=&\frac {\mathop {\min }\limits _{j}\mathop {\min }\limits _{n}\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |+\rho \mathop {\max }\limits _{j}\mathop {\max }\limits _{n}\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |}{\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |+\rho \mathop {\max }\limits _{j}\mathop {\max }\limits _{n}\left |{ {x_{0}}(n)-{x_{j}}(n) }\right |} \tag{13}\end{align*} where $\rho \in (0,1]$ is the resolution coefficient.
With the continuous multiplication in (12), the dominant factors can be identified easily and automatically without the need to assign weights to each factor. The process of the similar day selection algorithm is as follows:
1) Starting from the nearest historical day, the similarity value ${F_{j}}$ of the historical day $j$ is calculated day by day in reverse.
2) The $D$ days with the highest similarity among the nearest $N$ days are selected as the similar days of the day $i$ to be forecasted.
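The selection procedure and the similarity measure in (12)-(13) can be sketched in NumPy as follows; the value of the resolution coefficient and the toy data are assumptions for illustration.

```python
import numpy as np

def similar_days(x0: np.ndarray, X_hist: np.ndarray, n_days: int = 3, rho: float = 0.5):
    """Select the most similar historical days via Eqs. (12)-(13).

    x0     : (N,) normalized factor vector of the day to be forecasted
    X_hist : (J, N) normalized factor vectors of the J nearest historical days
    rho    : resolution coefficient (0.5 is a common choice, assumed here)
    """
    diff = np.abs(x0 - X_hist)                           # |x_0(n) - x_j(n)| for all j, n
    d_min, d_max = diff.min(), diff.max()                # global min and max over j and n
    eps = (d_min + rho * d_max) / (diff + rho * d_max)   # Eq. (13)
    F = eps.prod(axis=1)                                 # Eq. (12): similarity of each day
    return np.argsort(F)[::-1][:n_days]                  # indices of the most similar days

# toy usage: pick 3 similar days out of 30 historical days with 5 factors each
rng = np.random.default_rng(0)
indices = similar_days(rng.random(5), rng.random((30, 5)))
```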
Implementation
This section describes the implementation of the proposed EMD-DNN exploiting transitional forecasting scheme. As shown in Fig. 3, the whole process is divided into three main steps: data preparation, training, and prediction. In step 1, the electricity load data and price data are preprocessed in the order of data cleansing, EMD, and normalization. The normalized electricity load data and price data are then converted into 2-dimensional images. In addition to the electricity load and price, other types of factors, including holidays, the days of the week, and the hours of the day, are also incorporated. All the data are finally divided into a training set and a testing set. In step 2, the proposed scheme is built and trained. In particular, the modules in the feature extraction layer are first trained separately on the training dataset. The results of their predictions are then used to generate a new dataset. Finally, the fully-connected network in the forecasting layer is trained on the new dataset. The mean square error (MSE) is taken as the cost function, the Adam optimizer is employed for training, and the learning rate is set to 0.001. In the final step, the performance of the proposed scheme is evaluated using the testing set.
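For concreteness, a minimal PyTorch training loop matching the stated setup (MSE cost, Adam optimizer, learning rate 0.001) might look like the sketch below; the epoch count and the data loader are placeholders rather than the authors' settings.

```python
import torch
from torch import nn

def train_module(model: nn.Module, loader, epochs: int = 50) -> nn.Module:
    """Generic training loop used for each module: MSE loss, Adam, lr = 0.001."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:              # loader yields (input batch, target load)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```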
A. Data Preparation
First of all, we briefly describe the data used. The data are derived from the electricity market in Singapore and contain the total electricity consumption and electricity price of the country. In our paper, hourly data covering a period of two years (from January 2014 to January 2016) are selected as the datasets. The ranges of the electricity load data and the electricity price data are [3795, 6850] (MW) and [21, 1318] ($/MWh), respectively. Furthermore, other types of factors, such as the holiday index, the days of the week, and the hours of the day, are manually added to the datasets at one-hour resolution. We collect the public holidays in each year and use them as a binary feature: the holiday index of a day is 1 if the day is a holiday and 0 otherwise. The data of 2014 are used as the historical dataset for selecting similar days, while the data of 2015 are used as the forecasting dataset. 80% of the forecasting dataset is separated as the training samples, and the remaining 20% serves as the testing samples. In both the training and testing samples, the hourly load values are obtained by our proposed scheme using a moving window.
Among the collected data, there are some missing and abnormal values, which we replace with highly correlated data. After data cleansing, EMD is applied to the time series. As illustrated in Fig. 4, the electricity load series is decomposed into 11 IMFs and a residue, while the electricity price series is broken down into 15 IMFs and a residue. These components are processed by min-max normalization into the range [0, 1] for better training results. Additionally, the electricity load data and the electricity price data themselves are normalized. The min-max normalization follows\begin{equation*}{{\bar {x}}_{i}}=\frac {{x_{i}}-{x_{\min }}}{{x_{\max }}-{x_{\min }}} \tag{14}\end{equation*} where ${x_{\min }}$ and ${x_{\max }}$ are the minimum and maximum values of the corresponding series.
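A one-line NumPy helper implementing the min-max normalization in (14):

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a series into [0, 1] following Eq. (14)."""
    return (x - x.min()) / (x.max() - x.min())
```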
Example of the decomposition of electricity load and price. (a) the decomposition of electricity load, (b) the decomposition of electricity price.
The crucial part of data preparation is to transform these normalized components into a proper form as the input of our CNN. We treat the processed components of the electricity load/price as pixels of an image and perform the prediction of the next hour using the historical data of the most recent 672 hours (i.e., four weeks). For example, the components of the time series of 672 historical electricity load values are rearranged into the following matrix:\begin{align*}{X_{M}}=\left ({\begin{matrix} im{f_{1}}(1) &\quad im{f_{1}}(2) &\quad \cdots &\quad im{f_{1}}(672) \\ im{f_{2}}(1) &\quad im{f_{2}}(2) &\quad \cdots &\quad im{f_{2}}(672) \\ \vdots &\quad \vdots &\quad \ddots &\quad \vdots \\ im{f_{C}}(1) &\quad im{f_{C}}(2) &\quad \cdots &\quad im{f_{C}}(672) \\ \end{matrix} }\right) \tag{15}\end{align*} where $C$ is the number of EMD components (IMFs plus the residue).
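The rearrangement in (15) amounts to sliding a 672-hour window over the C decomposed component series; a NumPy sketch is given below, where the sample layout is an assumption for illustration.

```python
import numpy as np

def build_input_matrices(components: np.ndarray, window: int = 672) -> np.ndarray:
    """Arrange EMD components into C x 672 matrices as in Eq. (15).

    components : (C, T) array of normalized IMFs plus the residue
    Returns an array of shape (T - window, C, window); sample k covers hours
    k .. k + 671 and is used to predict the load at hour k + 672.
    """
    C, T = components.shape
    return np.stack([components[:, k:k + window] for k in range(T - window)])
```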
B. EMD-DNN Exploiting Transitional Forecasting Scheme
This subsection elaborates on the proposed scheme for STLF in detail. The objective of the proposed scheme is to estimate the energy load for the next one hour with the given historical electricity load, historical electricity price, and day and hour information.
The entire framework of the proposed scheme is presented in Fig. 5. As illustrated in the framework, this scheme is composed of the abovementioned methodologies in Section II and contains two major layers: the feature extraction layer and the forecasting layer. The definition of all variables can be found in Table 1.
Framework of the proposed scheme. In the feature extraction layer, the CNN-LSTM combined model extracts multimodal spatial-temporal features from the historical electricity load and price, respectively. Meanwhile, the fully-connected neural network and similar day selection are employed to extract extra features from the day and hour information. The output predictions from the feature extraction layer are fused as inputs of the multilayer fully-connected neural network to accomplish the forecasting task. This framework ensures the scheme can extract sufficient latent characteristics, which enhances the understanding of the dataset.
In the feature extraction layer, there are four modules that perform load forecasting based on the data of different factors. The results of forecasting are the transitional predictions that represent the extracted features from the original data. For historical electricity load and price, there is an ingenious collaboration between CNN and LSTM in these two modules to take full advantage of the information. This collaboration is performed by the CNN-LSTM combined model depicted in Fig. 6.
There are two steps in the procedure of building the CNN-LSTM combined model. In the first step, we construct the CNN model, based on the idea of VGGNet, to extract spatial features from the abovementioned load/price image matrices in (15). VGGNet repeatedly stacks small $3\times 3$ convolutional kernels and $2\times 2$ max-pooling layers, which deepens the network and strengthens its feature extraction capability while keeping the number of parameters small.
In the second step, a fusion layer concatenates the feature vector with the 672 samples of the normalized load sequence. The concatenation is fed into the LSTM model, which captures the long-term temporal dependencies and estimates the load of the next hour. The size of the hidden state of the LSTM layer is 200. Based on the above two steps, the CNN-LSTM combined model can extract multimodal spatial-temporal features.
In our experiments, the input size of the CNN-LSTM combined model is different for the electricity load and the electricity price: the electricity price matrix has a dimension of $16\times 672$ (15 IMFs plus the residue), while the electricity load matrix has a dimension of $12\times 672$ (11 IMFs plus the residue).
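The PyTorch sketch below captures the structure described above: a VGG-style stack of 3x3 convolutions with 2x2 pooling extracts a spatial feature vector from the component matrix, the vector is concatenated with the 672-sample normalized sequence, and the fusion is fed into an LSTM with a hidden size of 200. The number of convolutional filters and the pooled feature size are assumptions, since the exact layer widths are not reproduced here.

```python
import torch
from torch import nn

class CNNLSTM(nn.Module):
    """Sketch of the CNN-LSTM combined model (layer widths are illustrative)."""

    def __init__(self, hidden: int = 200):
        super().__init__()
        # VGG-style feature extractor: stacked 3x3 convolutions with 2x2 max pooling
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((1, 8)), nn.Flatten(),   # -> 32 * 8 = 256-d feature vector
        )
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                  # next-hour load estimate

    def forward(self, image: torch.Tensor, sequence: torch.Tensor) -> torch.Tensor:
        # image: (B, 1, C, 672) EMD-component matrix; sequence: (B, 672) normalized series
        spatial = self.cnn(image)                         # spatial feature vector
        fused = torch.cat([sequence, spatial], dim=1)     # fusion layer: concatenation
        _, (h_n, _) = self.lstm(fused.unsqueeze(-1))      # LSTM over the fused sequence
        return self.head(h_n[-1])

# toy usage with the 12 x 672 electricity load matrix
model = CNNLSTM()
estimate = model(torch.randn(4, 1, 12, 672), torch.randn(4, 672))   # shape (4, 1)
```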
In addition to the abovementioned electricity load and price modules, a fully-connected neural network module with 10 hidden neurons learns the dynamic relationship between the load and other factors. Such factors include the days of the week, the hours of the day, and the holiday index. Furthermore, a similar day selection module employs the similarity in (12) to select the three most similar days from the past 30 days based on the historical electricity load, the historical electricity price, and the day information. The load affecting factor vector is defined as:\begin{equation*} {V_{s}}=[{L_{H}},{P_{H}},{D_{I}}] \tag{16}\end{equation*} where ${L_{H}}$, ${P_{H}}$, and ${D_{I}}$ denote the historical electricity load, the historical electricity price, and the day information, respectively.
In the forecasting layer, there is only a fully-connected network that consists of two hidden layers with 20 hidden neurons. The outputs of the feature extraction layer are fused and fed to the fully-connected neural network to learn and forecast the ultimate results. With the transitional predictions, the proposed scheme enhances its extraction capability and is capable of capturing more latent features.
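A minimal PyTorch sketch of the forecasting layer is given below: two hidden FC layers of 20 neurons followed by a linear output; the activation function and the exact length of the fused transitional-prediction vector are assumptions.

```python
import torch
from torch import nn

class ForecastingLayer(nn.Module):
    """Two hidden FC layers of 20 neurons mapping the fused transitional
    predictions to the final next-hour load forecast (activations assumed)."""

    def __init__(self, n_transitional_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_transitional_features, 20), nn.ReLU(),
            nn.Linear(20, 20), nn.ReLU(),
            nn.Linear(20, 1),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.net(fused)

# e.g., fusing the outputs of the four feature-extraction modules
layer = ForecastingLayer(n_transitional_features=4)
forecast = layer(torch.randn(8, 4))     # shape (8, 1)
```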
Numerical Experiments
A. Evaluation Metrics
In this study, the forecasting performance of the proposed scheme is evaluated using three metrics: the root mean square error (RMSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). These performance measures are widely applied in STLF tasks, and their definitions are given by:\begin{align*} \text {RMSE}=&\sqrt {\frac {1}{T}\sum \limits _{i=1}^{T}{({L_{i}}-{{\hat {L}}_{i}})^{2}}} \tag{17}\\ \text {MAE}=&\frac {1}{T}\sum \limits _{i=1}^{T}{\left |{ {L_{i}}-{{\hat {L}}_{i}} }\right |} \tag{18}\\ \text {MAPE}=&\frac {1}{T}\sum \limits _{i=1}^{T}{\frac {\left |{ {L_{i}}-{{\hat {L}}_{i}} }\right |}{{L_{i}}}}\tag{19}\end{align*} where ${L_{i}}$ and ${\hat {L}_{i}}$ denote the actual and forecasted loads at the $i$-th time step, and $T$ is the number of test samples.
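The three metrics in (17)-(19) can be computed directly, as in the short NumPy sketch below (the toy values are illustrative only).

```python
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))    # Eq. (17)

def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - predicted)))            # Eq. (18)

def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - predicted) / actual))   # Eq. (19)

# toy usage with three hourly load values (MW)
y_true = np.array([5000.0, 5200.0, 4900.0])
y_pred = np.array([5050.0, 5150.0, 4950.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```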
B. Models for Comparison
In order to demonstrate the performance of the proposed scheme, four existing STLF models are introduced for comparison.
1) Model 1: TD-CNN
TD-CNN is a multilayer neural network composed of convolutional layers and FC layers. According to [26], the convolutional layer of TD-CNN has a special kernel size and can capture the local pattern with similar characteristics. Different from the classic CNN model, TD-CNN removes the pooling layer so as to keep the finer features.
2) Model 2: CNN-LSTM Model
The standard CNN and LSTM are described above. In [35], the CNN-LSTM model sequentially extracts the spatial-temporal features from the input data, and a fully-connected neural network is then used to produce the load forecast.
3) Model 3: ResNet/LSTM Combined Model
The description of this model can be found in [33]. The ResNet extracts features from the 2-D load images using its 12 layers. Then, the LSTM model performs the final forecasting based on the extracted feature vectors.
4) Model 4: SIWNN
The SIWNN discussed in [7] takes the load of the similar day as the input load and employs wavelet decomposition with separate neural networks to capture the low- and high-frequency features of the load.
C. Results
To verify the performance of the proposed scheme, experiments with this scheme and the four comparative models are conducted on the same dataset described in Section III. Figs. 7(a) and 7(b) depict two samples of the predictions made by the proposed scheme on the training and testing sets, respectively. In these figures, the actual load is compared with the estimate at each timestamp. Clearly, our proposed scheme follows the general trend of load variation very well in both training and testing.
Estimation of our proposed scheme. (a) 72 samples of the estimated results on the training set, (b) 72 samples of the estimated results on the testing set.
Table 2 compares the estimation performance of the different schemes on the entire testing set in terms of the RMSE, MAE, and MAPE. Our scheme performs better than the other four models on all three metrics. For example, compared with the ResNet/LSTM combined model, which is also based on DNNs, the proposed scheme reduces the RMSE, MAE, and MAPE by 29.75%, 13.02%, and 13.99%, respectively. In addition, compared with the SIWNN, which also employs the similar day selection method, our scheme reduces the RMSE, MAE, and MAPE by 26.66%, 16.97%, and 17.45%, respectively.
Figs. 8 and 9 show the load forecasting curves of the different models. It is observed that our proposed scheme is able to track the load variation in the next hour more effectively and accurately. It is worth noting that electricity consumption varies among weekends, holidays, and weekdays. Since electricity consumption is highest during weekdays, the load forecasting on weekdays is the most important. The performance of the proposed scheme and the other four comparative models on weekday load forecasting is listed in Table 3. From this table, it is clear that the proposed scheme has obvious advantages over the existing four models in predicting the load variation during weekdays. Therefore, our proposed scheme is more effective for STLF. Furthermore, our scheme outperforms the other models because it takes full advantage of the extraction capability of each constituent method. In brief, our proposed scheme is superior to the other models and is promising for short-term load forecasting tasks.
Load forecasting for the next hour. (a) forecasting results for November, (b) forecasting results for December. Model 3 deviates drastically from the actual load. Model 1 and Model 2 are closer, but our proposed scheme achieves the best accuracy over the dataset.
Conclusion
In this article, we have presented an EMD-DNN exploiting transitional forecasting scheme to address the short-term load forecasting problem. This scheme combines deep neural networks, empirical mode decomposition, and similar day selection for an accurate estimation of the energy load in the next hour. Based on these methods, the proposed scheme can extract different load features in terms of the various influential factors of load. In particular, a novel EMD-based CNN-LSTM approach is proposed to extract multimodal spatial-temporal features from electricity load/price data. A fully-connected network and a similar day selection module are applied to capture the day and hour information. As a result, the proposed scheme enhances the extraction capability and is able to capture more latent information, thereby providing precise results. With actual data from the Singapore electricity market, our scheme is compared with the TD-CNN, the CNN-LSTM model, the ResNet/LSTM combined model, and the SIWNN. The results demonstrate that the proposed EMD-DNN exploiting transitional forecasting scheme achieves higher forecasting accuracy than the other four models.