
Enhanced ES-adRNNe Load Forecasting With Contextual Augmentation on Similar Load Days

Representative sequences are selected by GCES-adRNNe using FGC to incorporate contextual information. Context trajectories are refined through the utilization of XGBoost....

Abstract:

Accurate prediction of power load variations is essential for ensuring the reliability and rationality of power supply. GCES-adRNNe (Grey Relational Analysis Contextually Enhanced ES-adRNNe) is proposed as a load forecasting model that incorporates context augmentation on similar load days. First, similar load days are selected as representative sequences using the comprehensive grey correlation analysis method. Second, context information is extracted from the representative sequences, and the main track prediction is dynamically adjusted to the individual sequences using XGBoost. Third, a stacked multi-layer recurrent neural network (RNN) architecture is used. It incorporates attention-enhanced gated recurrent units (GRU) to capture short-term, long-term, and seasonal dependencies in the time series; the GRU also dynamically weights input information. Finally, the prediction results are post-processed. Experimental results indicate that the proposed model improves RMSE, MAPE, StdPE, and other metrics compared with other load forecasting models. The results suggest that, by considering similar load days, the model captures the trends and characteristics of electricity load changes more accurately, thereby supporting the reliability and rationality of power supply.

Published in: IEEE Access (Volume: 11)

Page(s): 93727 - 93738

Date of Publication: 30 August 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3310395

SECTION I.

Introduction

Load forecasting plays a crucial role in the energy sector and is of significant importance for the planning, operation, and dispatch of power systems [1]. Its aim is to predict the electricity demand for a future period so that power companies and suppliers can arrange power production and allocation reasonably. Accurate load forecasting helps optimize generation plans and the allocation of power resources, avoid energy waste, and improve grid reliability [2]. It also influences electricity market trading and the formulation of electricity prices, helping market participants make sound decisions and improving market efficiency.

However, electricity load forecasting faces numerous challenges. First, electricity load is influenced by various factors, including seasonal variations, weather conditions, and economic conditions, which make the forecasting task harder. Second, electricity load is uncertain: unforeseen events and anomalies can change demand without prior notice. Such factors are often unavailable to the forecaster, so prediction models must remain robust and adaptable when external variables are inaccessible. Finally, electricity load data are often nonlinear and non-stationary, which conventional statistical models frequently struggle to capture.

In response to the aforementioned challenges, various approaches have been proposed by researchers. Among these, significant progress has been made in machine learning-based methods. Support Vector Machines, Random Forests, and Recurrent Neural Networks (RNNs), among others, have been widely applied in electricity load forecasting. These methods aim to predict future load demands by learning patterns and trends from historical data. However, these approaches still face limitations when dealing with complex relationships among multiple factors and variables.

To overcome these challenges, a novel load forecasting model is introduced in this study, namely GCES-adRNNe (Grey Relational Analysis Contextually Enhanced ES-adRNNe). Differing from traditional methods, this model employs the concepts of grey relational analysis, context trajectories, and main trajectories. The aim is to better capture correlations among multiple factors and variables. By dynamically adjusting model parameters and integrating the features of the current sequence with contextual information, the GCES-adRNNe model seeks to achieve a dual enhancement of stability and accuracy in load forecasting.

SECTION II.

Related Works

A statistical load forecasting approach that utilized optimal quantile regression random forests and risk assessment indicators was proposed by Aprillia et al. [4]. Furthermore, electricity load prediction and analysis were performed using periodic steady-state Kalman filters by Assimakis et al. [5]. These statistical methods can predict loads based on the statistical characteristics of historical data [6], but there might be certain limitations when dealing with non-linear and complex relationships.

Machine learning algorithms have brought significant progress in electricity load forecasting. For instance, Li et al. [7] applied Support Vector Regression (SVR), a commonly used machine learning algorithm, to short-term load prediction in 2020. They proposed an SVR model incorporating sample entropy, two-stage decomposition, and optimization with the whale optimization algorithm. Combining these techniques enhances prediction accuracy; however, the approach has limitations when dealing with intricate time series patterns.

Deep learning models have also achieved significant success in electricity load forecasting. Kong et al. [8] used LSTM networks for short-term load prediction, learning temporal patterns and dependencies in load data. Another commonly used model is the Convolutional Neural Network (CNN). Eskandari et al. [9] used a model based on convolutional and recurrent neural networks for short-term load forecasting, in which convolution operations extract local features of the load data and recurrent neural networks capture the temporal dependencies. Dudek et al. [10] compared the performance of multiple neural network models for pattern-based short-term load forecasting. Hong et al. [11] proposed a deep learning approach for short-term residential load forecasting in smart grids. Additionally, Wang et al. [15] employed deep residual networks for short-term load forecasting.

In addition to individual models, combining multiple models into ensembles has been explored. Massaoudi et al. [13] proposed a hybrid model for short-term load forecasting based on a stacked generalization ensemble in 2021, combining LGBM, XGB, and MLP models into a more powerful ensemble. Kandilogiannakis et al. [14] suggested an ensemble learning method based on recurrent neural fuzzy systems. Moreover, Gao et al. [17] used a random vector functional link neural network for ensemble deep learning. By combining the predictions of multiple individual models, these ensemble methods enhance the accuracy and robustness of the forecasts. However, they require a more complex process of model selection and tuning.

Some researchers have also combined traditional time series methods with deep learning models to exploit the advantages of both. For instance, Smyl et al. [12] proposed a hybrid model based on exponential smoothing and dilated recurrent neural networks for short-term load forecasting. Similarly, Dudek et al. [16] suggested a hybrid residual dilated LSTM and exponential smoothing model for medium-term load forecasting. These hybrid models combine the strengths of different methods, improving the accuracy and stability of load forecasting. However, they require more model integration and parameter tuning, which increases complexity.

Load forecasting in the power industry is a complex problem that involves multiple factors and variables. Traditional statistical methods, machine learning algorithms, and deep learning models have achieved some success in addressing it. However, there is room for further research on handling multiple factors and variables.

This article proposes a new load forecasting model called GCES-adRNNe (Grey Relational Analysis Contextually Enhanced ES-adRNNe) to address load forecasting with multiple factors and variables. Representative sequences are selected from similar load days using FGC (Comprehensive Grey Relational Analysis), providing contextual information for the forecasting task. Context trajectories are extracted from the representative sequences and dynamically adjusted using the XGBoost method. Multiple stacked recurrent layers are employed, with attention-augmented recurrent units that capture short-term, long-term, and seasonal dependencies in the time series.

SECTION III.

GCES-adRNNe

A new load forecasting model, GCES-adRNNe, is proposed to address the issue of multiple factors and variables involved in load forecasting. The model is composed of the context trajectory and the main trajectory.

The context trajectory plays a crucial role in sequence prediction tasks because it acts as a summary or abstraction of historical data. Because the number of sequences is large, connecting them all into a single input is challenging. To tackle this problem, a comprehensive grey correlation analysis method is employed to select load days that are similar to the target sequence. The selected sequences are then processed in parallel using the context model. At the end of each step, the batch outputs are flattened into a vector.

The main track is considered as the core component of the load forecasting model. In the pre-processing stage, the data undergoes adaptive normalization and deseasonalization. To enhance the prediction performance, an enhanced input is created by incorporating the outputs of the context track before being fed into the recurrent neural network (RNN). The connection and coordination between the context track and the main track are achieved using the XGBoost method. The context track and the main track are synchronized in time.

By fitting the context track, the main track, and all the parameters of each sequence, the GCES-adRNNe model effectively reduces the prediction error. The final prediction results are output through post-processing.

In summary, the context track provides a summary of historical data, while the predictions are performed using a recurrent neural network in the main track. The connection and coordination between the two tracks are achieved using XGBoost. The framework diagram of the GCES-adRNNe model is shown in Figure 1.

FIGURE 1.

The framework of the GCES-adRNNe model.

A. Context Track

1) FGC Similar Load Days Selecting

In sequence prediction tasks, context is a summary or abstraction of past data. To address the challenge of connecting a large number of sequences into a single input, a load day selection method based on load data similarity is employed. The FGC method measures the degree of correlation between numerical factors by comparing the similarity of the values in the data sequences. Cosine similarity, in contrast, measures the similarity of the trends of different data sequences. Load data can be influenced by multiple factors and exhibit certain trends. To consider these factors comprehensively, the Comprehensive Grey Correlation Theory is utilized: the selection of load days similar to the target sequence does not depend solely on the Fuzzy Grey Correlation method; multiple factors are considered. The overall similarity index $\delta _{i}$ combines the weighted correlation degree $y_{0i}$ and the cosine similarity $D_{\cos i}$ , which capture the similarity of the load sequence values and of their trends, respectively. A value of $\delta _{i}$ closer to 1 indicates stronger similarity. The calculation formula is as follows:

$\begin{equation*} \delta _{i} =\gamma y_{0i} +(1-\gamma)D_{\cos i} \tag{1}\end{equation*}$

where $\gamma$ is the empirical weight coefficient, set to 0.5.

Because working days affect the load, the reference load day is the same weekday in the week preceding the target prediction day. For example, when predicting the load on Sunday of this week, the previous Sunday is selected as the reference load day. Similar load days are then selected using comprehensive grey relational analysis, and 50% of the historical data is chosen as the set of similar load days. The selection of similar load days is shown in Figure 2.

FIGURE 2.

Selection of similar load days.
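
The similarity index of Eq. (1) can be illustrated with a minimal sketch. The source does not give the formula for the correlation degree $y_{0i}$, so the code below assumes the standard (Deng) grey relational degree with a distinguishing coefficient of 0.5 and an unweighted mean of the coefficients; the function names and the `keep_ratio` parameter are hypothetical.

```python
import math

def grey_relational_degree(reference, candidates, rho=0.5):
    """Grey relational degree of each candidate with respect to the reference.

    Assumption: standard Deng formulation, distinguishing coefficient rho = 0.5,
    plain (unweighted) mean of the per-point coefficients.
    """
    diffs = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0:  # every candidate equals the reference
        return [1.0] * len(candidates)
    degrees = []
    for row in diffs:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees

def cosine_similarity(a, b):
    """Cosine of the angle between two load sequences (trend similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def select_similar_days(reference, candidates, gamma=0.5, keep_ratio=0.5):
    """Rank candidates by delta_i = gamma*y_0i + (1-gamma)*D_cos_i, Eq. (1),
    and keep the top keep_ratio share (50% in the text)."""
    y = grey_relational_degree(reference, candidates)
    scores = [gamma * y_i + (1 - gamma) * cosine_similarity(reference, cand)
              for y_i, cand in zip(y, candidates)]
    n_keep = max(1, int(len(candidates) * keep_ratio))
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep], scores
```

In practice `reference` would be the load curve of the reference day and `candidates` the historical daily curves; the returned indices identify the similar load days.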

2) ES Time Series Decomposing

The contextual time series and the forecasted time series are decomposed into seasonal and level components, respectively, by the ES model. The ES model represents a simplified Holt-Winters model with multiplicative seasonality and consists of two equations representing the level and weekly seasonality.

$\begin{align*} l_{t,\tau } &=\alpha _{t} \frac {Z_{\tau }}{S_{t,\tau }}+(1-\alpha _{t})l_{t,\tau -1} \tag{2}\\ S_{t,\tau +168}& =\beta _{t} \frac {Z_{\tau }}{l_{t,\tau }}+(1-\beta _{t})S_{t,\tau } \tag{3}\end{align*}$

where $Z_{\tau }$ is the load at time $\tau$ , $l_{t,\tau }$ is the level component, $S_{t,\tau }$ is the weekly seasonal component, and $\alpha _{t}$ and $\beta _{t}$ are the smoothing coefficients, which belong to the range [0, 1] and are adjusted at each recursive step ${t}$ .

The smoothing coefficient of the above model is not constant but is changed at each step t, being dynamically adapted to the current time series properties. The smoothing coefficient is adjusted by the learned corrections $\Delta \alpha _{t}$ and $\Delta \beta _{t}$ from the RNN, as shown below:

$\begin{align*} \alpha _{t+1}& =\sigma \left ({{I\alpha +\Delta \alpha _{t}} }\right) \tag{4}\\ \beta _{t+1} &=\sigma \left ({{I\beta +\Delta \beta _{t}} }\right) \tag{5}\end{align*}$

where $I\alpha$ and $I\beta$ are the initial values of the smoothing coefficients, and $\sigma$ is the sigmoid function, which keeps the coefficients within the range of 0 to 1.
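
As a rough illustration of Eqs. (2) to (5), the recursion can be sketched as follows. This is a simplified sketch, not the authors' implementation: the smoothing coefficients are held fixed within one call, and the initial level, the initial 168 seasonal components, and the function names are assumptions.

```python
import math

SEASON = 168  # hours in one week

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def es_decompose(z, init_level, init_seasonal, alpha, beta):
    """Apply Eqs. (2)-(3) to an hourly load series z.

    init_seasonal must hold SEASON starting seasonal components.
    Returns the level at every step and the seasonal components,
    extended by one new value (for tau + 168) per step.
    """
    if len(init_seasonal) != SEASON:
        raise ValueError("expected 168 initial seasonal components")
    seasonal = list(init_seasonal)
    levels = []
    level = init_level
    for tau, z_tau in enumerate(z):
        # Eq. (2): level update from the deseasonalized observation
        level = alpha * z_tau / seasonal[tau] + (1 - alpha) * level
        # Eq. (3): seasonal component one week ahead
        seasonal.append(beta * z_tau / level + (1 - beta) * seasonal[tau])
        levels.append(level)
    return levels, seasonal

def update_coefficient(initial, delta):
    """Eqs. (4)-(5): the sigmoid keeps the corrected coefficient in (0, 1)."""
    return sigmoid(initial + delta)
```

In the model described here, `delta` would be the correction $\Delta \alpha _{t}$ or $\Delta \beta _{t}$ produced by the RNN at each step.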

3) Pre-Process Vector De-Seasoning

The pre-processing model has two objectives. First, the time series is deseasonalized using the seasonal components. Second, contextual trajectories are constructed to build the RNN training set.

The input for the contextual trajectory is the weekly sequence preceding the day to be predicted. The input window $\Delta _{t}^{in}$ covers the 168 hours of this weekly sequence. Similarly, the output window $\Delta _{t}^{out}$ covers the 24 hours of the following daily sequence. The input sequences are deseasonalized, normalized, and compressed as follows:

$\begin{equation*} x_{\tau }^{in} =\log \frac {Z_{\tau }}{\bar {Z}_{t} \hat {S}_{t,\tau }} \tag{6}\end{equation*}$

where $\tau ~\in ~\Delta _{t}^{in}$ , $\bar {Z}_{t}$ represents the average value of the input sequence, and $\hat {S}_{t,\tau }$ represents the seasonal component predicted by ES for $\tau$ in step $t$ .

The pre-processed input sequence is represented by the vector $X_{t}^{in} =[x_{\tau }^{in}]_{\tau \in \Delta _{t}^{in}} \in \mathbb {R}^{168}$ . The log function compresses the values and limits the influence of outliers. Dividing by $\hat {S}_{t,\tau }$ removes the weekly seasonality, while dividing by $\bar {Z}_{t}$ eliminates the long-term trend in the input window. As a result, all pre-processed series are brought to the same scale. The input pattern $X_{t}^{in}$ is dynamic: it is adjusted in each training cycle as the seasonal component $\hat {S}_{t,\tau }$ changes.
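
Eq. (6) amounts to a few lines of code. The sketch below assumes the window values and the ES seasonal forecasts are already aligned element by element; the function name is hypothetical.

```python
import math

def preprocess_input(z_window, seasonal_hat):
    """Eq. (6): divide each load value by the window mean and the seasonal
    component predicted by ES, then take the logarithm."""
    mean_z = sum(z_window) / len(z_window)
    return [math.log(z / (mean_z * s)) for z, s in zip(z_window, seasonal_hat)]
```

When the seasonal components explain the series exactly, every element of the result is zero, which is the sense in which the pre-processed series share a common scale.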

To enrich the input information, the input of the main track is extended with the level and seasonal data of the weekly sequence, as well as date data and modulated context vectors.

$\begin{equation*} X_{t}^{i{n}'} =[X_{t}^{in},\hat {S}_{t},\log _{10} (\bar {Z}_{t}),d_{t}^{w},d_{t}^{m},d_{t}^{y},{r}'_{t}] \tag{7}\end{equation*}$

where $\hat {S}_{t} \in \mathbb {R}^{24}$ contains the 24 seasonal components predicted by ES for $\tau \in \Delta _{t}^{out}$ with a lag of 1; $d_{t}^{w} \in \{0,1\}^{7}$ , $d_{t}^{m} \in \{0,1\}^{31}$ , and $d_{t}^{y} \in \{0,1\}^{52}$ are binary one-hot vectors that encode the day of the week, the day of the month, and the week of the year, respectively, for the predicted date; and ${r}'_{t}$ is a contextual vector modulated by the context trajectory from the previous steps.

The date variables $d_{t}^{w}$ , $d_{t}^{m}$ , and $d_{t}^{y}$ contain information about the position of the predicted sequence within the weekly and yearly cycles, which is useful for handling fixed-date public holidays. The component $\log _{10} (\bar {Z}_{t})$ represents the local level of the sequence, while $\hat {S}_{t}$ represents the daily variations in the predicted sequence. The modulated contextual vector ${r}'_{t}$ introduces additional information from the context sequence to adjust the step at time $t$ to the predicted time series. The pre-processing of input data and the generation of input patterns are shown in Figure 3.

FIGURE 3.

Pre-processing and generating inputs.
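
The assembly of the extended input in Eq. (7) can be sketched as a simple concatenation. The reading of the 31- and 52-element vectors as day of the month and week of the year is an assumption based on their lengths, and the index conventions and function names below are hypothetical.

```python
import math

def one_hot(index, size):
    """Binary vector of the given size with a single 1 at position index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def build_extended_input(x_in, s_hat, mean_z, day_of_week, day_of_month,
                         week_of_year, r_mod):
    """Concatenate the parts of Eq. (7).

    day_of_week is 0-based (0..6); day_of_month (1..31) and
    week_of_year (1..52) are 1-based. r_mod is the modulated context vector.
    """
    return (list(x_in)                      # 168 pre-processed inputs
            + list(s_hat)                   # 24 seasonal components
            + [math.log10(mean_z)]          # local level of the window
            + one_hot(day_of_week, 7)
            + one_hot(day_of_month - 1, 31)
            + one_hot(week_of_year - 1, 52)
            + list(r_mod))
```

With these sizes the vector has 283 elements plus the length of the context vector.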

The output pattern represents the target daily sequence covered by $\Delta _{t}^{out}$ . To obtain the output pattern, $x_{t}^{out} =[x_{\tau }^{out}]_{\tau \in \Delta _{t}^{out}} \in \mathbb {R}^{24}$ , the original sequence is normalized as follows.

$\begin{equation*} x_{\tau }^{out} =\frac {Z_{\tau }}{\bar {Z}_{t}} \tag{8}\end{equation*}$

where $\tau ~\in \Delta _{t}^{out}$ . The output pattern is only normalized, which ensures that the errors of time series with different scales are comparable when measured by the loss function. The output pattern is defined only for time series processed in the main trajectory, while the context trajectory, which generates the context vector $r$ , has no target values. The forecasts are compared with the output patterns of the main trajectory, and the resulting error is used by the optimization algorithm to fit all model parameters, including those of both the main trajectory and the context trajectory.

The training patterns on which the model is trained are generated by shifting the adjacent windows $\Delta _{t}^{in}$ and $\Delta _{t}^{out}$ by 24 hours at a time. The main trajectory RNN predicts the vector corresponding to the 24-hour forecast for the next day, denoted as $\hat {X}_{t}^{RNN} =[\hat {x}_{\tau }^{RNN}]_{\tau \in \Delta _{t}^{out}} \in \mathbb {R}^{24}$ .
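
The sliding-window construction can be sketched as below. The sketch pairs each raw 168-hour input window with its 24-hour target normalized by the input-window mean as in Eq. (8); the input would additionally be transformed by Eq. (6). The function name and default arguments are assumptions.

```python
def make_training_pairs(z, in_len=168, out_len=24, stride=24):
    """Slide adjacent input/output windows over the series z in 24-hour steps."""
    pairs = []
    start = 0
    while start + in_len + out_len <= len(z):
        window_in = z[start:start + in_len]
        window_out = z[start + in_len:start + in_len + out_len]
        mean_in = sum(window_in) / in_len
        target = [v / mean_in for v in window_out]  # Eq. (8)
        pairs.append((window_in, target))
        start += stride
    return pairs
```

A series of 240 hourly values, for example, yields three pairs, because the last window must end within the series.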

4) AdRNNe Load Forecasting

To handle multi-seasonality in time series data, the RNN employs an LSTM-like cell, the adRNNCell, which is equipped with an internal attention mechanism for weighting input information. The adRNNCell consists of two identical dRNNCells, as depicted in Figure 4.

FIGURE 4.

AdRNN cell structure.

The adRNNCell is characterized by having two cell states (c-states) and two control states (h-states). The states $c_{t-1}$ and $h_{t-1}$ introduce information from the previous step (t-1) into the cell, while the extended states $c_{t-d}$ and $h_{t-d}$ introduce information from a delayed step (t-d), where d > 1. The delayed states extend the cell’s receptive field, aiding in the modeling of long-term and seasonal dependencies.

The adRNNCell is equipped with two weighting mechanisms derived from GRU, which are controlled by the f-gate and the u-gate. The f-gate controls the weighting of the recent and delayed c-states ( $c_{t-1}$ and $c_{t-d}$ ), while the u-gate controls the weighting of the old and new c-states, specifically the combination of $c_{t-1}$ and $c_{t-d}$ with $\tilde {c}_{t}$ . By effectively utilizing both recent and delayed states, the cell’s memory capacity is increased.

The output of the dRNNCell is divided into the true output $y_{t} (m_{t})$ and the control output $h_{t}$ . The true output is passed to the next layer (cell), while $h_{t}$ serves as the input to the gating mechanism in the following time steps. In LSTM and GRU, both roles are fulfilled by the h-state. In the dRNNCell, they are separated into two different vectors, providing more flexibility.

The adRNNCell features an internal attention mechanism. The bottom unit generates the attention vector $m_{t}$ , whose components, after being processed by the exp function, weight the input $x_{t}$ of the upper unit.

Based on the weighted input vector $x_{t}^{2}$ , the upper unit generates the vector $y_{t}$ , which is then fed to the next layer. $m_{t}$ exhibits dynamic characteristics, as the weights are adjusted to the current input by the bottom unit at step t.
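
The two weighting mechanisms and the attention step described above can be sketched element-wise. The exact cell equations are not reproduced in this text, so the convex-combination form of the state update below is an assumption that follows the verbal description of the f-gate and u-gate; the function names are hypothetical.

```python
import math

def fuse_states(c_recent, c_delayed, c_candidate, f_gate, u_gate):
    """Assumed c-state update.

    The f-gate mixes the recent and delayed states; the u-gate then mixes
    that result (the old state) with the candidate (new) state.
    """
    fused = []
    for c1, cd, cc, f, u in zip(c_recent, c_delayed, c_candidate, f_gate, u_gate):
        old = f * c1 + (1.0 - f) * cd
        fused.append(u * old + (1.0 - u) * cc)
    return fused

def attend(x, m):
    """Weight the upper unit's input x by exp(m), giving the vector x_t^2."""
    return [xi * math.exp(mi) for xi, mi in zip(x, m)]
```

An attention component of zero leaves the corresponding input unchanged, while positive or negative components amplify or damp it.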

The equations for the adRNNCell at time step $t$ are shown in Figure 4, with the following notation. $d > 1$ is the dilation factor. $x_{t}$ is the input vector. $h_{t-1}$ and $h_{t-d}$ are the recent and delayed control states, respectively. $c_{t-1}$ and $c_{t-d}$ are the recent and delayed cell states, respectively. $y_{t}$ and $m_{t}$ are the outputs of the upper and lower dRNNCells, respectively. $f_{t}$ , $u_{t}$ , and $o_{t}$ are the outputs of the fusion gate, update gate, and output gate, respectively. $\tilde {c}_{t}$ is the candidate state. W, V, and U are weight matrices. $B$ is the bias vector. $\sigma$ represents the logistic sigmoid function. Products between vectors are element-wise (Hadamard) products.

The RNN architecture is composed of three layers of adRNN cells, with dilation factors of 2, 4, and 7, respectively. The number of layers and the dilation factors were chosen through experimentation and may not be optimal for other prediction tasks. Stacking layers with increasing dilation factors helps extract more abstract features in consecutive layers and achieves a larger receptive field, which contributes to learning long-term and seasonal dependencies at different time scales. When three or more layers are stacked, vanishing gradients may occur, so ResNet-style shortcut connections are used between layers. The input vector $x_{t}^{i{n}'}$ is fed not only into the first layer but also into the second and third layers, where it extends the output vector of the previous layer. Details can be seen in Figure 5.

FIGURE 5.

RNN structure.

To reduce the input dimensionality, the binary date variables ( $d_{t}^{w}$ , $d_{t}^{m}$ , and $d_{t}^{y}$ ) are embedded into a continuous vector $d_{t}$ of dimension d by a linear layer. The embedding is learned during training together with the rest of the model.

The adRNN output for the main track is generated by another linear layer and consists of five elements: a vector corresponding to the 24-point prediction ( $\hat {X}_{t}^{RNN} =[\hat {x}_{\tau }^{RNN}]_{\tau \in \Delta _{t}^{out}} \in \mathbb {R}^{24}$ ), a vector corresponding to the lower bound of the prediction interval (PI) ( ${\underline {\hat {X}}}_{t}^{RNN} =[{\underline {\hat {x}}}_{\tau }^{RNN}]_{\tau \in \Delta _{t}^{out}} \in \mathbb {R}^{24}$ ), a vector corresponding to the upper bound of the PI ( $\hat {{\bar {X}}}_{t}^{RNN} =[\hat {{\bar {x}}}_{\tau }^{RNN}]_{\tau \in \Delta _{t}^{out}} \in \mathbb {R}^{24}$ ), and the correction factors for the smoothing coefficients ( $\Delta \alpha _{t}$ and $\Delta \beta _{t}$ ).

$\begin{equation*} \hat {x}_{t}^{RN{N}'} =[\hat {x}_{t}^{RNN},{\underline {\hat {x}}}_{t}^{RNN},\hat {{\bar {x}}}_{t}^{RNN},\Delta \alpha _{t},\Delta \beta _{t}] \tag{9}\end{equation*}$

where $\Delta \alpha _{t}$ and $\Delta \beta _{t}$ are the corrections for the smoothing coefficients. For the context track, the output linear layer generates the contextual embeddings:

$\begin{equation*} \hat {X}_{t}^{RN{N}'} =[r_{t}^{(i)},\Delta \alpha _{t},\Delta \beta _{t}] \tag{10}\end{equation*}$

where $\Delta \alpha _{t}$ and $\Delta \beta _{t}$ are generated by two separate RNNs. For $K$ sequences in a context batch, the context RNN generates $K$ vectors $r_{t}^{(i)}$ , which are then concatenated to form $r_{t} =[r_{t}^{(1)},\ldots,r_{t}^{(K)}]\in \mathbb {R}^{uK}$ .
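
The concatenation of the per-sequence context vectors is a plain flattening step. In the minimal sketch below, each of the $K$ vectors is assumed to have the same length $u$, as the dimension $uK$ suggests; the function name is hypothetical.

```python
def concat_context(vectors):
    """Flatten K context vectors of equal length u into one vector of length u*K."""
    u = len(vectors[0])
    if any(len(v) != u for v in vectors):
        raise ValueError("all context vectors must have the same length")
    return [value for v in vectors for value in v]
```

The order of the sequences in the batch fixes the layout of the result, so it has to be the same at every step.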

5) Context Track Summarizing

The role of the context trajectory in the GCES-adRNNe model is of significant importance as it is primarily used to provide contextual information for load forecasting tasks. A summary of the role of the context trajectory can be described as follows:

Contextual information is summarized by the context trajectory, which serves as an abstraction of historical data, enabling the provision of the contextual environment for load forecasting tasks. Historical patterns and trends in load variations can be captured by selecting representative sequences that are similar to the target sequence through the utilization of the context trajectory.

The main track prediction is dynamically adjusted by the contextual tracks to adapt to individual sequences. Individualized prediction adjustments are made to each sequence based on the characteristics of the contextual information, thereby improving the accuracy and reliability of the predictions.

Time series dependencies are captured by the contextual tracks, which synchronize with the main track. Multiple stacked recurrent layers and attention-expanded recurrent units are used to capture short-term, long-term, and seasonal dependencies in the time series. This allows the model to better understand changes and influencing factors at different scales in the time series, thereby improving prediction accuracy and generalization ability.

B. XGBoost Context Vectors Modulating

Before being fed into the recurrent neural network (RNN), a connection operation is performed, combining the input data with the output of the contextual trajectory. This connection operation remains the same for each sequence in every prediction batch. If the number of sequences is relatively small, it could lead to each sequence being presented to the prediction network multiple times during training.

To better leverage the information within the sequences, the XGBoost method is chosen to facilitate the connection between the input data and the output of the contextual trajectory. XGBoost, known as a gradient boosting tree model, is capable of effectively handling non-linear relationships and intricate feature interactions.

The process of connecting the contextual trajectory with the main trajectory, along with the role of XGBoost within this process, is outlined below:

Connection Operation: The input data is concatenated with the output of the contextual trajectory to form a new feature vector, denoted as $F$ .

$\begin{equation*} F=Concat(input,output_{context}) \tag{11}\end{equation*}$

where $input$ denotes the input of the main track, while $output_{context}$ represents the output of the context track.

Learning and feature generation with XGBoost: The XGBoost model is trained to learn the non-linear mapping between sequences and context, generating the feature representation $F_{ES}$ that is subsequently input into ES.

$\begin{equation*} F_{ES} =XGBoost(F) \tag{12}\end{equation*}$

where $F$ represents the feature vector obtained from the preceding connection operation.

Introducing XGBoost into the connection operation enhances the model's ability to capture sequence features and associations. With its strong nonlinear modeling capacity, XGBoost learns the complex mapping between sequences and context and generates a distinct feature representation for each sequence. The model thereby learns the dynamics and temporal characteristics of the sequences better, which improves prediction accuracy.

The core idea of XGBoost is to sum the outputs of weak learners to form the prediction value. Each subsequent weak learner fits the residual of the current prediction with respect to the error function, until the error requirement is met, as shown in Figure 6.

FIGURE 6.

Modulating context vectors with XGBoost.
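
The residual-fitting idea can be illustrated with a deliberately simplified sketch. This is plain gradient boosting with one-split regression stumps on a single feature and squared error, not XGBoost itself, which additionally uses second-order gradients and regularization; all names and default values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: constant learner
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Add weak learners one by one, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict
```

Each round reduces the training error a little; the final prediction is the sum of the base value and all scaled weak-learner outputs.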

The objective function of XGBoost, denoted as $obj$ , is calculated using the following formula:

$\begin{equation*} obj=\sum \limits _{i=1}^{m} {l(y_{i},\hat {y}_{i})} +\sum \limits _{k=1}^{K} \Omega \left ({{f_{k}} }\right) \tag{13}\end{equation*}$

where $i$ indexes the samples, $m$ is the total number of samples, $K$ is the total number of trees, $y_{i}$ is the true label, $\hat {y}_{i}$ is the predicted value, $l$ is the loss function, $f_{k}$ is the $k$ -th tree, and $\Omega$ is the regularization term that penalizes the complexity of each tree.
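
The text does not spell out the form of $\Omega$ . In the standard XGBoost formulation, which is assumed here rather than taken from this paper, the penalty for a tree $f_{k}$ with $T$ leaves and leaf weights $w_{j}$ is:

```latex
\begin{equation*}
\Omega(f_{k}) = \gamma_{\mathrm{xgb}}\, T + \frac{1}{2}\, \lambda \sum_{j=1}^{T} w_{j}^{2}
\end{equation*}
```

Here $\gamma_{\mathrm{xgb}}$ penalizes the number of leaves and $\lambda$ penalizes large leaf weights. The subscript is added only to keep this coefficient distinct from the weight $\gamma$ in Eq. (1).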

C. Main Track

1) ES Time Series Decomposing

In the main track, the contextual time series and the forecast time series are decomposed into seasonal and level components by the ES (Exponential Smoothing) model. Exponential smoothing techniques are utilized to smooth and decompose the time series, enabling a better understanding and prediction of the seasonality and overall trends of the sequence. The role of ES in the contextual track is consistent with this. Further details on ES can be found in section III-A2.

In the main track, the context time series is first subjected to exponential smoothing to obtain the seasonal and level components of the context time series. The seasonal features and overall trends of the context sequence are extracted.

Subsequently, the forecast time series is subjected to exponential smoothing, similarly obtaining the seasonal and level components of the forecast time series. The forecast sequence is decomposed into seasonal and overall trend information.

By decomposing the context and forecast sequences into seasonal and level components, the seasonal changes and overall trends of the time series can be better captured by the main track, thereby improving the accuracy and reliability of load forecasting. This decomposition allows the model to have a better understanding of the cyclic and trend changes in the sequence.

2) Pre-Process Vector De-Seasoning

In the main track, the pre-processing model has two main objectives. Firstly, the seasonal component is utilized to remove seasonality from time series data. This step aims to eliminate seasonal variations in the time series, allowing the model to focus more on the residual part of the sequence, which consists of the trend and random components after the seasonal component has been removed. By removing seasonality, the interference of seasonal patterns on load forecasting can be reduced, enabling the model to better capture trends and random variations. The pre-processing details described in section III-A3 are consistent with this process.

Secondly, the training set for the RNN is constructed by the pre-processing model. The time series is split into input and target sequences using a sliding window, which transforms it into input-output pairs suitable for the RNN. The RNN can then learn the patterns and features of the sequence and perform load prediction, exploiting its memory and its ability to model temporal dependencies.

Further processing and normalization of the training set are performed by the pre-processing model as required to ensure data stability and appropriate scaling. This can include standardization, normalization, or other pre-processing steps to properly transform and adjust the data before training the RNN model.

By these pre-processing steps, the training data for the RNN model is constructed, while seasonal effects are removed to improve the accuracy and reliability of load prediction.
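The pre-processing steps above can be sketched as follows. The transform $x=\log (z/(\bar {z}\,s))$, where $\bar {z}$ is the mean of the input window and $s$ the seasonal component, is assumed here because it is the inverse of the post-processing in Eq. (14); the window lengths and the function name are hypothetical.

```python
import numpy as np

def make_training_pairs(z, season, n_in, n_out):
    """Slide a window over the series and build de-seasonalised, normalised pairs.

    Each value is transformed as x = log(z / (z_bar * s)), where z_bar is the
    mean of the input window; this is the inverse of Eq. (14) and is assumed
    here rather than taken from the paper.
    """
    z = np.asarray(z, dtype=float)
    season = np.asarray(season, dtype=float)
    inputs, targets = [], []
    # Move the window forward by one output horizon at a time.
    for start in range(0, z.size - n_in - n_out + 1, n_out):
        mid, end = start + n_in, start + n_in + n_out
        z_bar = z[start:mid].mean()
        inputs.append(np.log(z[start:mid] / (z_bar * season[start:mid])))
        targets.append(np.log(z[mid:end] / (z_bar * season[mid:end])))
    return np.array(inputs), np.array(targets)

pattern = np.array([0.8, 1.0, 1.3, 0.9])
season = np.tile(pattern, 5)   # known seasonal component of the toy series
z = 100.0 * season             # toy load series of 20 points
X, Y = make_training_pairs(z, season, n_in=8, n_out=4)
```

Because the toy series is purely seasonal around a constant level, every transformed value is zero: removing the seasonal component and the window mean leaves nothing for the RNN to explain. On real load data the residual carries the trend and random variations.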

3) AdRNNe Load Forecasting

The processing of time series data in the main track is carried out using AdRNNe. AdRNNe is a type of recurrent neural network that incorporates attention mechanisms and dilated recurrent neural networks for capturing multi-seasonal dependencies in time series data. The functioning of AdRNNe in the context track remains consistent, as described in section III-A4.

AdRNNe possesses the following characteristics:

The LSTM-like cell adRNNCell is employed by AdRNNe for handling multi-seasonal patterns in time series data. Long Short-Term Memory (LSTM) cells, known for their ability to retain long-term memory, are utilized to effectively capture long-term dependencies in time series data.

An attention mechanism is integrated into AdRNNe, allowing input information to be weighted. The weights are dynamically adjusted based on the importance of the input, ensuring that more attention is given to the information that is more beneficial for the prediction task.

Dilated RNN, on the other hand, is an improved RNN structure that expands the model’s receptive field by introducing skip connections between recurrent layers and increasing the time steps distance between layers. This enables the model to better capture long-term dependencies in time series data.

With the assistance of AdRNNe, the GCES-adRNNe model is capable of effectively handling power load time series data. The attention mechanism is utilized to dynamically weight the input information, thereby enhancing the accuracy and reliability of the predictions.
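To make the two ideas concrete, the sketch below implements a toy recurrent layer that combines a dilated recurrence with attention weights over the input features. It is not the adRNNCell of the paper: the plain tanh cell, the way the attention vector is computed from the previous state, and all parameter names are simplifying assumptions chosen for illustration.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def dilated_attentive_layer(x, dilation, W, U, A, b):
    """Toy recurrent layer: attention-weighted input and a dilated state.

    x: (T, n_in) input sequence. The state at step t is computed from the
    state at step t - dilation instead of t - 1, which is the dilated
    recurrence; the attention vector re-weights the input features.
    """
    T = x.shape[0]
    n_hidden = U.shape[0]
    h = np.zeros((T, n_hidden))
    for t in range(T):
        h_prev = h[t - dilation] if t >= dilation else np.zeros(n_hidden)
        attn = softmax(A @ h_prev)  # weights over input features, sum to 1
        h[t] = np.tanh(W @ (attn * x[t]) + U @ h_prev + b)
    return h

rng = np.random.default_rng(0)
T, n_in, n_hidden = 12, 3, 4
x = rng.normal(size=(T, n_in))
W = rng.normal(size=(n_hidden, n_in))
U = rng.normal(size=(n_hidden, n_hidden))
A = rng.normal(size=(n_in, n_hidden))
b = np.zeros(n_hidden)
h = dilated_attentive_layer(x, dilation=4, W=W, U=U, A=A, b=b)
```

With `dilation=4`, a change in the input at step 0 can only reach the states at steps 0, 4, and 8. Stacking layers with different dilations (for example, one matched to the daily period and one to the weekly period) is how such a network covers several seasonal horizons.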

4) Post-Process True Values Converting

Real values are obtained by the post-processing model through the transformation equations, which convert the components of the predicted result $\hat {Z}_{\tau }$.

$\begin{equation*} \hat {Z}_{\tau } =\exp (\hat {x}_{\tau }^{RNN})\bar {Z}_{t} \hat {S}_{t,\tau } \tag{14}\end{equation*}$

where $\tau \in \Delta _{t}^{out}$, $\bar {Z}_{t}$ represents the average value of the input sequence, and $\hat {S}_{t,\tau }$ is the seasonal component predicted by ES for $\tau$ in step $t$.

The loss function employs a normalized version of the forecast:

$\begin{equation*} \hat {x}_{\tau }^{out} =\frac {\hat {Z}_{\tau }}{\bar {Z}_{t}} \tag{15}\end{equation*}$
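The transformation in Eq. (14) amounts to an element-wise product over the forecast horizon. A minimal sketch, with hypothetical toy values standing in for the RNN output, the input-window mean, and the ES seasonal forecasts:

```python
import numpy as np

def postprocess(x_rnn, z_bar, s_hat):
    """Eq. (14): Z_hat = exp(x_rnn) * z_bar * s_hat, element-wise over the horizon."""
    x_rnn = np.asarray(x_rnn, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return np.exp(x_rnn) * z_bar * s_hat

x_rnn = np.array([0.0, np.log(1.1), np.log(0.9)])  # hypothetical RNN outputs
z_bar = 100.0                                      # mean of the input sequence
s_hat = np.array([0.8, 1.0, 1.3])                  # seasonal components predicted by ES
z_hat = postprocess(x_rnn, z_bar, s_hat)
```

Here `z_hat` is `[80, 110, 117]`: an RNN output of zero returns the seasonal level unchanged, while positive or negative outputs scale it up or down.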

5) Main Track Summarizing

The main track in the GCES-adRNNe model is used for handling sequence prediction tasks, specifically for predicting future variations in power load. Pre-processed and context-enhanced input sequences are received and processed by the main track, which utilizes stacked recurrent layers for information propagation and learning. By capturing short-term, long-term, and seasonal dependencies in the time series, the dynamic characteristics of the load data can be captured by the main track.

The main track incorporates an attentive dilated recurrent cell with an inherent attention mechanism. This mechanism weights the input information dynamically, so the model adaptively determines how much each input time step contributes to the prediction and focuses on the time steps that matter most for the task. As a result, the main track adapts better to the changing characteristics of the load data, captures its important features, and predicts key time steps more accurately.

The output of the main track is subjected to post-processing steps in order to further process and adjust the predicted results, with the aim of obtaining the final prediction results and further enhancing the accuracy and reliability of the predictions.

SECTION IV.

Experimental Results and Analysis

A. Experimental Dataset

The experimental dataset is derived from the ENTSO-E database (https://www.entsoe.eu/data/power-stats). This dataset comprises hourly electricity demand data from 35 European countries spanning the years 2006 to 2018. A variety of time series with different characteristics are provided by the dataset, including level and trend, stability of variance over time, intensity and regularity of seasonal fluctuations at different periods (annual, weekly, and daily), as well as the strength of random fluctuations. The diverse time series with different characteristics allow for a better evaluation of the models.

B. Evaluation Metrics

The following seven evaluation metrics are employed to assess the performance of the model comprehensively:

MAPE (Mean Absolute Percentage Error): The average of the absolute percentage errors between predicted values and actual values is calculated. It measures the degree of error between the predicted and actual values in terms of percentage.
MdAPE (Median Absolute Percentage Error): The median of the absolute percentage errors between predicted values and actual values is taken. It exhibits more robustness to outliers compared to MAPE.
IqrAPE (Interquartile Absolute Percentage Error): The interquartile range of the absolute percentage errors between predicted values and actual values is determined. It provides a certain level of robustness against outliers.
RMSE (Root Mean Square Error): The square root of the average of the squared differences between predicted values and actual values is computed. It quantifies the average difference between predicted and actual values.
MPE (Mean Percentage Error): The average of the percentage errors between predicted values and actual values is calculated. It is used to measure the average deviation of predicted values from actual values.
StdPE (Standard Percentage Error): The standard deviation of the percentage errors, which measures the variability of percentage errors between predicted values and actual values, is determined.
GWtest (Gauss-Newton test statistic): The Gauss-Newton test statistic is employed to test the goodness of fit of a time series forecasting model. It relies on the Gaussian distribution assumption of prediction errors and helps determine if the model is appropriate.
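The six error-based metrics above can be computed as in the following sketch. The sign convention of the percentage error (forecast minus actual), the use of the population standard deviation, and the linear-interpolation quantiles of `np.percentile` are assumptions; GWtest is omitted because the text does not define its computation.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return MAPE, MdAPE, IqrAPE, RMSE, MPE and StdPE for one set of forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pe = 100.0 * (y_pred - y_true) / y_true  # percentage errors (sign convention assumed)
    ape = np.abs(pe)                         # absolute percentage errors
    q75, q25 = np.percentile(ape, [75, 25])
    return {
        "MAPE": ape.mean(),
        "MdAPE": np.median(ape),
        "IqrAPE": q75 - q25,
        "RMSE": np.sqrt(np.mean((y_pred - y_true) ** 2)),
        "MPE": pe.mean(),
        "StdPE": pe.std(),
    }

y_true = np.array([100.0, 200.0, 400.0, 500.0])
y_pred = np.array([110.0, 190.0, 400.0, 490.0])
m = forecast_metrics(y_true, y_pred)
```

For this toy example the percentage errors are 10, -5, 0, and -2, giving a MAPE of 4.25, an MdAPE of 3.5, and an MPE of 0.75. The gap between MAPE and MdAPE shows how a single large error pulls the mean more than the median.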

C. Model Comparison Experiment

In this experiment, we compare the performance of the proposed model with baseline models, including statistical models, classical machine learning models, as well as recurrent, deep, and hybrid neural network architectures.

Statistical models:

Naive [18]: Naive model in which the forecasted demand profile for the $i$-th day is the same as the profile for the $(i-7)$-th day.
ARIMA [19]: Autoregressive integrated moving average model.
ES [20]: Exponential smoothing model.
Prophet [21]: Additive regression model with non-linear trend and seasonal components.
N-WE [22]: Nadaraya-Watson estimator.
GRNN [23]: General regression NN.

Machine learning and deep learning models:

MLP [24]: Perceptron with a single hidden layer and sigmoid nonlinearities.
SVM [25]: Linear epsilon-insensitive support vector machine ($\varepsilon$-SVM).
LSTM [26]: Long short-term memory.
ANFIS [27]: Adaptive neuro-fuzzy inference system.
MTGNN [28]: Graph NN for multivariate TS forecasting.
DeepAR [29]: Autoregressive RNN model for probabilistic forecasting.
WaveNet [30]: Autoregressive deep NN model combining causal filters with dilated convolutions.
N-BEATS [31]: Deep NN with hierarchical doubly residual topology.
LGBM [32]: Light Gradient-Boosting Machine.
XGB [33]: eXtreme Gradient Boosting algorithm.

Hybrid NN architecture:

ES-adRNNe100 [34]: Hybrid model combining ES and dilated RNN with attention mechanism (predecessor of the proposed model). The result is presented for an ensemble of 100 ES-adRNN base models.
cES-adRNN [35]: Single model; results are also reported for cES-adRNNe ensembles of five and of 100 individual models.
GCES-adRNN: Single model; GCES-adRNNe is an ensemble of 80 individual models (the model proposed in this paper).

The models were tested on data from 2006-2018. Because the data for about 40% of the countries in the dataset are incomplete for 2006-2015, models that handle missing values poorly achieved better results on the shorter period of 2016-2018. These models are marked with an asterisk ( $\ast$ ) in Table 1 to indicate that they were tested on data from 2016-2018.

GCES-adRNNe achieves a MAPE of 1.85, while the values for the other models range from 2.07 to 5.03. Its average prediction error is therefore smaller, and its predictions are closer to the actual values than those of the other models. GCES-adRNNe also achieves lower values of MdAPE, IqrAPE, RMSE, and MPE, which suggests higher accuracy in its predictions. In addition, it performs well on the StdPE and GWtest metrics; the small values observed for these two metrics indicate higher stability and significance of its predicted results. Figure 7 shows that GCES-adRNNe attains its minimum MAPE at an ensemble size of 80.

FIGURE 7.

Ensemble size confirmation.


The GCES-adRNNe model builds on the concept of a context track and a main track. Representative sequences of similar load days are selected using the comprehensive grey relational analysis method and provide context information for the prediction task. By extracting context information from these representative sequences, the model captures the correlations and similarities between sequences. The XGBoost algorithm then modulates the context vectors to suit each individual sequence for the main track prediction. This mechanism lets the model adapt to the features and context information of the current sequence, improving the flexibility and accuracy of the prediction. In terms of the RNN architecture, the model combines multiple stacked recurrent layers with attentive dilated recurrent cells, which enables it to capture short-term, long-term, and seasonal dependencies in the time series and to weight input information dynamically. The attention mechanism lets the model focus automatically on important time steps in the sequence, improving the prediction of future load changes. Load prediction examples are presented in Figures 8, 9, and 10.

FIGURE 8.

Load forecasting example (Country: AL).


FIGURE 9.

Load forecasting example (Country: AT).


FIGURE 10.

Load forecasting example (Country: BA).


D. Model Ablation Analysis

In this section, the proposed GCES-adRNNe model is analyzed through a series of ablation experiments. The experimental settings are presented in Table 2.

Ab1: ES-adRNNe achieved a MAPE of 2.13 and an RMSE of 288.17. It uses the main track only, without additional input from the context track.
Ab2: cES-adRNNe improved on this, with a MAPE of 1.96 and an RMSE of 270, by feeding the input vector directly into the second and third layers, extending the output vectors of the preceding layer.
Ab3: The same context vectors were used for all series in the main track, which gave slightly lower performance than Ab2: a MAPE of 2.05 and an RMSE of 274.
Ab4: Introducing the adaptive modulation vector $g$ for the context vectors improved performance over Ab3, with a MAPE of 1.95 and an RMSE of 266.12.
Ab5: Using XGBoost to modulate the context vectors improved performance further, with a MAPE of 1.92 and an RMSE of 261.71.
Ab6: GCES-adRNNe achieved the best performance, with a MAPE of 1.85 and an RMSE of 252.73. It combines comprehensive grey relational analysis for selecting similar load days with XGBoost modulation of the context vectors.
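To put the ablation figures on a common scale, the short script below computes the relative error reduction of each configuration with respect to Ab1. The MAPE and RMSE values are those listed above; the helper name `relative_reduction` is hypothetical.

```python
# (MAPE, RMSE) of each ablation configuration, as reported in the list above.
ablation = {
    "Ab1": (2.13, 288.17),
    "Ab2": (1.96, 270.0),
    "Ab3": (2.05, 274.0),
    "Ab4": (1.95, 266.12),
    "Ab5": (1.92, 261.71),
    "Ab6": (1.85, 252.73),
}

def relative_reduction(baseline, value):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - value) / baseline

base_mape, base_rmse = ablation["Ab1"]
for name, (mape, rmse) in ablation.items():
    print(f"{name}: MAPE reduced by {relative_reduction(base_mape, mape):.1f}%, "
          f"RMSE reduced by {relative_reduction(base_rmse, rmse):.1f}%")
```

Relative to Ab1, the full GCES-adRNNe configuration (Ab6) reduces MAPE by about 13.1% and RMSE by about 12.3%.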

TABLE 1 Experimental Results Comparison

TABLE 2 Ablation Experiment Settings

As shown in Table 3, the specific ablation experiment results are as follows.

TABLE 3 Experimental Results of Ablation Study

The following conclusions can be drawn from these results. Introducing the adaptive modulation vector $g$ and using XGBoost for context vector modulation both improve the performance of the cES-adRNNe model. Integrating comprehensive grey relational analysis to optimize the selection of similar load days improves performance further.

SECTION V.

Conclusion

This paper proposes the GCES-adRNNe model, which exploits the relationships among multiple factors and variables and extracts contextual information from representative sequences. By dynamically adjusting model parameters and combining the features of the current sequence with contextual information, the model achieves improvements in stability and accuracy. The experimental results demonstrate its effectiveness: GCES-adRNNe performs well on seven evaluation metrics, including MAPE, MdAPE, and IqrAPE. For instance, it obtains a MAPE of 1.85, while the other models range from 2.07 to 5.03. This reduced average prediction error means that its predictions are closer to the actual values and that it captures intricate variations in power demand more accurately, which provides robust support for practical applications.

In its RNN architecture, the GCES-adRNNe model adopts multiple stacked recurrent layers and attentive dilated recurrent cells. This allows the model to capture short-term, long-term, and seasonal dependencies in time series data and to weight the input information dynamically. The attention mechanism enables the model to focus automatically on important time steps in the sequence, resulting in better predictions of future load changes.

The GCES-adRNNe model can thus support power system planning and operational decisions, promoting the sustainable development and optimization of power supply.
