Introduction
Load forecasting plays a crucial role in the energy sector and is of great importance for the planning, operation, and dispatch of power systems [1]. Its aim is to predict electricity demand over a future period so that power companies and suppliers can arrange power production and allocation appropriately. Accurate load forecasting helps optimize generation plans and the allocation of power resources, avoid energy waste, and improve grid reliability [2]. It also affects electricity market trading and price formation, helping market participants make informed decisions and improving market efficiency.
However, electricity load forecasting faces numerous challenges. First, electricity load is influenced by many factors, including seasonal variations, weather conditions, and economic conditions, which makes the forecasting task more difficult. Second, electricity load is inherently uncertain: unforeseen events and anomalies can affect demand without prior notice. Information about such events is usually unavailable in advance, so prediction models must remain robust and adaptable when external variables are inaccessible. Finally, electricity load data are often nonlinear and non-stationary, which conventional statistical models frequently struggle to capture.
In response to the aforementioned challenges, various approaches have been proposed by researchers. Among these, significant progress has been made in machine learning-based methods. Support Vector Machines, Random Forests, and Recurrent Neural Networks (RNNs), among others, have been widely applied in electricity load forecasting. These methods aim to predict future load demands by learning patterns and trends from historical data. However, these approaches still face limitations when dealing with complex relationships among multiple factors and variables.
To overcome these challenges, this study introduces a novel load forecasting model, GCES-adRNNe (Grey Relational Analysis Contextually Enhanced ES-adRNNe). Unlike traditional methods, the model combines grey relational analysis with a context track and a main track in order to better capture correlations among multiple factors and variables. By dynamically adjusting model parameters and integrating the features of the current sequence with contextual information, GCES-adRNNe aims to improve both the stability and the accuracy of load forecasting.
Related Works
Aprillia et al. [4] proposed a statistical load forecasting approach based on optimal quantile regression random forests and risk assessment indicators, and Assimakis et al. [5] performed electricity load prediction and analysis using periodic steady-state Kalman filters. Such statistical methods predict load from the statistical characteristics of historical data [6], but they may struggle with nonlinear and complex relationships.
Machine learning algorithms have brought significant progress in electricity load forecasting. Support Vector Regression (SVR) is a commonly used algorithm for short-term load prediction; for example, Li et al. [7] proposed in 2020 an SVR model for short-term load forecasting that incorporates sample entropy, two-stage decomposition, and optimization with the whale optimization algorithm. This approach combines multiple techniques to enhance prediction accuracy, but it has limitations when dealing with intricate time series patterns.
Deep learning models have also achieved considerable success in electricity load forecasting. Kong et al. [8] applied LSTM networks to short-term load prediction, using them to learn temporal patterns and dependencies in load data. Convolutional Neural Networks (CNNs) are another commonly used model: Eskandari et al. [9] combined convolutional and recurrent neural networks for short-term load forecasting, extracting local features of the load data through convolution operations and capturing temporal dependencies with recurrent layers. Dudek et al. [10] compared the performance of multiple neural network models for pattern-based short-term load forecasting. Hong et al. [11] proposed a deep learning approach for short-term residential load forecasting in smart grids, and Wang et al. [15] employed deep residual networks for short-term load forecasting.
Beyond individual models, research has also explored combining multiple models into ensembles. Massaoudi et al. [13] proposed in 2021 a hybrid model for short-term load forecasting based on a stacked generalization ensemble, which combines LGBM, XGB, and MLP models into a more powerful ensemble. Kandilogiannakis et al. [14] suggested an ensemble learning method based on recurrent neural fuzzy systems, and Gao et al. [17] used a random vector functional-link neural network for ensemble deep learning. By combining the predictions of multiple models, these ensemble methods improve the accuracy and robustness of forecasts; however, they require a more complex process of model selection and tuning.
Some researchers have also combined traditional time series methods with deep learning models to exploit the strengths of both. For instance, Smyl et al. [12] proposed a hybrid model based on exponential smoothing and dilated recurrent neural networks for short-term load forecasting, and Dudek et al. [16] suggested a hybrid residual dilated LSTM and exponential smoothing model for medium-term load forecasting. Such hybrid models can improve the accuracy and stability of load forecasting, but they require more model integration and parameter tuning, which increases complexity.
Load forecasting in the power industry is a complex problem involving multiple factors and variables. Traditional statistical methods, machine learning algorithms, and deep learning models have all achieved some success in addressing it, but there remains room for further research on handling multiple factors and variables.
This paper proposes a new load forecasting model, GCES-adRNNe (Grey Relational Analysis Contextually Enhanced ES-adRNNe), to address load forecasting problems involving multiple factors and variables. Representative sequences are selected from similar load days using comprehensive grey relational analysis, which combines fuzzy grey correlation (FGC) with cosine similarity, and these sequences provide contextual information for the forecasting task. Context vectors are extracted from the representative sequences by the context track and dynamically adjusted using XGBoost. Multiple stacked recurrent layers with attentive dilated recurrent units capture short-term, long-term, and seasonal dependencies in the time series.
GCES-adRNNe
GCES-adRNNe is designed to handle the multiple factors and variables involved in load forecasting. The model consists of two components: a context track and a main track.
The context track plays a crucial role in sequence prediction because it acts as a summary or abstraction of historical data. Because there are many sequences, concatenating all of them into a single input is impractical. To tackle this problem, a comprehensive grey relational analysis method is employed to select load days similar to the target sequence. The selected sequences are then processed in parallel by the context model, and at the end of each step the batch outputs are flattened into a vector.
The main track is the core component of the load forecasting model. In the pre-processing stage, the data undergo adaptive normalization and deseasonalization. To improve prediction performance, the input is enhanced with the outputs of the context track before being fed into the recurrent neural network (RNN). The connection and coordination between the context track and the main track are achieved with XGBoost, and the two tracks are synchronized in time.
By fitting the context track, the main track, and the parameters of each series, GCES-adRNNe effectively reduces the prediction error. The final forecasts are produced through post-processing.
In summary, the context track provides a summary of historical data, while the predictions are performed using a recurrent neural network in the main track. The connection and coordination between the two tracks are achieved using XGBoost. The framework diagram of the GCES-adRNNe model is shown in Figure 1.
A. Context Track
1) FGC Similar Load Days Selecting
In sequence prediction tasks, context is a summary or abstraction of past data. To avoid concatenating a large number of sequences into a single input, the load day selection method based on load data similarity is employed. Two complementary measures are used. The FGC (fuzzy grey correlation) method measures the degree of correlation between the numerical values of two data sequences by comparing their similarity, whereas cosine similarity measures how similar the trends of two sequences are. Because load data are influenced by multiple factors and exhibit distinct trends, the comprehensive grey correlation theory combines both measures, so the selection of load days similar to the target sequence does not depend on FGC alone. The similarity index $\delta_i$ of the $i$-th candidate day combines its grey relational grade $y_{0i}$ with its cosine similarity $D_{\cos,i}$, weighted by the coefficient $\gamma$: \begin{equation*} \delta _{i} =\gamma y_{0i} +(1-\gamma)D_{\cos ,i} \tag{1}\end{equation*}
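A minimal Python sketch of Eq. (1) is given below. The text does not spell out the fuzzy variant of grey correlation used by FGC, so the sketch substitutes Deng's classic grey relational grade with the conventional distinguishing coefficient rho = 0.5 and mean normalization. These choices, the default gamma = 0.5, and all function names are assumptions for illustration only.

```python
import numpy as np


def grey_relational_grades(ref, candidates, rho=0.5):
    """Classic (Deng) grey relational grade of each candidate day w.r.t. the reference day.

    Stand-in for FGC: the fuzzy variant used in the paper is not specified here.
    """
    ref = np.asarray(ref, dtype=float)
    cand = np.asarray(candidates, dtype=float)
    # Mean normalisation so that days with different load levels are comparable.
    ref_n = ref / ref.mean()
    cand_n = cand / cand.mean(axis=1, keepdims=True)
    diff = np.abs(cand_n - ref_n)              # shape: (n_candidates, n_hours)
    d_min, d_max = diff.min(), diff.max()
    if d_max == 0.0:                           # every candidate matches the reference
        return np.ones(len(cand))
    coeff = (d_min + rho * d_max) / (diff + rho * d_max)
    return coeff.mean(axis=1)                  # y_0i: mean coefficient over the hours


def cosine_similarities(ref, candidates):
    """D_cos,i: cosine of the angle between each candidate day and the reference day."""
    ref = np.asarray(ref, dtype=float)
    cand = np.asarray(candidates, dtype=float)
    return cand @ ref / (np.linalg.norm(cand, axis=1) * np.linalg.norm(ref))


def similarity_index(ref, candidates, gamma=0.5, rho=0.5):
    """Eq. (1): delta_i = gamma * y_0i + (1 - gamma) * D_cos,i."""
    y0 = grey_relational_grades(ref, candidates, rho)
    dcos = cosine_similarities(ref, candidates)
    return gamma * y0 + (1.0 - gamma) * dcos


# A candidate with the same shape as the reference (only scaled) scores highest.
ref = np.array([1.0, 2.0, 3.0, 2.0])
cands = np.array([[2.0, 4.0, 6.0, 4.0], [3.0, 1.0, 2.0, 3.0]])
delta = similarity_index(ref, cands)
```

Because both measures are insensitive to the overall load level here (mean normalization and cosine), a day that differs from the reference only by a scale factor receives the maximum index.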
Because working days affect the load, the reference load day is the same weekday in the previous week as the target prediction day. For example, when predicting the load for Sunday of this week, the previous Sunday is selected as the reference load day. Similar load days are then selected by comprehensive grey relational analysis, and the 50% of the historical days that are most similar to the reference day form the set of similar load days. The selection of similar load days is shown in Figure 2.
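The selection rule can be sketched as follows, assuming days are indexed consecutively and that "50% of the historical data" means the half of the candidate days with the highest similarity index from Eq. (1). The function names are hypothetical.

```python
import math

import numpy as np


def reference_day(target_day):
    """Index of the same weekday one week earlier (e.g. last Sunday for a Sunday target)."""
    return target_day - 7


def select_similar_days(scores, fraction=0.5):
    """Indices of the `fraction` most similar candidate days, returned in chronological order."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, math.ceil(fraction * len(scores)))
    most_similar_first = np.argsort(scores)[::-1]   # descending similarity index
    return np.sort(most_similar_first[:k])


# Hypothetical similarity indices (Eq. (1)) of four candidate days w.r.t. the reference day.
chosen = select_similar_days([0.9, 0.1, 0.5, 0.7])
```

Returning the chosen indices in chronological order keeps the selected context sequences aligned in time, which matters because the context track is synchronized with the main track.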
2) ES Time Series Decomposing
The ES model decomposes both the contextual time series and the forecasted time series into level and seasonal components. It is a simplified Holt-Winters model with multiplicative seasonality, consisting of two equations, one for the level $l_{t,\tau}$ and one for the weekly seasonality $S_{t,\tau}$ (period of 168 hours), where $Z_\tau$ is the load at hour $\tau$: \begin{align*} l_{t,\tau } &=\alpha _{t} \frac {Z_{\tau }}{S_{t,\tau }}+(1-\alpha _{t})l_{t,\tau -1} \tag{2}\\ S_{t,\tau +168}& =\beta _{t} \frac {Z_{\tau }}{l_{t,\tau }}+(1-\beta _{t})S_{t,\tau } \tag{3}\end{align*}
The smoothing coefficients of this model are not constant: they change at each step $t$, adapting dynamically to the current properties of the time series. They are updated with learned corrections, where $I_\alpha$ and $I_\beta$ are the initial values, $\Delta\alpha_t$ and $\Delta\beta_t$ are the corrections produced by the RNN (see Eq. (9)), and $\sigma$ is the sigmoid function: \begin{align*} \alpha _{t+1}& =\sigma \left (I_{\alpha } +\Delta \alpha _{t}\right) \tag{4}\\ \beta _{t+1} &=\sigma \left (I_{\beta } +\Delta \beta _{t}\right) \tag{5}\end{align*}
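The recursion of Eqs. (2) and (3) can be sketched in Python as follows. For simplicity the sketch holds alpha and beta fixed (in the model they are adapted by Eqs. (4)-(5)), and the initialization of the level and seasonal factors from the first week is an assumption, not taken from the text.

```python
import numpy as np


def es_decompose(z, alpha, beta, season=168):
    """Level and multiplicative weekly seasonality via Eqs. (2)-(3), with fixed alpha/beta."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    if n < season:
        raise ValueError("need at least one full season of data")
    level = np.empty(n)
    seas = np.ones(n + season)                 # seas[tau] plays the role of S_{t,tau}
    # Assumed initialisation: first-week profile relative to its mean, level = that mean.
    seas[:season] = z[:season] / z[:season].mean()
    prev_level = z[:season].mean()
    for tau in range(n):
        level[tau] = alpha * z[tau] / seas[tau] + (1 - alpha) * prev_level       # Eq. (2)
        seas[tau + season] = beta * z[tau] / level[tau] + (1 - beta) * seas[tau]  # Eq. (3)
        prev_level = level[tau]
    return level, seas


# Three weeks of a perfectly periodic hourly series at a constant level of 100.
profile = 1 + 0.2 * np.sin(2 * np.pi * np.arange(168) / 168)
z = 100 * np.tile(profile, 3)
level, seas = es_decompose(z, alpha=0.3, beta=0.2)
```

For such an idealized series the level stays at 100 and the seasonal factors repeat week after week, which is a quick sanity check of the multiplicative decomposition.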
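Eqs. (4) and (5) amount to passing a starting value plus an RNN-produced correction through a sigmoid. The sketch below uses hypothetical names and made-up corrections. The sigmoid keeps both coefficients strictly between 0 and 1, which Eqs. (2) and (3) need in order to remain weighted averages.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def next_smoothing_coefficients(i_alpha, i_beta, d_alpha, d_beta):
    """Eqs. (4)-(5): alpha_{t+1} = sigma(I_alpha + dalpha_t), beta_{t+1} = sigma(I_beta + dbeta_t)."""
    return sigmoid(i_alpha + d_alpha), sigmoid(i_beta + d_beta)


# Hypothetical starting values and RNN corrections for three consecutive steps.
i_alpha, i_beta = 0.0, -1.0
for d_alpha, d_beta in [(0.0, 0.0), (1.5, -0.5), (-2.0, 2.0)]:
    alpha, beta = next_smoothing_coefficients(i_alpha, i_beta, d_alpha, d_beta)
    print(f"alpha={alpha:.3f} beta={beta:.3f}")
```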
3) Pre-Process Vector De-Seasoning
The pre-processing model has two objectives: it deseasonalizes the time series using the seasonal components, and it constructs the training set for the RNN.
The input to the context track is the weekly sequence (168 hours) preceding the day to be predicted. Each value in this input window is divided by the level $\bar{Z}_t$ and the seasonal component $\hat{S}_{t,\tau}$ and then log-transformed: \begin{equation*} x_{\tau }^{in} =\log \frac {Z_{\tau }}{\bar {Z}_{t} \hat {S}_{t,\tau }} \tag{6}\end{equation*}
The pre-processed input sequence is represented by the vector $X_t^{in}$, whose elements are the values $x_\tau^{in}$ of the input window.
To enrich the input information, the input of the main track is extended with the level and seasonal data of the weekly sequence, date data, and the modulated context vector: \begin{equation*} X_{t}^{in'} =[X_{t}^{in},\hat {S}_{t},\log _{10} (\bar {Z}_{t}),d_{t}^{w},d_{t}^{m},d_{t}^{y},{r}'_{t}] \tag{7}\end{equation*}
The date variables $d_t^w$, $d_t^m$, and $d_t^y$ encode the position of the forecast day within the week, month, and year, respectively, and $r'_t$ is the modulated context vector.
The target daily sequence, covering the 24 hours of the forecast day, is normalized by the same level $\bar{Z}_t$: \begin{equation*} x_{\tau }^{out} =\frac {Z_{\tau }}{\bar {Z}_{t}} \tag{8}\end{equation*}
Training patterns are generated by moving the adjacent input and output windows along the time series.
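The pre-processing of Eqs. (6)-(8) can be sketched as a sliding-window pattern generator plus the input extension of Eq. (7). Several details are assumptions not stated in the text: $\bar{Z}_t$ is taken as the level at the last hour of the input window, the windows move forward by one day (24 hours), $\hat{S}_t$ is passed in as a 24-element vector, and the date variables are one-hot encodings of the day of week, day of month, and week of year. All names are hypothetical.

```python
import numpy as np


def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v


def make_patterns(z, level, seas, n_in=168, n_out=24, step=24):
    """Slide adjacent input/output windows along the series, Eqs. (6) and (8)."""
    z, level, seas = (np.asarray(a, dtype=float) for a in (z, level, seas))
    X, Y, levels = [], [], []
    for start in range(0, len(z) - n_in - n_out + 1, step):
        t = start + n_in - 1                    # last hour of the input window (assumed)
        inp = slice(start, start + n_in)
        out = slice(start + n_in, start + n_in + n_out)
        X.append(np.log(z[inp] / (level[t] * seas[inp])))   # Eq. (6)
        Y.append(z[out] / level[t])                         # Eq. (8)
        levels.append(level[t])
    return np.array(X), np.array(Y), np.array(levels)


def extended_input(x_in, s_hat, level, dow, dom, woy, r_mod):
    """Eq. (7): [X_t^in, S_t, log10(level), d^w, d^m, d^y, r'_t] as one flat vector."""
    return np.concatenate([
        x_in,
        s_hat,
        [np.log10(level)],
        one_hot(dow, 7),         # d_t^w: day of week 0..6 (assumed encoding)
        one_hot(dom - 1, 31),    # d_t^m: day of month 1..31 (assumed encoding)
        one_hot(woy - 1, 52),    # d_t^y: week of year 1..52 (assumed encoding)
        r_mod,
    ])


# Ten days of a flat series: inputs normalise to log(1) = 0, targets to 1.
z = np.full(240, 50.0)
X, Y, L = make_patterns(z, np.full(240, 50.0), np.ones(240))
v = extended_input(X[0], np.ones(24), 100.0, dow=6, dom=31, woy=52, r_mod=np.zeros(4))
```

Note that only the input window is deseasonalized and log-transformed; the targets of Eq. (8) are divided by the level alone, so the loss is computed on a level-normalized but still seasonal scale.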
4) AdRNNe Load Forecasting
The RNN uses an LSTM-like cell, adRNNCell, to handle the multiple seasonalities in the time series; the cell is equipped with an internal attention mechanism that weights the input information. An adRNNCell consists of two identical dRNNCells, as depicted in Figure 4.
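The adRNNCell has two cell states (c-states) and two control states (h-states). The states come in pairs: recent states from the previous time step and delayed states from $d$ steps back, where $d$ is the dilation of the layer.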
The adRNNCell is equipped with two GRU-inspired weighting mechanisms, controlled by the f-gate and the u-gate. The f-gate controls the weighting of the recent and delayed c-states.
The output of the dRNNCell is divided into a true output and a control output, the latter serving as the h-state in subsequent steps.
AdRNNCell features an internal attention mechanism: an attention vector is generated and used to weight the input vector, and the cell computes its output from the weighted input vector.
The RNN architecture consists of three layers of adRNN cells with dilations (expansion factors) of 2, 4, and 7, respectively. The number of layers and the dilations were chosen experimentally and may not be optimal for other prediction tasks. Stacking layers with increasing dilations helps extract more abstract features in successive layers and yields a larger receptive field, which supports learning long-term and seasonal dependencies at different scales. Because gradients may vanish when three or more layers are stacked, ResNet-style shortcut connections are used between layers. The input vector is also fed directly into the second and third layers, where it extends the output vector of the previous layer.
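The following NumPy sketch illustrates the mechanisms described above: an attention vector that reweights the input elementwise, an f-gate that mixes recent and delayed c-states, three stacked layers with dilations 2, 4, and 7, the input vector fed again into layers 2 and 3, and ResNet-style shortcuts. It is a deliberately simplified, untrained stand-in: the gate equations, the single linear map used in place of the first (attention) dRNNCell, and the omission of the u-gate are simplifications made here, not the paper's adRNNCell equations.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class SketchAdRNNLayer:
    """Hypothetical, simplified stand-in for one adRNN layer (not the paper's exact cell).

    attention: a_t = sigmoid(Wa [x_t; h_{t-1}]), weighted input x~_t = a_t * x_t
    f-gate:    mixes the recent c-state c_{t-1} and the delayed c-state c_{t-d}
    update:    c_t = tanh(Wc [x~_t; h_{t-1}; h_{t-d}] + f*c_{t-1} + (1-f)*c_{t-d}), h_t = tanh(c_t)
    """

    def __init__(self, n_in, n_hid, dilation, rng):
        n_cat = n_in + 2 * n_hid
        self.Wa = rng.normal(0.0, 0.1, (n_in, n_in + n_hid))
        self.Wf = rng.normal(0.0, 0.1, (n_hid, n_cat))
        self.Wc = rng.normal(0.0, 0.1, (n_hid, n_cat))
        self.d, self.n_hid = dilation, n_hid

    def run(self, xs):
        zeros = np.zeros(self.n_hid)
        hs, cs = [], []
        for t, x in enumerate(xs):
            h_rec = hs[t - 1] if t >= 1 else zeros          # recent states
            c_rec = cs[t - 1] if t >= 1 else zeros
            h_del = hs[t - self.d] if t >= self.d else zeros  # delayed (dilated) states
            c_del = cs[t - self.d] if t >= self.d else zeros
            a = sigmoid(self.Wa @ np.concatenate([x, h_rec]))  # attention vector
            feat = np.concatenate([a * x, h_rec, h_del])       # weighted input + states
            f = sigmoid(self.Wf @ feat)                        # f-gate
            c = np.tanh(self.Wc @ feat + f * c_rec + (1.0 - f) * c_del)
            cs.append(c)
            hs.append(np.tanh(c))
        return np.array(hs)


class SketchStack:
    """Three layers (dilations 2, 4, 7), input re-fed to layers 2-3, residual shortcuts."""

    def __init__(self, n_in, n_hid=16, dilations=(2, 4, 7), seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [
            SketchAdRNNLayer(n_in if i == 0 else n_in + n_hid, n_hid, d, rng)
            for i, d in enumerate(dilations)
        ]

    def forward(self, xs):
        out = None
        for i, layer in enumerate(self.layers):
            inp = xs if i == 0 else np.concatenate([xs, out], axis=1)
            h = layer.run(inp)
            out = h if i == 0 else h + out     # ResNet-style shortcut from layer 2 on
        return out


xs = np.random.default_rng(1).normal(size=(30, 5))   # 30 time steps, 5 input features
out = SketchStack(n_in=5).forward(xs)
```

Because every layer emits vectors of the same width, the shortcut is a plain addition; re-feeding the raw input to the upper layers lets them see the original features as well as the abstractions built by the layer below.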
To reduce the input dimensionality of the date variables ($d_t^w$, $d_t^m$, $d_t^y$), they are first passed through a linear layer.
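The adRNN output for the main track is generated by another linear layer and consists of five components: the 24-point forecast $\hat{x}_t^{RNN}$, its lower and upper bounds $\underline{\hat{x}}_t^{RNN}$ and $\overline{\hat{x}}_t^{RNN}$, and the smoothing corrections $\Delta\alpha_t$ and $\Delta\beta_t$ used in Eqs. (4) and (5): \begin{equation*} \hat {x}_{t}^{RNN'} =[\hat {x}_{t}^{RNN},\underline {\hat {x}}_{t}^{RNN},\overline {\hat {x}}_{t}^{RNN},\Delta \alpha _{t},\Delta \beta _{t}] \tag{9}\end{equation*}
For the context track, the output for the $i$-th series consists of its context vector $r_t^{(i)}$ and the corrections of the smoothing coefficients: \begin{equation*} \hat {X}_{t}^{RNN'} =[r_{t}^{(i)},\Delta \alpha _{t},\Delta \beta _{t}] \tag{10}\end{equation*}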
5) Context Track Summarizing
The context track plays an important role in GCES-adRNNe, primarily by providing contextual information for the load forecasting task. Its role can be summarized as follows:
The context track summarizes contextual information: it abstracts historical data and provides the context for the load forecasting task. By selecting representative sequences similar to the target sequence, it captures historical patterns and trends in load variations.
The context track dynamically adjusts the main track prediction to individual sequences. Each sequence receives an individualized adjustment based on the characteristics of the contextual information, which improves the accuracy and reliability of the predictions.
The context track, which is synchronized with the main track, captures time series dependencies. Multiple stacked recurrent layers and attentive dilated recurrent units capture short-term, long-term, and seasonal dependencies, allowing the model to better understand changes and influencing factors at different time scales and thereby improving prediction accuracy and generalization.
B. XGBoost Context Vectors Modulating
Before being fed into the recurrent neural network (RNN), the input data are concatenated with the output of the context track. This concatenated context is the same for every sequence in a prediction batch, and when the number of sequences is relatively small, each sequence may be presented to the forecasting network many times during training.
To make better use of the information within the sequences, XGBoost is used to connect the input data with the output of the context track. XGBoost, a gradient-boosted tree model, can effectively handle nonlinear relationships and intricate feature interactions.
The process of connecting the contextual trajectory with the main trajectory, along with the role of XGBoost within this process, is outlined below:
Concatenation: the input data are concatenated with the output of the context track to form a new feature vector $F$: \begin{equation*} F=\mathrm{Concat}(\mathrm{input},\mathrm{output}_{context}) \tag{11}\end{equation*}
Learning and feature generation with XGBoost: the XGBoost model is trained to learn the nonlinear mapping between sequences and context, generating the feature representation $F_{ES}$: \begin{equation*} F_{ES} =\mathrm{XGBoost}(F) \tag{12}\end{equation*}
Introducing XGBoost into the connection operation strengthens the model's ability to capture sequence features and their associations with the context. With its strong nonlinear modeling capacity, XGBoost learns complex mappings between sequences and context and generates a distinct feature representation for each sequence. This helps the model learn the dynamics and temporal characteristics of the sequences and thereby improves prediction accuracy.
The core idea of XGBoost is to form the prediction as the sum of the outputs of weak learners (regression trees). Each new weak learner is fitted to the residual of the current prediction, as defined by the loss function, until the error requirement is met, as shown in Figure 6.
The objective function of XGBoost, denoted $obj$, combines the training loss $l(y_i,\hat{y}_i)$ over the $m$ training samples with a regularization term $\Omega(f_k)$ that penalizes the complexity of each of the $K$ trees $f_k$: \begin{equation*} obj=\sum \limits _{i=1}^{m} {l(y_{i},\hat {y}_{i})} +\sum \limits _{k=1}^{K} \Omega \left (f_{k}\right) \tag{13}\end{equation*}
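To illustrate Eqs. (11)-(12) and the additive residual-fitting idea, the sketch below concatenates series inputs with context outputs and fits a tiny gradient-boosting model built from regression stumps in NumPy. This is a stand-in for XGBoost, not XGBoost itself: the real library (for example `xgboost.XGBRegressor`) uses deeper trees, second-order gradients, and the regularizer of Eq. (13). The training target used for the booster is not specified in the text, so `y` below is purely hypothetical, and the single-output prediction only plays the role of $F_{ES}$.

```python
import numpy as np


def fit_stump(F, r):
    """Least-squares regression stump (a single split) fitted to residuals r."""
    best = None
    for j in range(F.shape[1]):
        for thr in np.unique(F[:, j])[:-1]:       # keep both sides of the split non-empty
            left = F[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:                               # no split possible: constant learner
        m = r.mean()
        return lambda G: np.full(len(G), m)
    _, j, thr, lv, rv = best
    return lambda G: np.where(G[:, j] <= thr, lv, rv)


def boost(F, y, n_rounds=50, lr=0.3):
    """Additive model: each new weak learner is fitted to the residual of the current sum."""
    base = y.mean()
    pred = np.full(len(y), base)
    learners = []
    for _ in range(n_rounds):
        stump = fit_stump(F, y - pred)
        pred = pred + lr * stump(F)
        learners.append(stump)
    return lambda G: base + lr * sum(s(G) for s in learners)


# Hypothetical data: 60 series windows, 3 input features, 2 context-track outputs.
rng = np.random.default_rng(0)
series_input = rng.normal(size=(60, 3))
context_output = rng.normal(size=(60, 2))
F = np.concatenate([series_input, context_output], axis=1)   # Eq. (11)
y = np.where(F[:, 3] > 0, 1.0, -1.0) + 0.5 * F[:, 0]          # hypothetical target
model = boost(F, y)
F_es = model(F)                                               # analogue of Eq. (12)
```

Each round can only lower the training error, because a least-squares stump is the projection of the current residual onto a piecewise-constant function; the learning rate `lr` shrinks each step so that no single weak learner dominates.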
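Eq. (13) leaves $\Omega$ unspecified. For reference, the standard XGBoost formulation defines it per tree as below, where $T$ is the number of leaves, $w_j$ the leaf weights, and $\gamma$ and $\lambda$ regularization hyperparameters (this $\gamma$ is unrelated to the weight in Eq. (1)); whether this paper uses exactly this form is an assumption. With $G_j$ and $H_j$ denoting the sums of the first and second derivatives of the loss over the samples in leaf $j$, the optimal leaf weight of a fixed tree structure follows: \begin{align*} \Omega(f) &= \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \\ w_j^{*} &= -\frac{G_j}{H_j + \lambda} \end{align*}
Larger $\lambda$ shrinks the leaf weights toward zero, and larger $\gamma$ discourages splits that add leaves without a sufficient reduction in loss.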
C. Main Track
1) ES Time Series Decomposing
In the main track, the ES (exponential smoothing) model decomposes the contextual time series and the forecast time series into seasonal and level components. Exponential smoothing smooths and decomposes the time series, which helps the model understand and predict the seasonality and overall trend of the sequence. ES plays the same role in the context track; further details are given in section III-A2.
In the main track, the context time series is first subjected to exponential smoothing to obtain the seasonal and level components of the context time series. The seasonal features and overall trends of the context sequence are extracted.
Subsequently, the forecast time series is subjected to exponential smoothing, similarly obtaining the seasonal and level components of the forecast time series. The forecast sequence is decomposed into seasonal and overall trend information.
By decomposing the context and forecast sequences into seasonal and level components, the seasonal changes and overall trends of the time series can be better captured by the main track, thereby improving the accuracy and reliability of load forecasting. This decomposition allows the model to have a better understanding of the cyclic and trend changes in the sequence.
2) Pre-Process Vector De-Seasoning
In the main track, the pre-processing model has two main objectives. Firstly, the seasonal component is utilized to remove seasonality from time series data. This step aims to eliminate seasonal variations in the time series, allowing the model to focus more on the residual part of the sequence, which consists of the trend and random components after the seasonal component has been removed. By removing seasonality, the interference of seasonal patterns on load forecasting can be reduced, enabling the model to better capture trends and random variations. The pre-processing details described in section III-A3 are consistent with this process.
In the main track, the training set for the RNN is constructed using the pre-processing model. The time series is split into training samples and target value sequences, and appropriate sliding window operations are applied to transform the time series into input-output pairs suitable for the RNN model. In this way, the patterns and features of the sequence can be learned, and load prediction can be performed, utilizing the RNN model’s memory and temporal dependency.
The pre-processing model also normalizes the training set to ensure data stability and appropriate scaling: as in Eqs. (6) and (8), each window is divided by its level, and the inputs are additionally deseasonalized and log-transformed.
Through these pre-processing steps, the training data for the RNN model are constructed and seasonal effects are removed, improving the accuracy and reliability of load prediction.
3) AdRNNe Load Forecasting
The processing of time series data in the main track is carried out using AdRNNe. AdRNNe is a type of recurrent neural network that incorporates attention mechanisms and dilated recurrent neural networks for capturing multi-seasonal dependencies in time series data. The functioning of AdRNNe in the context track remains consistent, as described in section III-A4.
AdRNNe possesses the following characteristics:
AdRNNe uses the LSTM-like cell adRNNCell to handle multiple seasonal patterns in time series data. Like Long Short-Term Memory (LSTM) cells, which are known for retaining long-term memory, adRNNCell can effectively capture long-term dependencies in the data.
An attention mechanism is integrated into AdRNNe, allowing input information to be weighted. The weights are dynamically adjusted based on the importance of the input, ensuring that more attention is given to the information that is more beneficial for the prediction task.
A dilated RNN is an improved RNN structure that enlarges the model's receptive field through dilated recurrent skip connections: each state is connected to the state a fixed number of time steps (the dilation) back, and the dilation increases from layer to layer. This enables the model to better capture long-term dependencies in time series data.
With the assistance of AdRNNe, the GCES-adRNNe model is capable of effectively handling power load time series data. The attention mechanism is utilized to dynamically weight the input information, thereby enhancing the accuracy and reliability of the predictions.
4) Post-Process True Values Converting
The post-processing model converts the RNN forecast back to real load values by inverting the transformation of Eq. (6): \begin{equation*} \hat {Z}_{\tau } =\exp (\hat {x}_{\tau }^{RNN})\bar {Z}_{t} \hat {S}_{t,\tau } \tag{14}\end{equation*}
The loss function uses a normalized version of the forecast, which is directly comparable with the targets of Eq. (8): \begin{equation*} \hat {x}_{\tau }^{out} =\frac {\hat {Z}_{\tau }}{\bar {Z}_{t}} \tag{15}\end{equation*}
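The post-processing of Eqs. (14) and (15) can be sketched as below (hypothetical function names and sample values). Applying Eq. (14) to the output of Eq. (6) recovers the original loads, which the round trip at the end checks.

```python
import numpy as np


def to_real_values(x_rnn, level, s_hat):
    """Eq. (14): undo the log, level and seasonal normalisation of Eq. (6)."""
    return np.exp(x_rnn) * level * s_hat


def to_loss_scale(z_hat, level):
    """Eq. (15): normalised forecast, on the same scale as the targets of Eq. (8)."""
    return z_hat / level


# Round trip: normalise with Eq. (6), then recover the loads with Eq. (14).
z = np.array([100.0, 120.0, 90.0])
level = 100.0
s_hat = np.array([1.0, 1.1, 0.95])
x = np.log(z / (level * s_hat))
z_back = to_real_values(x, level, s_hat)
```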
5) Main Track Summarizing
The main track in the GCES-adRNNe model is used for handling sequence prediction tasks, specifically for predicting future variations in power load. Pre-processed and context-enhanced input sequences are received and processed by the main track, which utilizes stacked recurrent layers for information propagation and learning. By capturing short-term, long-term, and seasonal dependencies in the time series, the dynamic characteristics of the load data can be captured by the main track.
An attentive dilated recurrent unit with an internal attention mechanism is incorporated into the main track. The attention mechanism dynamically weights the input information, allowing the model to determine adaptively how much each input time step contributes to the prediction and to focus on the time steps most relevant to the task. This helps the main track adapt to the changing characteristics of the load data, capture important features, and predict key time steps more accurately.
Finally, the output of the main track is post-processed to convert the predicted values back to real load values and obtain the final forecasts, further enhancing their accuracy and reliability.
Experimental Results and Analysis
A. Experimental Dataset
The experimental dataset is derived from the ENTSO-E database (https://www.entsoe.eu/data/power-stats). It comprises hourly electricity demand data from 35 European countries for the years 2006 to 2018. The time series differ in level and trend, stability of variance over time, intensity and regularity of seasonal fluctuations at different periods (annual, weekly, and daily), and strength of random fluctuations. This diversity allows for a more thorough evaluation of the models.
B. Evaluation Metrics
The following seven evaluation metrics are employed to assess the performance of the model comprehensively:
MAPE (Mean Absolute Percentage Error): The average of the absolute percentage errors between predicted values and actual values is calculated. It measures the degree of error between the predicted and actual values in terms of percentage.
MdAPE (Median Absolute Percentage Error): The median of the absolute percentage errors between predicted values and actual values is taken. It exhibits more robustness to outliers compared to MAPE.
IqrAPE (Interquartile range of Absolute Percentage Errors): The difference between the third and first quartiles of the absolute percentage errors is determined. It measures the dispersion of the errors and is robust to outliers.
RMSE (Root Mean Square Error): The square root of the average squared difference between predicted and actual values is computed. It expresses the error magnitude in the units of the data and penalizes large errors more heavily than absolute measures.
MPE (Mean Percentage Error): The average of the signed percentage errors between predicted and actual values is calculated. Because positive and negative errors cancel out, it measures the bias of the forecasts rather than their accuracy.
StdPE (Standard deviation of Percentage Errors): The standard deviation of the percentage errors between predicted and actual values is determined; it measures their variability.
GWtest (Giacomini-White test): A statistical test of whether two forecasting models differ significantly in conditional predictive ability, used to check whether the differences between the proposed model and the baselines are significant.
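The six error metrics above can be computed as in the following sketch. The sign convention PE = 100(y - y_hat)/y and the population standard deviation (ddof = 0) for StdPE are assumptions; GWtest is not sketched here.

```python
import numpy as np


def forecast_metrics(y, y_hat):
    """MAPE, MdAPE, IqrAPE, RMSE, MPE and StdPE for actual values y and forecasts y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    pe = 100.0 * (y - y_hat) / y          # percentage errors (sign convention assumed)
    ape = np.abs(pe)                      # absolute percentage errors
    q1, q3 = np.percentile(ape, [25, 75])
    return {
        "MAPE": ape.mean(),
        "MdAPE": np.median(ape),
        "IqrAPE": q3 - q1,
        "RMSE": np.sqrt(np.mean((y - y_hat) ** 2)),
        "MPE": pe.mean(),
        "StdPE": pe.std(),                # population standard deviation (assumed)
    }


m = forecast_metrics([100, 200, 400, 100], [110, 180, 400, 100])
```

In this small example the over- and under-forecasts cancel in MPE, while MAPE, RMSE, and StdPE still register them, which is why MPE is read as a bias indicator rather than an accuracy measure.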
C. Model Comparison Experiment
In this experiment, we compare the performance of the proposed model with baseline models, including statistical models, classical machine learning models, and recurrent, deep, and hybrid neural network architectures.
Statistical models:
Naive [18]: Naive model in which the forecasted demand profile for the i-th day is the same as the profile for the (i - 7)-th day.
ARIMA [19]: Autoregressive integrated moving average model.
ES [20]: Exponential smoothing model.
Prophet [21]: Additive regression model with non-linear trend and seasonal components.
N-WE [22]: Nadaraya-Watson estimator.
GRNN [23]: General regression NN.
MLP [24]: Perceptron with a single hidden layer and sigmoid nonlinearities.
SVM [25]: Linear epsilon-insensitive support vector machine (ε-SVM).
LSTM [26]: Long short-term memory.
ANFIS [27]: Adaptive neuro-fuzzy inference system.
MTGNN [28]: Graph NN for multivariate TS forecasting.
DeepAR [29]: Autoregressive RNN model for probabilistic forecasting.
WaveNet [30]: Autoregressive deep NN model combining causal filters with dilated convolutions.
N-BEATS [31]: Deep NN with hierarchical doubly residual topology.
LGBM [32]: Light Gradient-Boosting Machine.
XGB [33]: eXtreme Gradient Boosting algorithm.
ES-adRNNe100 [34]: Hybrid model combining ES and dilated RNN with an attention mechanism (predecessor of the proposed model). The result is presented for an ensemble of 100 ES-adRNN base models.
cES-adRNN: single model; cES-adRNNe: ensembles of five and of 100 individual models [35].
GCES-adRNN: single model; GCES-adRNNe: ensemble of 80 individual models (the model proposed in this paper).
The models were tested on data from 2006-2018. For 2006-2015, data are incomplete for about 40% of the countries in the dataset, and some models handle missing values poorly; these models achieved better results over the shorter period 2016-2018. Results for these models are marked with an asterisk.
GCES-adRNNe achieves a MAPE (Mean Absolute Percentage Error) of 1.85, while the other models range from 2.07 to 5.03, indicating that its average prediction error is smaller and its forecasts are closer to the actual values. GCES-adRNNe also achieves lower values of MdAPE (Median Absolute Percentage Error), IqrAPE (interquartile range of Absolute Percentage Errors), RMSE (Root Mean Square Error), and MPE (Mean Percentage Error), which suggests higher accuracy. It also performs well on the StdPE (standard deviation of Percentage Errors) and GWtest metrics; the small values of these two metrics indicate the stability of its forecasts and the significance of its results. Figure 7 shows that GCES-adRNNe reaches its minimum MAPE at an ensemble size of 80.
GCES-adRNNe introduces a context track and a main track and selects representative sequences of similar load days using comprehensive grey relational analysis, which provides context information for the prediction task. By extracting context information from these representative sequences, the model captures correlations and similarities between sequences. XGBoost then dynamically adjusts the context for each individual sequence in the main track, so that the model adapts to the features and context of the current sequence, which improves the flexibility and accuracy of the prediction. In its RNN architecture, GCES-adRNNe uses multiple stacked recurrent layers and attentive dilated recurrent units, enabling it to capture short-term, long-term, and seasonal dependencies and to dynamically weight input information. The attention mechanism lets the model focus automatically on the important time steps in the sequence, improving the prediction of future load changes. Examples of load forecasts are presented in Figures 8, 9, and 10.
D. Model Ablation Analysis
In this section, the proposed GCES-adRNNe model is analyzed through a series of ablation experiments. The experimental settings are presented in Table 2.
Ab1: ES-adRNNe achieved a MAPE of 2.13 and an RMSE of 288.17. It used only the main trajectory, without additional input from the context trajectory.
Ab2: cES-adRNNe improved on this, with a MAPE of 1.96 and an RMSE of 270. It did so by feeding the input vector directly into the second and third layers, where it extends the output vector of the preceding layer.
Ab3: Using the same context vector for all series in the main trajectory resulted in slightly lower performance than Ab2, with a MAPE of 2.05 and an RMSE of 274.
Ab4: Introducing an adaptive modulation vector g for the context vector improved performance relative to Ab3, yielding a MAPE of 1.95 and an RMSE of 266.12.
Ab5: Using XGBoost to modulate the context vectors improved performance further, to a MAPE of 1.92 and an RMSE of 261.71.
Ab6: The full GCES-adRNNe model achieved the best performance, with a MAPE of 1.85 and an RMSE of 252.73. It combines comprehensive grey relational analysis for selecting similar load days with XGBoost-based modulation of the context vectors.
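The gains reported in the ablation can be put on a common scale by expressing each variant's MAPE and RMSE as a percentage reduction relative to the Ab1 baseline. The short Python sketch below performs this arithmetic using only the values listed above; the dictionary and function names are illustrative.

```python
# (MAPE, RMSE) per ablation variant, copied from the ablation results above
ABLATION = {
    "Ab1": (2.13, 288.17),
    "Ab2": (1.96, 270.0),
    "Ab3": (2.05, 274.0),
    "Ab4": (1.95, 266.12),
    "Ab5": (1.92, 261.71),
    "Ab6": (1.85, 252.73),
}


def relative_reduction(baseline, value):
    """Percentage by which `value` lies below `baseline` (positive = improvement)."""
    return 100.0 * (baseline - value) / baseline


def reductions_vs_baseline(results, baseline_key="Ab1"):
    base_mape, base_rmse = results[baseline_key]
    return {
        name: (relative_reduction(base_mape, mape), relative_reduction(base_rmse, rmse))
        for name, (mape, rmse) in results.items()
    }


for name, (d_mape, d_rmse) in reductions_vs_baseline(ABLATION).items():
    print(f"{name}: MAPE reduced by {d_mape:.1f}%, RMSE reduced by {d_rmse:.1f}%")
```

For the full model (Ab6) this gives a MAPE reduction of about 13.1% and an RMSE reduction of about 12.3% relative to Ab1.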
The detailed ablation results are listed in Table 3.
The following conclusions can be drawn from these results. Within the cES-adRNNe model, both the adaptive modulation vector g and the use of XGBoost for context vector modulation improve performance, whereas sharing a single context vector across all series (Ab3) degrades it. Integrating comprehensive grey relational analysis to optimize the selection of similar load days improves performance further.
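The role of the modulation vector g can be sketched as an element-wise rescaling of a shared context vector r for each main series i, that is r_i = g_i ⊙ r. In this minimal Python sketch, g_i is produced by a hypothetical linear layer followed by a sigmoid applied to the main-series input x_i. The paper instead obtains the modulation with XGBoost (Ab5), so the linear-sigmoid modulator, its weights, and all function names are assumptions for illustration only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def modulation_vector(x, weights, bias):
    """Hypothetical modulator g = sigmoid(W x + b), one gate per context component."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def modulate_context(context, g):
    """Element-wise product g * r: each context component is scaled by its gate."""
    return [gk * rk for gk, rk in zip(g, context)]


# Shared context vector r and one main series' input x (toy values)
r = [2.0, -4.0]
x = [1.0, 1.0]
g = modulation_vector(x, weights=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
print(g, modulate_context(r, g))
```

Because each gate lies in (0, 1), this form can only attenuate individual context components for a given series; fixing g to a vector of ones recovers the shared-context setting of Ab3.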
Conclusion
This paper proposes the GCES-adRNNe model, which exploits the relationships among multiple factors and variables and extracts contextual information from representative sequences. By dynamically adjusting model parameters and combining the features of the current sequence with contextual information, GCES-adRNNe improves both stability and accuracy. The experimental results demonstrate the effectiveness of the model, which performs well on seven evaluation metrics, including MAPE, MdAPE, and IprAPE. For instance, GCES-adRNNe achieves a MAPE of 1.85, whereas the other models range from 2.07 to 5.03; its average prediction error is therefore smaller, and its forecasts are closer to the actual values. This indicates that the model can capture intricate variations in power demand, providing robust support for practical applications.
In its RNN architecture, GCES-adRNNe adopts multiple stacked recurrent layers and attention-expanded recurrent units. This allows the model to capture short-term, long-term, and seasonal dependencies in time series data and to weight the input information dynamically. The attention mechanism enables the model to automatically focus on the most important time steps in the sequence, resulting in better predictions of future load changes.
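The idea of dynamically weighting time steps can be illustrated with generic dot-product attention: each time step receives a score, the scores are normalized with a softmax, and the states are combined as a weighted sum. This is a simplified Python sketch, not the attention-expanded recurrent cell used in GCES-adRNNe; the query vector and the function names are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum score for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(states, query):
    """Weight per-time-step state vectors by their dot-product score with `query`."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    summary = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return weights, summary


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per time step (toy values)
weights, summary = attend(states, query=[5.0, 0.0])
print(weights, summary)
```

With the query [5.0, 0.0], the first and third time steps receive equal, much larger weights than the second, so the summary is dominated by them. Practical attention layers usually scale the scores (for example by the square root of the dimension) and learn the query; both are omitted here for brevity.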
The GCES-adRNNe model can provide strong support for power system planning and operational decisions, promoting the sustainable development and optimization of the power supply.