Introduction
Power load forecasting plays a core role in the planning and scheduling of power systems: it not only reduces the costs caused by mismatches between generated power and actual demand, but also enhances the reliability of the whole system by preventing inadequate energy dispatch. Most of the literature on load forecasting techniques focuses on point forecasting, which generates a single forecast value for a specific future moment. However, the power load is becoming increasingly volatile owing to growing fluctuation and uncertainty from natural and human-driven sources, such as the integration of distributed renewable energy. As a result, a growing number of decision-makers in the energy industry require forecasting approaches that reflect load uncertainty. A single-point prediction cannot represent the randomness of the load, and occasional large gaps between actual and predicted values can undermine investment in power supply [1], [2].
Compared with point forecasting, probabilistic load forecasting describes the variation of the load by providing outputs in the form of a probability density function (PDF), confidence intervals, or quantiles of the distribution. This makes it better suited to assessing demand in system planning and energy trading, and it is therefore used in a wider range of applications.
The literature on probabilistic load forecasting is relatively limited compared with that on traditional point forecasting. According to Hong and Fan [3], probabilistic load forecasts can be generated by combining two or three of the following components: input scenario simulation, probabilistic model design, and post-processing that transforms point forecasts into probabilistic forecasts. References [4]–[6] mainly relied on input scenario simulation to create probabilistic forecasts. In [7], three basic input scenario generation methods (fixed-date, shifted-date, and bootstrap) were discussed, and an empirical comparison of these methods was conducted using the pinball loss.
In addition, more effort has been devoted to developing probabilistic forecasting models, which fall into the following categories: time-series-based, statistical-regression-based, sequence-operation-theory-based, and other machine-learning-based models. Fang [8] proposed a model based on chaotic time series. Sequence operation theory (SOT) was established by Kang [9] to handle complicated probabilistic modeling. It has been used to model correlated stochastic variables [10] and can generate probabilistic forecasts together with other statistical models. Statistical and other machine learning models have been adopted even more widely in probabilistic forecasting, such as multiple linear regression [5], [11], quantile regression [12], gradient boosting [13], the generalized additive model (GAM) [14], and kernel density estimation (KDE) [15].
Probabilistic forecasts obtained through post-processing have also proved effective. In the studies of Xie [6] and McSharry [16], residual simulation was used to convert point forecasts into probabilistic forecasts. Liu [12] applied forecast combination, which substantially improved performance.
The literature shows that probabilistic forecasting spans time scales from short term to long term. Some works focused on short-term probabilistic load forecasting [1], [17], whereas more works addressed medium- and long-term probabilistic load forecasting [4]–[7], [14], [15] because of its significance for energy trading and system planning [3].
This paper proposes a method for long-term probabilistic forecasting of hourly loads that combines input variable scenario simulation with a probabilistic model. Specifically, an artificial neural network (ANN) serves as the basic structure for capturing nonlinear relationships between variables. Although ANNs have appeared in some literature on probabilistic load forecasting [7], [18], they were treated only as models generating point forecasts, and the output uncertainty that the model itself can describe was ignored. We therefore extend the traditional ANN into a model that generates probabilistic forecasts directly. We first feed the model with multiple inputs generated by the scenario-based method. We then train the ANN on a regularized loss function resembling that of quantile regression, using advanced optimization algorithms to avoid local minima, so that the model describes the randomness of the load over a one-year horizon.
In addition, we use embedding, a technique that maps each categorical value to a dense, learnable real-valued vector and that has been widely adopted for handling categorical variables in other neural network applications [19], [20]. In our case study, it achieves better results than one-hot encoding, a common technique in previous literature. Altogether, the proposed method outperforms state-of-the-art benchmarks in medium-term probabilistic load forecasting on the dataset described in the case study section.
It should be pointed out that some studies have already considered uncertainty in both input scenarios and output variation. However, they either combined input scenarios with output residual simulation using statistical methodology [21] or relied on a traditional probabilistic statistical model [14]. Compared with these efforts, our method stands out by integrating probabilistic outputs into a flexible non-linear network, which requires no extra combination of input variables and, owing to its more complex structure, can better capture non-linear dependencies between input and output variables.
Our key original contributions relative to previous work can be summarized in three aspects:
An ANN-based probabilistic forecasting model with a regularized quantile optimization objective is proposed, considering both the randomness of inputs and the output variation described by a solid non-linear model.
A novel embedding method is used to handle categorical input variables, showing potential for improving load forecasting performance. It can be readily adapted to other machine-learning applications in power system scheduling and operation.
Dual uncertainty, arising from input fluctuation and from load variation described by a robust non-linear model, is considered; this combination has received relatively little attention in previous studies.
The rest of the paper is organized as follows. Section 2 introduces the overall objective and procedure of the proposed probabilistic load forecasting. Section 3 presents the core methodology for generating temperature scenarios and describing load variation. Section 4 describes the evaluation criterion used to quantify the performance of probabilistic forecasting and proposes several benchmark models for comparison. Section 5 presents a case study with data from ISO New England that verifies the superiority of the proposed model. Finally, conclusions are drawn in Sect. 6.
Framework
The objective of the probabilistic forecasting proposed in this paper is to generate hourly quantiles of the probability density function (PDF) of the hourly load over a whole year, using information from the preceding year and earlier. The overall procedure, illustrated as a flow chart in Fig. 1, consists of five main steps: outlier detection, trend analysis, data normalization, training of probabilistic forecasting models, and combination of load variation with temperature uncertainty.
2.1 Outlier Detection
Outlier detection is performed in two steps. The first step is a naive continuity-based method, which assumes that the hourly load should not jump sharply on both sides of a single point. A load value $E_{t}$ is therefore flagged as anomalous if both of the following conditions hold:
\begin{equation*}
\begin{cases}
\vert \frac{E_{t-1}-E_{t}}{E_{t}}\vert > 50\%\\
\vert \frac{E_{t+1}-E_{t}}{E_{t}}\vert > 50\%
\end{cases}
\tag{1}
\end{equation*}
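Criterion (1) can be sketched in Python as follows. This is a minimal illustration, assuming the hourly loads are a plain list; the function name `flag_spikes` and the handling of zero readings (flagged, to avoid division by zero) are our own assumptions, and the first and last hours are not checked because they lack a neighbour on one side.

```python
def flag_spikes(load, threshold=0.5):
    """Indices t whose load jumps by more than `threshold` (relative to E_t)
    towards BOTH neighbours, i.e. both inequalities in (1) hold."""
    flagged = []
    # The first and last hours lack a neighbour on one side and are skipped.
    for t in range(1, len(load) - 1):
        e_t = load[t]
        if e_t == 0:
            # (1) divides by E_t; a zero reading is flagged (assumption).
            flagged.append(t)
            continue
        back = abs((load[t - 1] - e_t) / e_t)
        forward = abs((load[t + 1] - e_t) / e_t)
        if back > threshold and forward > threshold:
            flagged.append(t)
    return flagged


hourly = [1000, 1020, 3000, 1030, 1010, 400, 1005]
print(flag_spikes(hourly))  # [2, 5]
```

Only the isolated spike at index 2 and the isolated dip at index 5 are flagged; the large step between indices 2 and 3 is not, because index 3 agrees with its right-hand neighbour.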
However, this naive method cannot capture outliers other than temporary false records. Thus, in the second stage, the multiple linear regression model (the Vanilla model in [11]) is used as an outlier detector. This approach was first proposed and shown to be effective in [21]. After the model is fitted to the historical hourly load, the absolute percentage error (APE) is computed for each hourly load in the training set. Observations in the training set with APE values higher than 50% are considered outliers and are replaced by the values estimated by the outlier detector.
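The second-stage replacement rule can be sketched as below, assuming the fitted values of the MLR (Vanilla) model are already available; the fitting step itself is omitted, and the helper name `replace_by_ape` is hypothetical. APE is taken relative to the observed load, and a zero observation is treated as an outlier (an assumption).

```python
def replace_by_ape(observed, fitted, threshold=0.5):
    """Replace observations whose APE against the fitted model exceeds `threshold`.

    observed: recorded hourly loads in the training set
    fitted:   values estimated by the outlier-detection model (assumed given)
    Returns the cleaned series and the indices that were replaced.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    cleaned, replaced = [], []
    for t, (obs, est) in enumerate(zip(observed, fitted)):
        # APE relative to the observed value; a zero reading counts as an outlier (assumption).
        ape = abs(obs - est) / abs(obs) if obs != 0 else float("inf")
        if ape > threshold:
            cleaned.append(est)
            replaced.append(t)
        else:
            cleaned.append(obs)
    return cleaned, replaced


series, idx = replace_by_ape([1000, 400, 1100], [980, 1000, 1050])
print(series, idx)  # [1000, 1000, 1100] [1]
```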
It should also be stressed that applying naive outlier detection first is important. Granted, the baseline model can effectively detect and correct relatively sparse outliers, yet such a model-based method can be severely affected when the number of anomalous load points increases. For example, in bus load forecasting, the number of outliers in bus load data is not negligible, so researchers have to clean the data with a naive method first. Applying naive outlier detection before more advanced anomaly correction methods is therefore necessary, as it makes the load forecasting process as a whole more robust.
2.2 Trend Analysis
We capture the linear trend by adding a linearly increasing variable ranging from 0 to 1 as an input to the subsequent regression model. Experimental results show that the forecasting model performs better with the linear trend input than without it.
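A minimal sketch of the trend variable, assuming one value per hourly observation and evenly spaced steps from 0 at the first training hour to 1 at the last one. How the trend continues into the forecast year is not specified above; here we assume the same step is simply extended, so forecast-year values exceed 1. The function name `linear_trend` is hypothetical.

```python
def linear_trend(n_train, n_future=0):
    """Trend input: 0 at the first training hour, 1 at the last training hour.

    The same step is continued for n_future forecast hours (assumption),
    so those values exceed 1.
    """
    if n_train < 2:
        raise ValueError("need at least two training hours")
    step = 1.0 / (n_train - 1)
    return [i * step for i in range(n_train + n_future)]


print(linear_trend(5))     # [0.0, 0.25, 0.5, 0.75, 1.0]
print(linear_trend(5, 2))  # [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
```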
2.3 Data Normalization
Because neural networks are sensitive to the scale of their inputs, we bring the inputs to a common scale by normalizing temperature with a min-max scaler. The normalized features lie in the range [0, 1] and are calculated with the following equation:
\begin{equation*}
I_{\mathrm{norm}}=\frac{I-I_{\min}}{I_{\max}-I_{\min}}
\tag{2}
\end{equation*}
where $I$ is the original input and $I_{\min}$ and $I_{\max}$ are its minimum and maximum values.
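Equation (2) as a small Python sketch. We assume, as is usual, that $I_{\min}$ and $I_{\max}$ are taken from the training data and reused for the validation, test, and scenario inputs, which can then fall slightly outside [0, 1]; scikit-learn's `MinMaxScaler` implements the same transformation. The helper names are hypothetical.

```python
def fit_min_max(values):
    """Return (I_min, I_max) computed from the training data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant feature cannot be min-max scaled")
    return lo, hi


def min_max_scale(values, lo, hi):
    """Apply Eq. (2): (I - I_min) / (I_max - I_min)."""
    return [(v - lo) / (hi - lo) for v in values]


train_temp = [10.0, 30.0, 20.0]
lo, hi = fit_min_max(train_temp)
print(min_max_scale(train_temp, lo, hi))  # [0.0, 1.0, 0.5]
print(min_max_scale([35.0], lo, hi))      # [1.25], outside [0, 1]
```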
2.4 Training Probabilistic Forecasting Models Considering Load Variation
The first stage of forecasting is training a regression model that considers load variation, namely the quantile regression neural network (QRNN) proposed in this paper, which generates probabilistic results in the form of quantiles. Normalized hourly variables (temperature, day types, etc.) act as training features, whereas the corresponding hourly loads are the training labels that supervise the training of QRNN. Training iteratively fine-tunes the model parameters and is terminated once the validation loss no longer decreases.
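The stopping rule can be illustrated with a patience-based check. This sketch assumes the patience of 5 epochs reported in Sect. 5.2 and a precomputed list of validation losses; in Keras, the built-in `EarlyStopping` callback with `monitor="val_loss"` and `patience=5` behaves similarly. The function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the epoch index at which training stops.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.75, 0.9, 0.6]
print(early_stopping_epoch(losses))  # 7
```

Note that the later improvement to 0.6 is never reached: once five epochs pass without beating 0.7, training ends.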
2.5 Combining Temperature Uncertainty in Load Forecasting on the Basis of QRNN
Because QRNN is trained on features that are simultaneous with the load, it cannot be used directly for forecasting one year ahead: some features, such as next year's hourly temperature, cannot be known in advance. Temperature uncertainty must therefore be considered in the actual forecasting stage. The final load forecasts are generated by replacing the simultaneous temperature fed into QRNN with historical temperature scenarios.
Probabilistic Load Forecasting Considering Load Variation and Temperature Uncertainty
In this section, the forecasting problem is formulated, followed by a detailed description of the proposed model.
3.1 Problem Formulation
As mentioned in Sect. 2, probabilistic forecasting requires generating the PDF of the load for each hour. The distribution can be represented discretely by a vector of quantiles of the PDF. The forecasting problem can thus be formulated as follows:
\begin{equation*}
\pmb{E}_{t}=h(T_{t},Trend_{t},M_{t})
\tag{3}
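\end{equation*}
where $\pmb{E}_{t}$ is the vector of load quantiles at hour $t$, $T_{t}$ is the temperature, $Trend_{t}$ is the trend variable, and $M_{t}$ is the set of categorical calendar variables:
\begin{equation*}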
M_{t}=\{Hour_{t},Weekday_{t},Holiday_{t},Month_{t}\}
\end{equation*}
3.2 Embedding Technique for Categorical Variables
In a forecasting problem, categorical variables like the day type at moment
As is mentioned earlier, \begin{equation*}
\pmb{M}_{t}^{em} =\pmb{M}_{t}^{one-hot}\pmb{Q}
\tag{4}
\end{equation*}
In order to connect to other parts of the network being discussed in following paragraph, \begin{equation*}
\pmb{m}_{t}^{em}=flatten(\pmb{M}_{t}^{em})
\tag{5}
\end{equation*}
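Equations (4) and (5) amount to a table lookup followed by concatenation, because multiplying a one-hot row vector by $\pmb{Q}$ selects one row of $\pmb{Q}$. The sketch below assumes a separate embedding matrix for each categorical variable in $M_{t}$ and small, arbitrary embedding sizes; in Keras the lookup corresponds to an `Embedding` layer. All names and matrix values are hypothetical.

```python
def one_hot(index, size):
    """One-hot row vector with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def row_times_matrix(row, matrix):
    """(1 x K) row vector times (K x d) matrix, as in Eq. (4)."""
    d = len(matrix[0])
    return [sum(row[k] * matrix[k][j] for k in range(len(row))) for j in range(d)]


def embed_and_flatten(categories, tables):
    """Embed each categorical value and concatenate the results, as in Eq. (5).

    categories: dict of variable name -> integer category (e.g. hour 0-23)
    tables:     dict of variable name -> embedding matrix Q (K rows, d columns)
    """
    flat = []
    for name, value in categories.items():
        q = tables[name]
        flat.extend(row_times_matrix(one_hot(value, len(q)), q))
    return flat


# Hypothetical embedding matrices: 2-dimensional for hour, 1-dimensional for weekday.
tables = {
    "hour": [[0.1 * h, -0.1 * h] for h in range(24)],
    "weekday": [[float(w)] for w in range(7)],
}
m_em = embed_and_flatten({"hour": 3, "weekday": 5}, tables)
# The one-hot product simply selects rows of each Q.
print(m_em == tables["hour"][3] + tables["weekday"][5])  # True
```

In training, the entries of each $\pmb{Q}$ would be learned by back propagation together with the other weights; the fixed values here only illustrate the lookup.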
3.3 Quantile Regression Neural Network
Artificial neural networks (ANNs) have been shown to be suitable for regression problems with multiple features owing to their dense connections between variables and the non-linear transformations performed by activation functions [22]. Most ANNs commonly used for regression problems rely on the back propagation (BP) algorithm to update parameters by minimizing the loss between outputs of ANN
However, a conventional neural network produces only a single output at a time, which is incompatible with the aim of forecasting load probabilistically. Therefore, a neural network for probabilistic forecasting is proposed based on the fundamental structure of ANN. We name the proposed model QRNN (quantile regression neural network). QRNN generates a vector of quantiles of the target PDF of hourly load, determined by the quantile parameters $\tau$ in the defined loss function. Three layers are constructed as the basic structure of QRNN. The first layer is the concatenation of flattened embedding feature \begin{equation*}
\begin{cases}
\pmb{X}_{t}=Concatenate(T_{t},\ Trend_{t},\ \pmb{m}_{t}^{em})\\
\hat{E}_{t}^{\tau}=f(\pmb{WX}_{t}+\pmb{b})\qquad \tau=1,2,\ldots,N_{\tau}
\end{cases}
\tag{6}
\end{equation*}
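A literal sketch of Eq. (6) in plain Python. It assumes a single linear map from the concatenated input to the $N_{\tau}$ quantile outputs, exactly as written in (6); the hidden layers of the full three-layer network are not specified here and are omitted, and the activation $f$ defaults to the identity as an assumption. Because each output is computed independently, nothing in this sketch prevents quantile crossing.

```python
def qrnn_forward(temp, trend, m_em, weights, biases, activation=lambda z: z):
    """Literal reading of Eq. (6).

    X_t = Concatenate(T_t, Trend_t, m_t^em); output tau = f(W[tau] . X_t + b[tau]).
    weights: N_tau rows, each as long as X_t; biases: N_tau values.
    The activation f is unspecified in the text; identity is assumed here.
    """
    x = [temp, trend] + list(m_em)
    outputs = []
    for w_row, b in zip(weights, biases):
        if len(w_row) != len(x):
            raise ValueError("each weight row must match the input length")
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        outputs.append(activation(z))
    return outputs


# Hypothetical parameters for N_tau = 3 quantiles and a 2-dimensional embedding.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.5]
print(qrnn_forward(0.5, 0.25, [1.0, -1.0], W, b))  # [0.5, 0.25, 1.25]
```

A ReLU output, for example, can be requested with `activation=lambda z: max(z, 0.0)`.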
Figure 2 shows the overall structure of QRNN. The parameters of the neural network are learned by minimizing the following loss function with back propagation:
\begin{align*}
L & =\frac{\lambda_{1}}{2N}\Vert\pmb{Q}\Vert^{2}+\frac{\lambda_{2}}{2N}\Vert\pmb{W}\Vert^{2}+\frac{\lambda_{3}}{2N}\Vert\pmb{b}\Vert^{2}\\
& \quad +\frac{1}{N}\sum_{t=1}^{N}(\max(E_{t}-\hat{E}_{t}^{\tau},0)\tau+\max(\hat{E}_{t}^{\tau}-E_{t}, 0)(1-\tau))
\tag{7}
\end{align*}
It consists of two parts. The first part acts as regularization that prevents QRNN from overfitting. The second part is the pinball loss of quantile $\tau$ averaged over the $N$ training samples.
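Loss (7) for a single quantile level $\tau$, as a minimal sketch. It assumes the parameters $\pmb{Q}$, $\pmb{W}$, and $\pmb{b}$ are passed as flat lists of numbers; how the losses of the $N_{\tau}$ quantiles are combined during training is not stated above (summing them is one natural assumption). The function names are hypothetical.

```python
def squared_norm(params):
    """Squared L2 norm of a flat list of parameters."""
    return sum(p * p for p in params)


def regularized_quantile_loss(y_true, y_pred, tau, q, w, b, lambdas):
    """Eq. (7) for one quantile level tau.

    y_true: observed loads E_t; y_pred: predicted tau-quantiles; N = len(y_true).
    q, w, b: flattened parameters of Q, W and b; lambdas: (lambda1, lambda2, lambda3).
    """
    n = len(y_true)
    lam1, lam2, lam3 = lambdas
    regularization = (lam1 * squared_norm(q) + lam2 * squared_norm(w)
                      + lam3 * squared_norm(b)) / (2 * n)
    pinball = sum(max(e - e_hat, 0.0) * tau + max(e_hat - e, 0.0) * (1 - tau)
                  for e, e_hat in zip(y_true, y_pred)) / n
    return regularization + pinball


loss = regularized_quantile_loss([10.0, 20.0], [12.0, 15.0], 0.9,
                                 q=[1.0], w=[1.0, 1.0], b=[2.0], lambdas=(1.0, 1.0, 1.0))
print(round(loss, 6))  # 4.1
```

With $\tau = 0.9$, under-prediction (the second sample) costs nine times more per unit than over-prediction, which pushes the fitted output towards the upper tail of the load distribution.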
Fig. 2. Overall structure of the QRNN model, which models load variation when temperature is known in advance
By concatenating these results, the estimation of
3.4 Combining Temperature Uncertainty on the Basis of QRNN
It should be noted that
To formulate the process, let \begin{equation*}
\pmb{Ts}_{y,d}^{h}=\{T_{y-y_{0},d-d_{0}}^{h}\vert y_{0}\in[1,m],d_{0}\in[-n,n]\}
\tag{8}
\end{equation*}
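The scenario set in (8) can be collected as below. This sketch assumes temperatures are stored in a dictionary keyed by (year, day of year, hour) and that shifted days falling outside the record are simply skipped, so the set can contain fewer than $m(2n+1)$ values near year boundaries. The function name and storage layout are hypothetical.

```python
def temperature_scenarios(temps, year, day, hour, m, n):
    """Scenario set of Eq. (8) for target year y, day d and hour h.

    temps: dict mapping (year, day_of_year, hour) -> temperature.
    Collects T^h_{y-y0, d-d0} for y0 = 1..m and d0 = -n..n; shifted days that
    are missing from `temps` (e.g. outside 1..365) are skipped (assumption).
    """
    scenarios = []
    for y0 in range(1, m + 1):
        for d0 in range(-n, n + 1):
            key = (year - y0, day - d0, hour)
            if key in temps:
                scenarios.append(temps[key])
    return scenarios


# Synthetic record: the value encodes year and day so the output is easy to read.
temps = {(y, d, h): (y - 2000) * 1000 + d
         for y in (2013, 2014) for d in range(1, 366) for h in range(24)}
print(temperature_scenarios(temps, 2015, 10, 12, m=2, n=1))
# [14011, 14010, 14009, 13011, 13010, 13009]
```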
Then we replace
Comparison and Evaluation Criteria
In this section, several evaluation criteria in the field of probabilistic forecasting are reviewed, and benchmark models for comparison in the case study are introduced.
4.1 Evaluation Criterion
Generally speaking, the PDF of hourly loads provides the most information about a forecast, yet the true PDF of a real-world quantity is often impractical to obtain; in most cases it is only sparsely sampled by empirical observations. Evaluation based on simplified results is therefore more practical. As discussed in [24], reliability, resolution, and sharpness are commonly used evaluation criteria for probabilistic forecasting. In [25], prediction interval coverage probability (PICP) is used as an evaluation criterion and described as a significant measure of the reliability of prediction intervals. Nevertheless, PICP only considers the upper and lower bounds of the forecasting intervals and thus ignores the inner characteristics of the distribution. To balance the complexity of the real PDF against the information that interval-based measures such as PICP may ignore, the pinball loss function is adopted as a sound evaluation criterion for load forecasting. It is defined as:
\begin{equation*}
L_{\tau}(E_{t},\hat{E}_{t})=\begin{cases}
(E_{t}-\hat{E}_{t})\tau & E_{t}\geq\hat{E}_{t}\\
(\hat{E}_{t}-E_{t})(1-\tau) & \hat{E}_{t} > E_{t}
\end{cases}
\tag{9}
\end{equation*}
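A sketch of an evaluation score built on (9). The averaging over all hours and quantile levels is an assumption, as is the choice of quantile levels; a common convention is the 99 percentiles 0.01, 0.02, ..., 0.99 used in the GEFCom2014 competition. The function names are hypothetical.

```python
def pinball(e, e_hat, tau):
    """Pinball loss of one quantile forecast e_hat at level tau, as in (9)."""
    if e >= e_hat:
        return (e - e_hat) * tau
    return (e_hat - e) * (1 - tau)


def average_pinball(actuals, quantile_forecasts, taus):
    """Mean pinball loss over all hours and quantile levels (assumed aggregation).

    actuals: observed hourly loads E_t
    quantile_forecasts: one list per hour, aligned with taus
    """
    total, count = 0.0, 0
    for e, row in zip(actuals, quantile_forecasts):
        if len(row) != len(taus):
            raise ValueError("each forecast row must have one value per tau")
        for e_hat, tau in zip(row, taus):
            total += pinball(e, e_hat, tau)
            count += 1
    return total / count


taus = [0.1, 0.5, 0.9]
print(average_pinball([100.0], [[90.0, 100.0, 110.0]], taus))  # about 0.6667
```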
4.2 Benchmark Models
Three benchmark models are discussed and used in the performance evaluation. The first benchmark is the multiple linear regression model (MLR), which also serves as the outlier detector. It is used as a naive benchmark in several probabilistic forecasting studies [12], [14], [15]. The model is defined by:
\begin{align*}
E_{t} & =\beta_{0}+\beta_{1}\cdot Trend_{t}+\beta_{2}T_{t}+\beta_{3}T_{t}^{2}+\beta_{4}T_{t}^{3}\\
& +\beta_{5}\cdot Month_{t}+\beta_{6}\cdot Weekday_{t}\\
& +\beta_{7}\cdot Hour_{t}+\beta_{8}\cdot Hour_{t}\cdot Weekday_{t}\\
& +\beta_{9}T_{t}\cdot Month_{t}+\beta_{10}T_{t}^{2}\cdot Month_{t}+\beta_{11}T_{t}^{3}\cdot Month_{t}\\
& +\beta_{12}T_{t}\cdot Hour_{t}+\beta_{13}T_{t}^{2}\cdot Hour_{t}+\beta_{14}T_{t}^{3}\cdot Hour_{t}
\end{align*}
In addition, a neural-network-based model that takes (10) as its optimization target is introduced; we denote it MLP (multi-layer perceptron). This model parallels MLR: both take similar inputs, estimate parameters by optimizing the same objective (10), and consider only temperature uncertainty. MLP has a structure similar to that of QRNN, but it contains no embedding layers, has only one hidden layer after the inputs, and uses ReLU as the activation function.
Besides MLR and MLP, a third benchmark model based on linear quantile regression (LQR) is proposed, which considers both temperature uncertainty and the load variation that remains when inputs are fixed. To express load variation more directly, we train a separate quantile regression model for each hour and day type, so that the hourly load is linked directly to temperature and its polynomials as the only inputs. For a specific hour and day type, the LQR model is given as:
\begin{equation*}
E_{t}^{\tau}=\beta_{0}+\beta_{1}T_{t}+\beta_{2}T_{t}^{2}+\beta_{3}T_{t}^{3}
\tag{11}
\end{equation*}
Besides, it should be mentioned that
Case Study
In this section, we present an experiment on a real-world dataset. The proposed model is implemented with Keras, a high-level deep learning library for Python, and the benchmark models are implemented with Scikit-Learn.
5.1 Introduction of Dataset and Experiment Settings
The hourly load and corresponding weather data are obtained from the official website of ISO New England, which is publicly accessible. The data cover 8 zones in New England, US. In this case study, we use only the time information (hour, week, month, year), the load, and the dry-bulb temperature. Data from 2004 to 2015 are used to form our training, validation, and test sets.
Figure 3 shows the load variation and temperature uncertainty present in the recorded data. Figure 3a shows that even when the temperature and other input variables are fixed, the load still fluctuates. Figure 3b indicates that the temperature at the same time of year varies greatly from year to year. Therefore, both uncertainties should be considered to generate more reasonable probabilistic forecasting intervals.
5.2 Procedures of Proposed Forecasting Approach in the Experiment
First, the two-stage outlier detection is applied. Figure 4a shows an anomalous measurement record captured by the naive outlier detection method. Figure 4b shows an anomalous drop in load detected by the model-based outlier detector.
QRNN is then trained on the training set by feeding it the data normalized by (2). Specifically, seven years of hourly load and temperature, from 2008 to 2014, serve as the training set, and 20% of the training set is randomly split off as the validation set during each training epoch; training is stopped early by monitoring the validation loss. Concretely, training is terminated when the validation loss has not decreased for 5 epochs. Besides, we tune the parameters: learning rate of the optimizer
Fig. 4. Anomalous outliers in hourly load, which can degrade forecasting performance if not corrected
In the second stage, as stated in the previous section, temperature uncertainty must be considered by giving a probabilistic forecast of the hourly temperature in 2015.
In this case study, the temperature-scenario-based method described in Sect. 3 proved more effective than other temperature forecasting techniques such as quantGAM [14]. Concretely,
5.3 Comparison and Discussion
In this subsection, the following key questions are answered through corresponding comparisons. First, does a model that combines output variation described by a probabilistic model with temperature input uncertainty perform better than one that only accounts for stochastic temperature scenarios? Second, can QRNN outperform other statistical models that consider dual uncertainty? Third, is embedding categorical features beneficial compared with traditional techniques such as one-hot encoding? Finally, an overall comparison of forecasting performance between the proposed model and the three benchmark models is presented.
Figure 7 shows the forecasts of three models over the same horizon. All three models underestimate the hourly load. Because QRNN captures both temperature uncertainty and load variation, its error is offset by a wider forecasting interval, which lowers the pinball loss, whereas MLR, which does not consider load variation, fails to compensate for this error and therefore shows a significant deviation on this test day.
On the other hand, although LQR considers dual uncertainty, as described in Sect. 4, its forecasts in Fig. 7c reveal two main problems caused by modeling hourly load and temperature separately with naive linear quantile regression.
Because the LQR model is trained separately for each fixed hour and day type, loads are estimated independently and then concatenated by hour and date into the final load series. This leads to discontinuity between hours, and the resulting lack of smoothness can harm the forecasting results. This observation calls into question the “training each hour separately” approach in [14], which ignores the temporal continuity of the load. In addition, the forecasting interval is conspicuously widened. This can be explained by the fact that LQR uses only temperature and its polynomials as inputs in the case study; the scarcity of input feature types can lead to overestimation of the load uncertainty.
In addition, MLP is used as another benchmark model in the final comparison. We use RMSprop as the optimizer for back-propagation of error in MLP. The number of neurons in the hidden layer is treated as a hyperparameter of this model and is tuned to its optimum. Only the best forecasting results are reported.
Table 1 shows the final forecasting pinball loss of the proposed approach and the three benchmarks in the 8 zones of New England, together with the maximum relative improvement (MaxRI). Since a lower pinball loss indicates a better probabilistic forecast, QRNN outperforms all three benchmark models in 7 of the 8 zones, and in the remaining zone it is only 3.8% worse than the best model. The MaxRI column shows that QRNN outperforms the benchmark models significantly: the relative improvements reach approximately 20% across the zones, indicating the effectiveness of the proposed method against the benchmarks in the case study.
In addition, MLR and MLP are parallel benchmarks representing models that consider only the single uncertainty of temperature. The results show that they perform similarly in the case study, although MLP performs slightly better since it is more capable of modeling non-linear effects and interactions between variables. Although LQR considers both load variation and temperature uncertainty, its widened forecasting interval and the discontinuity of its load series may explain its high pinball loss.
To demonstrate the potential effectiveness of embedding for categorical variables, another comparison is conducted, and the final results are shown in Table 2. Note that the results of QRNN with embedding reported here are fine-tuned by adjusting the embedding layers to minimize the validation loss. Compared with one-hot encoding, the optimized embedding decreases the pinball loss; in other words, it better captures the features of the input variables in probabilistic forecasting.
To examine forecasting performance at a finer time scale, we select a zone in which QRNN was the best forecaster in 2015 and plot its monthly pinball loss as a bar chart in Fig. 8. Two main conclusions can be drawn from this figure. First, QRNN does not perform best in March, April, May, and September, even though its annual loss is low. However, its pinball loss drops significantly relative to the single-uncertainty models (MLP, MLR) in months with extreme temperatures, such as February and August. This suggests that QRNN, which considers dual uncertainty, handles the forecasting problem better than single-uncertainty models during extreme-temperature periods, because load variation is more intense then and QRNN captures this characteristic better. Second, the single-uncertainty models perform better when the temperature is mild, since considering temperature alone is then sufficient, whereas considering dual uncertainty may yield a conservative estimate by widening the forecasting interval.
Conclusion
In this paper, a novel method for probabilistic load forecasting is proposed. By considering both input uncertainty and output variation, the proposed QRNN model outperforms commonly used benchmark models. In addition, embedding techniques show potential for handling categorical inputs, which can enhance overall forecasting performance. Further studies can address multiple aspects, such as optimizing the network structure with state-of-the-art techniques like deep neural networks and using multi-temporal information to train the model, thereby mining more hidden information and improving load forecasting performance.
ACKNOWLEDGEMENTS
This work was supported by National Key R&D Program of China (No. 2016YFB0900100).