A Comprehensive Review of the Load Forecasting Techniques Using Single and Hybrid Predictive Models

Information flow presented in the paper, with the explanation of single and hybrid models for load forecasting.

Abstract:

Load forecasting is a pivotal part of the power utility companies. To provide load-shedding free and uninterrupted power to the consumer, decision-makers in the utility sector must forecast the future demand for electricity with a minimum error percentage. Load prediction with less percentage of error can save millions of dollars to the utility companies. There are numerous Machine Learning (ML) techniques to amicably forecast electricity demand, among which the hybrid models show the best result. Two or more than two predictive models are amalgamated to design a hybrid model, each of which provides improved performances by the merit of individual algorithms. This paper reviews the current state-of-the-art of electric load forecasting technologies and presents recent works pertaining to the combination of different ML algorithms into two or more methods for the construction of hybrid models. A comprehensive study of each single and multiple load forecasting model is performed with an in-depth analysis of their advantages, disadvantages, and functions. A comparison between their performance in terms of Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) values are developed with pertinent literature of several models to aid the researchers with the selection of suitable models for load prediction.

Published in: IEEE Access (Volume: 8)

Page(s): 134911 - 134939

Date of Publication: 20 July 2020

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2020.3010702

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

Nomenclature

Abbreviation	Expansion

Abbreviations
ABC	Artificial Bee Colony
AIN	Artificial Immune Network
AIS	Artificial Immune System
ANFIS	Adaptive Neuro Fuzzy Inference System
ANN	Artificial Neural Network
AR	Auto Regressive
ARIMA	Auto Regressive Integrated Moving Average
ARMAX	Auto Regressive Moving Average with Exogenous Input
BA	Bat Algorithm
BFGS-FA	Broyden-Fletcher-Goldfarb-Shanno-Firefly Algorithm
BP	Back Propagation
CI	Computational Intelligence
CNN	Convolutional Neural Network
CT	Clustering Technique
DAE	Deep Auto Encoder
DAF	Dynamic Activation Function
DBN	Deep Belief Network
DL	Deep Learning
DNN	Deep Neural Network
D-RNN	Deep Recurrent Neural Network
EEMD	Ensemble Empirical Mode Decomposition
ELM	Extreme Learning Machine
EMD	Empirical Mode Decomposition
FA	Firefly Algorithm
FCM	Fuzzy C-Mean
FCW	Fuzzy Combination Weight
FL	Fuzzy Logic
FOA	Fruitfly Optimization Algorithm
FTS	Fuzzy Time Series
FRBS	Fuzzy Rule-base System
GA	Genetic Algorithm
GAF	Genetic Algorithm with Fuzzy Logic
GHSA	Global Harmony Search Algorithm
GNN	Generalized Neural Network
GOA	Grasshopper Optimization Algorithm
GP	Genetic Programming
GRNN	Generalized Recurrent Neural Network
GRU	Gated Recurrent Unit
GSA	Gravitational Search Algorithm
HGASVR	Hybrid Genetic Based Support Vector Machine
HS	Harmony Search
IA	Immune Algorithm
IAGA	Improved Adaptive Genetic Algorithm
IEMD	Improved Empirical Mode Decomposition
IGSA	Improved Gravitational Search Algorithm
IMF	Intrinsic Mode Function
IS	Immune System
IT2FLS	Interval Type-2 Fuzzy Logic System
KF	Kalman Filter
LM	Levenberg-Marquardt
LTLF	Long-term Load Forecasting
LS	Least Squares
LSTM	Long Short-term Memory
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MFA	Modified Firefly Algorithm
MGGP	Multi-Gene Genetic Programming
ML	Machine Learning
MLP	Multilayer Perceptron
MSCNN	Multi Scale Convolutional Neural Network
MTLF	Medium-term Load Forecasting
NARX	Nonlinear Autoregressive Models with Exogenous Input
NFIS	Neural Fuzzy Inference System
NN	Neural Network
PDRNN	Pooling-based Deep Recurrent Neural Network
PSO	Particle Swarm Optimization
RBF	Radial Basis Function
RELM	Recurrent Extreme Learning Machine
RMSE	Root Mean Squared Error
RNN	Recurrent Neural Network
RT	Regression Tree
SA	Simulated Annealing
SAF	Static Activation Function
SARIMA	Seasonal Autoregressive Integrated Moving Average
SLFN	Single-hidden Layer Feed-forward Neural Network
SOM	Self-Organizing Map
STLF	Short-term Load Forecasting
SVM	Support Vector Machine
SVR	Support Vector Regression
THI	Temperature Humidity Index
TSK-FIS	Takagi-Sugeno-Kang Fuzzy Inference System
VaR	Value at Risk
VMD	Variational Mode Decomposition
VSTLF	Very Short-term Load Forecasting
WCI	Wind Chill Index
WNN	Wavelet Neural Network
WT	Wavelet Transform
Symbols
$y_{i}$	Actual desired value of the model
$y_{i}'$	Predicted value of the model
$m$	Number of input neurons for ELM; any real number greater than 1, cluster for FCM algorithm
$n$	Number of output neurons for ELM
$I_{m}$	Objective function for FCM algorithm
$x_{m}$	Input values for ELM
$\omega _{m}$	Values of weight from input layer to hidden layer for ELM
$\beta _{m}$	Values of weight from hidden layer to output for ELM
$O_{j}$	Output value from ELM algorithm
$u_{ij}$	Degree of membership of $x_{i}$ in the cluster j
A	Premise for FRBS
B	Consequence for premise A for FRBS
$w$	Vector of weight for MLP
$x$	Vector of inputs for MLP
$y$	Single output for MLP
$b$	Bias for MLP
$\varphi$	Non-linear activation function for MLP
$\xi$	Euclidean vector
$t$	Index of data in a given sequence
$x(t)$	New data item for SOM algorithm
$\alpha (t)$	Scalar factor for size correction of SOM
$c$	Index for SOM having smallest distance from $x(t)$ in Euclidean signal space
$X_{i},Y_{i}$	Training data for SVM
$R$	Regularized risk function for SVM
$C$	Regularization constant for SVM, K-clusters for training Neural Network
$L$	Loss function for SVM
$W$	Regularizer for SVM
$X_{k}$	Randomly selected firefly for MFA
$w_{i}$	Adjusting coefficient for ANN-MFA
$M_{w}$	Weighting factor for ANN-MFA
$M_{b}$	Biasing factor for ANN-MFA
$F_{c}$	Set for K-prediction model
$k$	Number of clusters for K-prediction model
$v_{ij}$	Synaptic connection weight from the $i$ -th input node $x_{i}$ to the $j$ -th neuron
$n_{h}$	Nodes in hidden layer for ANN-GA
${net}_{s}^{j}\mathrm {(\cdot)}$	$j$ -th Static Activation Function
${net}_{d}^{j}\mathrm {(\cdot)}$	$j$ -th Dynamic Activation Function
${net}_{o}^{l}\mathrm {(\cdot)}$	Activation function for $l$ output neurons
$m_{d}^{j}$	Dynamic mean for $j$ -th Dynamic Activation Function
$\sigma _{d}^{j}$	Dynamic standard deviation for $j$ -th Dynamic Activation Function
$\zeta$	Regularization parameter for GHSA-FTS-LSSVM
$\mu$	Euclidean vector, RBF kernel function parameter
$a_{3}$	Approximate wavelet component for GNN-WT-GAF
$d_{1},d_{2},d_{3}$	Detailed wavelet components for GNN-WT-GAF

SECTION I.

Introduction

Modern power systems demand an uninterrupted supply of electricity to the load side. This requires predicting present and future load demand with the least possible error. To this end, researchers have been developing efficient state-of-the-art methods for predicting future electricity consumption, a task known as load forecasting. Load forecasting is used to control several operations and decisions such as dispatch, unit commitment, fuel allocation, and off-line network analysis [1]. It gives the power utility company an idea of the future demand of the consumers and ample time to mitigate the difference between generation capacity and load demand. Because power generation is expensive, demand prediction minimizes the generation cost and helps to establish an organized power system utility. Different Machine Learning (ML) based techniques are widely used by power and energy utility companies to predict the power or energy needed to balance generation and demand. In general, load forecasting can be regarded as a technique for demand and supply management. However, it is a complex task that requires analyzing various direct and indirect factors. Although load forecasting techniques offer several benefits, some challenges limit their accuracy. The underlying process is convoluted and sometimes stochastic in nature, and weather-related variables further influence the data, complicating the forecast. Therefore, the load at a given hour does not depend only on the previous hour; it is also affected by the load consumption of the previous day, weather, demographic data, the number of appliances in the forecasting area, customer type, customer number, and econometric data [2].
Even in such varying circumstances, it is necessary to keep the load forecasting error as low as possible. Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) are common measures of prediction accuracy; MAPE is expressed as a percentage, whereas MAE and RMSE carry the units of the load. For an algorithm to be viable in load forecasting, its error should stay within a few percentage points.

Historical load data is the key component of a load forecasting model, because the model has to learn the pattern of electrical load consumption during training. The load data must first be prepared for training: missing values and erratic data are corrected. Afterward, the load data is combined with data on other factors, such as historical weather data and historical event data. The accuracy of the forecast is highly contingent upon such factors. All the aggregated data is then analyzed, and several models are selected for load forecasting. Among these models, the most accurate one is chosen for implementation. Several other factors also determine the accuracy of a load forecasting model. Most of them vary widely with location and equipment, and they must be taken into consideration when designing a precise model. Such factors are usually treated as input variables of the model. Because the desired data is sometimes unavailable, it can be difficult to consider all the factors. To develop a forecasting model for a certain region, the data for each factor is collected from its respective source: weather data from the local weather office, for instance, or time-factor data from the calendar. Data collection also faces several impediments, including the unavailability of proper data, anomalies, a high share of missing values (more than 5%), and lost data points. The solutions to these problems depend entirely on the case where the model is applied. The presence of any one of these problems can reduce the precision of the forecast. Hence, researchers continue to put forward possible solutions to such challenges.
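The correction of missing values mentioned above can be illustrated with a minimal sketch. It assumes linear interpolation as the repair strategy and uses the 5% figure from the text as a review threshold; the function names, the sample readings and the choice of interpolation are illustrative assumptions, not a procedure prescribed by the reviewed literature.

```python
from typing import List, Optional


def missing_fraction(series: List[Optional[float]]) -> float:
    """Share of readings that are missing (None)."""
    if not series:
        return 0.0
    return sum(v is None for v in series) / len(series)


def fill_gaps_linear(series: List[Optional[float]]) -> List[float]:
    """Fill missing readings by linear interpolation between known neighbours.

    Leading or trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series contains no valid readings")
    filled = list(series)
    # Edge gaps: copy the nearest known value.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: straight line between the two surrounding readings.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical hourly load readings in MW; None marks a missing value.
hourly_load = [410.0, None, 430.0, 445.0, None, None, 475.0, 470.0]
MAX_MISSING = 0.05  # assumed threshold, echoing the 5% figure in the text

needs_review = missing_fraction(hourly_load) > MAX_MISSING
clean_load = fill_gaps_linear(hourly_load)
```

Interpolation is only reasonable for short gaps; a series flagged by `needs_review` would normally call for a case-specific remedy, in line with the remark that solutions depend on where the model is applied.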
Load forecasting with greater accuracy offers great money-saving potential for electric utility corporations, whereas unwanted errors can lead to large financial and infrastructural losses. According to Haida and Muto [3], both positive and negative forecasting errors can result in an increased generation cost of electricity. A decrease of only 1% in mean absolute percentage error (MAPE) has a consequential impact of 3-5% on the generation side, reducing the cost of generation by about 0.1% to 0.3% [4]. In recent years, governments and electric utility companies have shown great interest in renewable energy-based power generation. The growing share of renewable energy poses some great challenges. The stochastic nature of solar and wind power makes it challenging for utility companies to match the load with the variable output of intermittent generation. Researchers are trying to develop new dispatch techniques that can replace traditional ones and operate with a high level of intermittent energy sources such as solar and wind [5]–[8].

This paper focuses on different single and hybrid methods based on ML. Two supervised learning methods, Support Vector Machine (SVM) and Artificial Neural Network (ANN), are widely used to design hybrid predictive models. To explore these two methods, several popular single methods are delineated before the SVM- and ANN-based hybrid algorithms are discussed. Some of these single methods are also optimized with either of these two algorithms. Following the success of two-method models, combinations of three or more single predictive models are being tested to further improve forecasting accuracy. This paper reviews current and previous work on both single methods and hybrid models for short-term load forecasting (STLF) using the most common methods, and compiles comparative results of the algorithms reported in the literature in order to present a comprehensive study of load forecasting technology. Beginning with different aspects of load forecasting, the technique is classified by time horizon to examine the algorithms' effectiveness with respect to the prediction time. The efficacy of the models is evaluated with statistical criteria to establish their significance under the contingent factors. Hybrid models comprising two or more methods are discussed to show how the integration of algorithms affects the forecasting technique. The flow of information is shown in Fig. 1.

FIGURE 1.

Information flow presented in the paper, with the explanation of single and hybrid models for load forecasting.

The rest of the paper is organized as follows. Section II describes the factors, benefits, and challenges of load forecasting methodologies. Section III categorizes the load forecasting methods based on the time horizon. Section IV delineates the criteria for evaluating the accuracy of the models. Section V reviews the basic principles of, and the work done on, single load forecasting models, with a comparison of their advantages and disadvantages. Section VI reviews the latest work on hybrid models composed of two methods, where Part A discusses hybrid models optimized with SVM and Part B discusses hybrid models optimized with ANN. Section VII performs a comparative study of some hybrid models combining more than two methods. Section VIII lists the significant findings of the paper. Finally, Section IX draws the conclusion and indicates future research work.

SECTION II.

Aspects of Load Forecasting

A. Factors Affecting Load Forecasting

The empirical process of forecasting relies on several factors, which affect the precision of the process. Researchers need to choose these factors carefully to obtain a correct prediction from the system. Time, weather and economic factors can be considered the major influences, although other major and minor factors affect every step of forecasting. Fig. 2 depicts the flowchart of the development of a load forecasting model, where the historical load, weather and event data are first collected from smart meters, sensors, data servers or other sources through various technologies and algorithms so that they can be prepared for analysis [9]. The selection of the model also requires significant attention, as different algorithms use different parameters depending on these factors. Notable factors affecting data accumulation and model selection are summarized as follows:

Time Factor: Time is the most important factor in load forecasting. From observing the load curves of several different grid stations, Ruzic et al. [10] found that the load curve has a “time of the day” property along with “day of week”, “week of month” and “month of season” properties. This indicates that not only the current data but also the data of previous days for a given location contribute to the accuracy of the prediction, which cannot be achieved by relying on the current data alone. Moreover, the timeframe over which the load is observed is crucial, as it defines the amount of data required to run the process.
Weather Factor: Weather is the most independent variable in the load forecasting domain, with its greatest impact on domestic and agricultural consumers. Weather has a dominant effect on consumer behavior. For example, in hot summers and cold winters, electricity consumption goes up as cooling and heating devices are turned on. This raises the demand for electricity in the warmest or coldest weather compared to days with average temperature. Also, a sudden drop in temperature can reduce electricity consumption and thus cause the load forecast to overestimate demand. For this reason, different models use weather forecast results to predict future load demand. The weather factors include temperature, humidity, dew point temperature, etc. In addition, the temperature humidity index (THI) and wind chill index (WCI) are broadly used by utility companies. THI and WCI measure summer heat discomfort and winter cold stress, respectively.
Economic Factor: Economic factors such as electricity price, load management and degree of industrialization have an important impact on the system average load and maximum demand [11]. Moreover, customer behavior, changes in tariff, the description of appliances, the population of the forecasting area, the age of the equipment, and employment levels play influential roles in determining the accuracy of forecasting. For proper prediction in a particular area, usually for long-term load forecasting, these factors must be taken into account, since economic prediction considers public behavior to extrapolate load generation and demand, affecting the data acquisition and model selection procedure [12].
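The time properties listed under the time factor are commonly turned into model inputs by encoding each timestamp as a few integers. The following is a minimal sketch under that assumption; the feature names, the week-of-month convention (counted in blocks of seven days from the first of the month) and the extra weekend flag are illustrative choices rather than anything specified in [10].

```python
from datetime import datetime
from typing import Dict


def calendar_features(ts: datetime) -> Dict[str, int]:
    """Encode the time properties of a load reading as integer features."""
    return {
        "hour_of_day": ts.hour,                  # 0-23
        "day_of_week": ts.weekday(),             # 0 = Monday ... 6 = Sunday
        "week_of_month": (ts.day - 1) // 7 + 1,  # 1-5, assumed convention
        "month": ts.month,                       # 1-12
        "is_weekend": int(ts.weekday() >= 5),    # illustrative extra feature
    }


# Hypothetical reading taken on a Monday afternoon.
features = calendar_features(datetime(2020, 7, 20, 14, 30))
```

Weather and economic inputs would be appended to the same feature record; they are omitted here because their encoding depends on the data source.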

FIGURE 2.

Flowchart of developing a load forecasting model; the load, weather and event data are collected to analyze and prepare the model to choose the best one for implementation.

B. Benefits

Load forecasting has benefits of its own, which have attracted researchers to this domain. Since the early days of electricity generation, utility companies have needed to determine the demand for the next hour, the next day and even the coming years in order to balance limited resources against the ever-increasing demand for electricity. Even though renewable energy sources have eased the resource management issue, the energy harvesting processes are still expensive and cumbersome. Electricity dispatch can also be optimized if the load demand is estimated beforehand. This is where load forecasting comes in. Apart from the economic and environmental perspective, load forecasting has other benefits:

Understanding the future load demand helps utility companies to plan, make economically viable decisions and minimize their risk. Decisions on future generation and transmission investments are also based on load forecasting.
Load forecasting helps in planning the resources required in the future, such as the fuel needed to operate the generation side and other resources that are important for ensuring uninterrupted power to consumers. This ensures economical generation and uninterrupted distribution of electricity.
When a new generation plant is to be built, load forecasting helps in planning its size, location, capacity and type. This gives a clear idea of the cost of the transmission and distribution infrastructure as well as other associated losses.
By revealing the demand, load forecasting helps in deciding on and planning the maintenance of power systems.
Finally, load forecasting ensures maximum utilization of power plants by avoiding under-generation and over-generation, which in turn helps to reduce the use of fossil fuels and carbon emissions.

C. Challenges

For many years, researchers have been trying to develop accurate load forecasting models to improve the efficiency and revenues of electricity generation and distribution companies. This has, so far, led to the invention of many state-of-the-art methods. In recent years, researchers have continued to investigate ways to improve the accuracy of these models, but several challenges impede these investigations and remain the main hindrance to obtaining the most accurate model. Some of these difficulties are mentioned below:

Electrical load forecasting depends heavily on weather. Unfortunately, the weather is sometimes unpredictable, and the forecast may have a large error after a sudden change in weather conditions. Also, different regions of a large electrical system may have different weather conditions, which affect the electricity demand. This has a negative impact on revenues.
Consumers in different regions use different types of meters, e.g. smart and traditional meters, with different tariffs. Usage behavior also varies between these customers. The utility should have a clear understanding of the system utilization, develop a separate forecasting model for each metering system, and then add the results up for the final forecast value. Otherwise there will be a large forecasting error.
It is also difficult to get accurate data on consumption behavior because of sudden changes in factors such as pricing and the demand that responds to such a price change.
Accurate load forecasting is difficult because of the complexity of fitting the numerous factors that affect electricity demand into the forecasting models.
Sudden disturbances in the power system affect the regular load models. Unexpected faults and transients create exceptional data logs, which must be treated separately when designing the models; if they are given the same weight as regular data, the result is a poor and unreliable forecasting system.
In addition, it is not easy to obtain an accurate demand forecast based on parameters such as changes in temperature, humidity and other factors that influence the consumption of electricity.
Finally, the distribution company may suffer losses if it does not understand and decide on an acceptable margin of error in short-term load forecasting.

The variation in the challenges motivated the designers to focus on different aspects of the model, collecting incidental data, parameterizing the algorithms and choosing the best model to make load forecasting as reliable as possible.

SECTION III.

Categories of Load Forecasting

In terms of the time horizon, load forecasting can be categorized into four classes, on the basis of which different ML algorithms can be implemented: very short-term load forecasting (VSTLF), short-term load forecasting (STLF), medium-term load forecasting (MTLF) and long-term load forecasting (LTLF). VSTLF is popular for forecasting load from a few seconds to a few minutes ahead. In VSTLF, the load in the near future can be forecast from the load in the past [13], so temperature, economics and land use information can be optional. Instead of modeling relationships between load, weather conditions, time and other influencing factors, the recently observed load pattern is extrapolated to the near future. Methods for VSTLF are few, mostly including Autoregressive Moving Average models, Artificial Neural Networks and Genetic Algorithms. STLF is used for lead times from a few minutes to a few hours. It plays a vital role in system operations and is the main source of information for all daily and weekly operations concerning generation commitment and scheduling [14]. As the load for longer time horizons can be approximated from STLF, researchers are mostly interested in designing predictive models for this domain. Better short-term prediction modeling requires proper knowledge of the factors affecting the load, such as the weather conditions, the season, and the type and time of day in a specific area. The relationship between these factors and the load demand is the primary thing to look for, as the demand differs at every time of the day. MTLF is generally used for forecasting load from a few days to a few months ahead [15]. It is popular for forecasting load across seasonal changes such as winter or peak summer. LTLF is used for lead times from a few weeks to several years [16].
LTLF takes into account the historical load and weather data, the number of customers in each category, the characteristics of the appliances in the area, etc. Economic factors are especially considered in long-term forecasting methods. Table 1 compares the load forecasting methods by time period, factors and application.

TABLE 1 Classification of Load Forecasting Methods According to Time Period, Factors and Their Application

Among all these categories, STLF is the most popular. It plays a key role in the formulation of economic and secure operating strategies for the power system because of its inherent connection to the other types of forecast. STLF can be transformed into MTLF and LTLF by adding econometric variables to the STLF model and extrapolating it to the longer horizon. On the other hand, a VSTLF model can be obtained from STLF by adding the loads of some preceding hours to the inputs of the STLF model. The autocorrelation between the current hour load and the previous hour loads can be captured by short-term load forecasting. Also, the residuals of the historical load can be collected to form a new series with the STLF as a base. A very short-term forecast can then be obtained by forecasting the future residuals and adding them back to the short-term forecast. The process flow of conversion between STLF and LTLF, MTLF and VSTLF is shown in Fig. 3.
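The residual-based route from STLF to a very short-term forecast can be sketched as follows. The text does not say which model forecasts the residuals, so a first-order autoregression without intercept is assumed here purely for illustration; the function names and the sample values are hypothetical.

```python
from typing import List


def ar1_coefficient(residuals: List[float]) -> float:
    """Least-squares estimate of phi in r[t] = phi * r[t-1] (no intercept)."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(b * b for b in residuals[:-1])
    return num / den if den else 0.0


def very_short_term_forecast(
    actual: List[float], stlf_past: List[float], stlf_next: float
) -> float:
    """Correct the next STLF value with a forecast of its residual."""
    # Residual series: what the short-term forecast missed in the past.
    residuals = [a - f for a, f in zip(actual, stlf_past)]
    phi = ar1_coefficient(residuals)
    next_residual = phi * residuals[-1]  # one-step AR(1) prediction
    return stlf_next + next_residual     # add the residual back to the base


# Hypothetical observed loads and matching short-term forecasts, in MW.
actual_load = [500.0, 510.0, 520.0, 530.0]
stlf_values = [496.0, 508.0, 519.0, 529.5]
corrected = very_short_term_forecast(actual_load, stlf_values, stlf_next=540.0)
```

With these numbers the past residuals are 4, 2, 1 and 0.5 MW, the estimated coefficient is 0.5, and the short-term forecast of 540 MW is adjusted to 540.25 MW.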

FIGURE 3.

Process flow of conversion between STLF and LTLF, MTLF, VSTLF with the transaction of economic variables, statistical process and time.

As depicted in Fig. 4, the simple STLF process takes weather and load history as inputs to the modeling step, which feeds the extrapolating step together with the weather forecast data. The result is then used for minute-wise or hourly prediction of the load. Many STLF techniques have been designed for this purpose, including time series analysis, regression analysis, artificial neural networks (ANN), support vector machine (SVM), fuzzy logic (FL), genetic algorithms (GAs) and hybrid methods, as listed in Fig. 5.

FIGURE 4.

Flowchart of simple short-term load forecasting method, showing the modeling and extrapolating process driven by weather, load data and weather forecast values for load prediction.

FIGURE 5.

Most used single methods for short-term load forecasting, some of which are used for medium and long term predictions as well.

Due to their self-adaptive mechanism and their ability to mimic intelligent behavior in complex and continually changing environments, Computational Intelligence (CI) methods are widely used in current research works. CI denotes the ability of computer algorithms to learn a particular task by trial and error from available past data and to predict accurately in the future based on that learning. A distinctive feature of CI methods is their ability to operate autonomously without requiring complex mathematical formulations or quantitative correlations between the inputs and the output. Hybridized CI models have achieved significantly better performance than their single-model counterparts [17].

SECTION IV.

Evaluation Criteria

Several criteria are used as evaluation indices of load forecasting techniques to assess how accurately the methods predict the actual load values. Different researchers have relied on different statistical metrics to quantify the accuracy of their models, and new metrics keep coming into play over time, such as metrics for probabilistic load forecasting. The probabilistic load forecasting literature is still developing, as it shows promising academic value and wide adoption in the industry. Table 2 shows the statistical metrics most popular among researchers around the world.

TABLE 2 Evaluation Criteria Formula

Here, $n$ is the number of samples, $y_{i}'$ is the predicted value of the model and $y_{i}$ is the actual desired value. Each of these metrics has its own advantages and disadvantages. RMSE is a second-degree loss function, so it emphasizes large errors more than small ones. MAE is unambiguous and measures the average error naturally. MAPE does not depend on scale and can be applied easily to both high- and low-volume products. However, because it penalizes over- and under-forecasts differently, it can often lead to biased forecasts. The weaknesses of MAPE, such as difficulties in handling small and zero denominators, are not very relevant for traditional load forecasting problems, because the load at the aggregated level is rarely zero or close to zero [18].
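For reference, the three criteria named in this section are usually defined as below. These are the standard textbook forms written with the symbols $n$, $y_{i}$ and $y_{i}'$ defined above; they are assumed, not confirmed, to match the entries of Table 2.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left| y_{i} - y_{i}' \right| \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_{i} - y_{i}' \right)^{2}} \\
\mathrm{MAPE} &= \frac{100\%}{n}\sum_{i=1}^{n}\left| \frac{y_{i} - y_{i}'}{y_{i}} \right|
\end{align}
```

The division by $y_{i}$ in MAPE is the source of the zero-denominator weakness discussed above, and the squaring in RMSE is what makes it weight large errors more heavily.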

The values of these metrics also differ across datasets and parameters, so it is quite difficult to compare the results of different techniques. Moreover, there is no study in which all the methods are tested on a single dataset to enable a direct comparison between them. In this review, the best accuracy reported for each discussed method is tabulated in the subsequent sections.
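A short sketch can make the comparison problem concrete. It computes MAE, RMSE and MAPE from their standard definitions (assumed here, since the table is not reproduced) and evaluates the same hypothetical forecast once in MW and once in kW; the function names and sample values are illustrative.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error, in the units of the load."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error, in the units of the load."""
    _check(actual, predicted)
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; undefined for zero actuals."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted loads, first in MW and then in kW.
actual_mw = [100.0, 200.0, 400.0]
predicted_mw = [110.0, 190.0, 380.0]
actual_kw = [1000.0 * v for v in actual_mw]
predicted_kw = [1000.0 * v for v in predicted_mw]

scores_mw = (mae(actual_mw, predicted_mw), rmse(actual_mw, predicted_mw), mape(actual_mw, predicted_mw))
scores_kw = (mae(actual_kw, predicted_kw), rmse(actual_kw, predicted_kw), mape(actual_kw, predicted_kw))
```

MAE and RMSE grow by a factor of 1000 when the unit changes, while MAPE stays the same, which is one reason MAE and RMSE values reported on different datasets cannot be compared directly.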

SECTION V.

Single Method for Load Forecasting

Load forecasting over very short horizons is performed with parametric, nonparametric and artificial intelligence based single methods. ANN, the Local Fuzzy Reconstruction Method, and specific regression and statistical models are used to forecast from a few seconds to a dozen minutes ahead [19]. For STLF, the range of algorithms widens to include Expert Systems, Time Series Analysis, the Similar Day Look Up Approach, SVM and FL, as shown in Fig. 5. Single methods used for STLF are reasonably reliable for MTLF as well. The Grey model, Adaptive Neuro Fuzzy Inference System (ANFIS), Wavelet Transform (WT) and several other distinguished methods are added to the list for LTLF along with those used for MTLF [20]. Employing such methods requires a clear understanding of the terminology, which is provided and compared with respect to the evaluation criteria in the subsections that follow.

A. Learning Based Methods

1) Deep Learning (DL)

The term Deep Learning (DL) was first introduced by Dechter [21] in 1986. After gradual improvement in the algorithm and context, it is now commonly associated with the Deep Neural Network (DNN). A conventional neural network is a shallow structure consisting of an input layer, a hidden layer and an output layer, whereas a deep learning architecture consists of more layers than this traditional three-layered multilayer perceptron (MLP). The network structure of deep learning is described as a close simulation of the human cerebral cortex, mimicking human brain function [22]. Because it can model nonlinearity, DL is widely used in various forecasting applications [23]. It can represent complex high-dimensional and highly variable functions, at the cost of higher computational complexity in the forecasting models. One major drawback is over-fitting, owing to the large number of layers required for precise output. Due to its algorithmic complexity, DL often requires a large amount of runtime. DL methods include the convolutional neural network (CNN), deep belief network (DBN) and deep auto encoder (DAE), among others. Ryu et al. [23] proposed two different DNN models to learn complicated relations between weather variables, date and previous consumption for individual customers.
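The structural difference between a three-layered MLP and a deeper network can be shown with a minimal forward-pass sketch. Each layer computes $y = \varphi(w \cdot x + b)$ with the MLP symbols from the nomenclature; the layer sizes, the tanh activation, the random untrained weights and the input values are assumptions made for illustration, and no training is performed.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def dense(x: Vector, w: Matrix, b: Vector, phi: Callable[[float], float]) -> Vector:
    """One layer: y_j = phi(sum_i w[j][i] * x[i] + b[j])."""
    return [phi(sum(wi * xi for wi, xi in zip(row, x)) + bj)
            for row, bj in zip(w, b)]


def build_network(sizes: List[int], seed: int = 0) -> List[Layer]:
    """Create randomly initialised (untrained) layers for the given sizes."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers


def forward(x: Vector, layers: List[Layer]) -> Vector:
    """Apply tanh on hidden layers and the identity on the output layer."""
    for index, (w, b) in enumerate(layers):
        is_output = index == len(layers) - 1
        x = dense(x, w, b, (lambda v: v) if is_output else math.tanh)
    return x


shallow = build_network([4, 8, 1])        # input, one hidden layer, output
deep = build_network([4, 8, 8, 8, 1])     # same input and output, three hidden layers

x = [0.2, 0.5, 0.1, 0.9]  # hypothetical normalized inputs (e.g. lagged loads)
y_shallow = forward(x, shallow)
y_deep = forward(x, deep)
```

The deep variant has more weight matrices to fit, which is where both its extra representational capacity and its higher over-fitting risk and runtime come from.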

These DNN models are then used to produce a day-ahead forecast of the 24-hour load profile from past observations. He et al. [24] proposed a model to forecast the hourly load of a power grid. Their model combines the co-movement analysis of a Copula model with a layer-wise pre-training-based deep belief network. A comparative analysis against Support Vector Regression (SVR), Neural Network (NN), Extreme Learning Machine (ELM) and classical DBN, in both day-ahead and week-ahead forecasting, shows better results for the proposed semi-parametric data-driven method. The major over-fitting issue of DL is addressed in a study by Shi et al. [25], who mitigated it by increasing the diversity and volume of the data. Their proposed novel pooling-based deep recurrent neural network (PDRNN) batches a group of customer load profiles into a pool of inputs. The proposed method consists of two stages: 1) load profile pooling, and 2) household STLF with a Deep Recurrent Neural Network (D-RNN). This method requires sufficient diversity in customer load in order to learn commonly shared uncertainties such as weather conditions and related factors. Tested on 920 smart-metered customers from Ireland, the proposed method outperforms autoregressive integrated moving average (ARIMA) by 19.5%, SVR by 13.1% and classical D-RNN by 6.5% in terms of RMSE. A framework based on resident behavior [26] was designed with long short-term memory (LSTM, a variant of RNN) based DL, which took the appliance consumption sequence into consideration. A persistent characteristic of the power grid is its high variability and volatility. To mitigate this challenge and reduce the errors in forecasting the electric parameters, DBN is combined with a Copula model [27] and a bidirectional Recurrent Neural Network [28], [29] for day-ahead and week-ahead forecasting. An LSTM-based hybrid DL method was tested by Motepe et al. [30] on a South African distribution network, showing improved performance when temperature data were included. Deng et al. [31] devised a deep multi-scale CNN (MSCNN) with time cognition and a self-designed time coding algorithm, which outperformed recursive multi-step LSTM, direct multi-step MSCNN and the direct multi-step gated CNN by 34.73%, 14.22% and 19.05% in terms of MAPE, respectively. An improved DBN, intended especially for Demand Side Management, was designed by Kong [32] for STLF, outperforming ARIMA, Least Square SVM and conventional DBN with MAPE and RMSE of 3.864 and 341.601, respectively. RNN and its variants are widely used for short-term residential load forecasting [33]. LSTM was further integrated with the Gated Recurrent Unit (GRU) for hybrid distribution feeder LTLF by Dong and Grumbach [34]. The model was evaluated on an urban grid in West Canada, where it exceeded the performance of ARIMA, bottom-up and feed-forward NN.

2) Extreme Learning Machine (ELM)

Huang et al. [35] observed that the learning speed of feed-forward neural networks is far slower than required, and identified two main reasons: (1) the slow gradient-based learning algorithms that are extensively used to train neural networks, and (2) the iterative tuning of all the parameters of the networks. To overcome this problem, the authors proposed a new learning algorithm called the extreme learning machine (ELM) for single-hidden layer feed-forward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. ELMs are feed-forward neural networks with a single layer or multiple layers of hidden nodes. These hidden nodes can be randomly assigned and never updated (i.e. they are random projections with nonlinear transforms), or can be inherited from their ancestors without being changed. Another interesting feature of this method is that the hidden layers need not be tuned.
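
The training procedure described above (random, fixed hidden nodes followed by an analytic solution for the output weights) can be illustrated with a minimal sketch. It is not the implementation of [35]: the sigmoid activation, the constant bias term on the output, the small ridge term used to keep the linear system well conditioned, the helper names (`solve_linear`, `hidden_layer`, `train_elm`, `predict_elm`) and the sine-shaped toy "load" curve are all assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hidden_layer(x, w, b):
    """Sigmoid outputs of the random hidden nodes, plus a constant term."""
    return [1.0 / (1.0 + math.exp(-(wi * x + bi))) for wi, bi in zip(w, b)] + [1.0]


def train_elm(xs, ys, n_hidden=10, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    # Hidden-node parameters are drawn once at random and never updated.
    w = [rng.uniform(-4.0, 4.0) for _ in range(n_hidden)]
    b = [rng.uniform(-4.0, 4.0) for _ in range(n_hidden)]
    h = [hidden_layer(x, w, b) for x in xs]
    k = n_hidden + 1
    # Output weights from the regularized normal equations
    # (H^T H + ridge * I) beta = H^T y  (assumed variant of the analytic solve).
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
    return w, b, solve_linear(hth, hty)


def predict_elm(model, x):
    w, b, beta = model
    return sum(hv * bv for hv, bv in zip(hidden_layer(x, w, b), beta))


# Toy data: one period of a sine wave standing in for a load curve.
xs = [i / 20 for i in range(41)]
ys = [math.sin(math.pi * x) for x in xs]
model = train_elm(xs, ys)
mse = sum((predict_elm(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mean_y = sum(ys) / len(ys)
baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)  # error of predicting the mean
```

Only the output weights are fitted, and they are obtained in a single linear solve rather than by iterative gradient updates, which is the source of the speed advantage discussed in this subsection.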

Fig. 6 shows the structure of ELM, depicting $m$ inputs denoted as $x_{1}, x_{2}, \ldots, x_{m}$. These inputs pass through $n$ hidden-layer neurons. The values from the input neurons are assigned weights $\omega_{1}, \omega_{2}, \ldots, \omega_{n}$ from the input to the hidden layer and linear weights $\beta_{1}, \beta_{2}, \ldots, \beta_{n}$ from the hidden layer to obtain the predicted output $O_{j}$. Its learning speed can be thousands of times faster than traditional feed-forward neural network learning algorithms such as back-propagation (BP), while obtaining better generalization performance [35]. ELM also tends to reach solutions directly, without facing issues such as local minima, improper learning rate and over-fitting. Li et al. [37] proposed a novel ensemble method for short-term load forecasting in which wavelet transform, ELM and partial least squares regression are integrated. The individual forecasters are derived from different combinations of mother wavelet and number of decomposition levels. A parallel model consisting of 24 ELMs is invoked to predict the hourly load of the next day for each sub-component of the wavelet decomposition. The individual forecasts are then combined to form the ensemble forecast using partial least squares regression. The method was tested on data from two electric utilities for 1-hour-ahead and 1-day-ahead load forecasting, and the results showed better forecasting accuracy than other state-of-the-art models. Ertugrul [38] proposed a new model incorporating ELM with RNN, named the Recurrent Extreme Learning Machine (RELM). RNNs show better results than feed-forward ANN models when forecasting dynamic systems. RELM performs well because its training time is much lower, so it can be used for load forecasting in real-time dynamical systems. Zhang et al. [39] applied ELM to short-term load forecasting together with an Improved Gravitational Search Algorithm (IGSA), a combination of Particle Swarm Optimization and the Gravitational Search Algorithm, which searches for the optimal set of input weights and hidden biases for the ELM. Li et al. [40] proposed an STLF method based on an improved extreme learning machine capable of autonomously selecting the number of hidden neurons according to the group number of input samples, which drives the training error close to zero while keeping the test error as small as possible. Chen et al. [41] proposed a novel short-term load forecasting method based on empirical mode decomposition (EMD) and ELM. EMD is an empirical approach for obtaining instantaneous frequency data from non-stationary and non-linear data sets. It is used to decompose the load series in order to capture the complicated features of the electric load and to de-noise the data [42]. The method was tested on half-hourly electric load data from the Australian states of New South Wales, Victoria and Queensland, and the results showed a clear improvement.

FIGURE 6.

Structure of extreme learning machine, with $m$ input neurons choosing between $n$ hidden layer neurons proceeding towards the forecasting of output $O_{j}$ [36].

3) Multilayer Perceptron (MLP)

A perceptron is an algorithm that classifies input by separating two categories with a straight line. In other words, it is a linear classifier. A perceptron generates a single output $y$ by forming a linear combination of several real-valued inputs using its input weights, as shown in the following equation. $\begin{equation*} y=\varphi \left ({\sum \nolimits _{i=1}^{n} w_{i}x_{i} + b }\right)=\varphi (\mathbf {w}^{T}\mathbf {x}+b)\tag{1}\end{equation*}$ where $\mathbf {w}$ denotes the vector of weights, $\mathbf {x}$ is the vector of inputs, $b$ is the bias and $\varphi$ is the non-linear activation function.
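
As a small illustration of Eq. (1), the sketch below evaluates a single perceptron on four two-dimensional inputs. The step activation, the weights and the bias are hypothetical values chosen so that the separating line is $x_1 + x_2 = 1.5$; they are not taken from any of the reviewed works.

```python
import math


def perceptron(x, w, b, phi=math.tanh):
    """Eq. (1): y = phi(sum_i w_i * x_i + b)."""
    return phi(sum(wi * xi for wi, xi in zip(w, x)) + b)


def step(z):
    """Threshold activation: 1 on or above the separating line, else 0."""
    return 1.0 if z >= 0 else 0.0


# Hypothetical parameters: the decision boundary is the line x1 + x2 = 1.5.
w, b = [1.0, 1.0], -1.5
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron(x, w, b, phi=step) for x in inputs]
# Only (1, 1) lies above the line, so outputs is [0.0, 0.0, 0.0, 1.0].
```

With these parameters the perceptron reproduces the logical AND of its two inputs, a standard example of a linearly separable problem.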

An MLP is essentially a deep artificial neural network (ANN). Fig. 7 shows the structure of an ideal MLP, which is organized into three kinds of layers: an input layer that receives the signal, an output layer that makes a prediction about the input and, in between those two, an arbitrary number of hidden layers that form the core computational engine of the MLP. It manipulates the input space by continuously adjusting the weight matrices between layers until the error between the target value and the predicted value is minimized.

FIGURE 7.

Structure of an ideal MLP model with two hidden layers, taking the output error into consideration for enhancing the accuracy.

Ferreira and Da Silva [43] developed two nonparametric procedures for solving the problems of NN structure and input selection for STLF. Their proposed model can improve the outcome of MLPs and RBFs. Ding et al. [44] compared a naïve model with an NN model using MLP for STLF. The investigation shows that the accuracy of the MLP model is 4.7% better than that of the naïve model. Bokingkito et al. [45] used a Multilayer Perceptron Neural Network model for week-ahead load forecasting. The sigmoid activation function with resilient propagation proved the most efficient and gave the lowest network error in training. Interestingly, in the study of Kuo and Huang [46], the proposed deep neural network, named DeepEnergy, outperformed MLP, RBF and several other popular machine learning techniques. Owing to the effectiveness of MLP in MTLF, Askari and Keynia [47] combined two search algorithms, i.e. Particle Swarm Optimization (PSO) and an improved Ant-Lion Optimiser, to design an MLP-based model for the MTLF problem.

4) Self-Organizing Map (SOM)

The Self-Organizing Map (SOM), also known as the Kohonen network, was initially developed by Kohonen [48] in 1982. It is a computational method for the analysis and visualization of high-dimensional data: a type of ANN that is trained by unsupervised learning to generate a low-dimensional, discretized representation of the input space of the training samples, called a map. SOMs apply competitive learning, as opposed to error-correction learning (e.g. BP with gradient descent), and use a neighborhood function to preserve the topological properties of the input space. Consider first data items that are $n$-dimensional Euclidean vectors $\begin{equation*} \boldsymbol {x}(t)=[\xi _{1}(t), \xi _{2}(t), \ldots, \xi _{n}(t)]\tag{2}\end{equation*}$ where $t$ is the index of the data item in a given sequence.

Let the $i$-th model be represented by the following equation, $\begin{equation*} \boldsymbol {m}_{i}(t)=[\mu _{i1}(t), \mu _{i2}(t), \ldots, \mu _{in}(t)]\tag{3}\end{equation*}$ where $t$ denotes the index in the sequence in which the models are generated. This sequence is defined as a smoothing-type process in which the new value $\boldsymbol {m}_{i}(t+1)$ is computed iteratively from the old value $\boldsymbol {m}_{i}(t)$ and the new data item $\boldsymbol {x}(t)$ as: $\begin{equation*} \boldsymbol {m}_{i}(t+1)=\boldsymbol {m}_{i}(t)+\alpha (t)\,h_{ci}(t)\,[\boldsymbol {x}(t)-\boldsymbol {m}_{i}(t)]\tag{4}\end{equation*}$ where $\alpha (t)$ is a scalar factor that defines the size of the correction; its value decreases with the step index $t$. The index $i$ refers to the model under processing, and $c$ is the index of the model that has the smallest distance from $\boldsymbol {x}(t)$ in the Euclidean signal space. The factor $h_{ci}(t)$ is a kind of smoothing kernel, also called the neighborhood function. It is equal to 1 when $i=c$ and its value decreases as the distance between the models $\boldsymbol {m}_{i}$ and $\boldsymbol {m}_{c}$ on the grid increases. Moreover, the spatial width of the kernel on the grid decreases with the step index $t$. These functions of the step index, which determine the convergence, must be chosen very carefully.
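
One update of Eq. (4) can be sketched as follows for models arranged on a one-dimensional grid. The Gaussian form of the neighborhood kernel, the exponential decay of both the correction size and the kernel width, and the parameter names `alpha0`, `sigma0` and `tau` are assumptions made for illustration; the text above only requires that the kernel equals 1 at the winner and that both quantities decrease with the step index.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def som_step(models, x, t, alpha0=0.5, sigma0=2.0, tau=50.0):
    """Apply Eq. (4) once to every model on a 1-D grid (updates in place)."""
    alpha = alpha0 * math.exp(-t / tau)   # correction size, decreasing in t
    sigma = sigma0 * math.exp(-t / tau)   # kernel width, decreasing in t
    # c: index of the model closest to x(t) in the Euclidean signal space.
    c = min(range(len(models)), key=lambda i: sq_dist(models[i], x))
    for i, m in enumerate(models):
        # Assumed Gaussian neighborhood: 1 when i == c, smaller farther away.
        h = math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2))
        for k in range(len(m)):
            m[k] += alpha * h * (x[k] - m[k])
    return c


models = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
x = [1.2, 0.8]
winner = som_step(models, x, t=0)
# The middle model wins and moves halfway toward x, to about [1.1, 0.9];
# its grid neighbours move toward x by a smaller fraction.
```

Repeating this step over the data sequence with increasing $t$ shrinks both the step size and the neighborhood, so early iterations order the map globally and later ones fine-tune individual models.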

Fan et al. [49] used a traditional SOM to learn time-series load data with weather information as parameters and to improve the accuracy of the prediction. The dataset used in this study consists of the daily peak electrical consumption and weather data in New York City and Long Island from July 1, 2001 to September 31, 2004. An extension of the SOM algorithm based on the error-correction rule is also used. The peak load is generated by averaging the output of all the neurons. Their model achieved a minimum MAPE of 1.93% using $15\times 15$ neurons. López et al. [50] showed the effect of SOM neural networks in load forecasting and performed a deep and thorough analysis of real-world prediction. In this study, energy consumption data from Spain for 2001 to 2010 were used to validate the model. The proposed model forecasts the daily market load with a MAPE of 2.32%. Li et al. [51] proposed a localized Bayesian-Regularization NARX (Nonlinear Autoregressive model with Exogenous Input) Neural Network model combined with Self-Organizing Mapping, where SOM is used to extract the meteorological distribution and K-means is used to cluster the data according to the nearest mean. They also used the Bayesian-Regularization BP algorithm to train the target network structure, which improved the predictive ability of the localized NARX neural network. The proposed model was tested in Sydney using half-hourly power system load and electricity price data. The results showed high prediction accuracy, which can prove beneficial both economically and socially.

B. Rule Based Methods

1) Fuzzy C-Means (FCM)

The Fuzzy C-means (FCM) clustering algorithm was initially developed by Dunn [52] and later improved by Bezdek et al. [53]. FCM is a method of clustering which allows one piece of data to belong to two or more clusters. Clustering is a popular and widely used mathematical tool that attempts to discover structures or patterns in a dataset, such that the objects inside each cluster show a certain degree of similarity. Fuzzy clustering is among the most popular clustering approaches, and the underlying fuzzy set theory was first proposed by Zadeh [54]. He introduced the idea of uncertainty of belonging, described by a membership function, which leads to the central idea of fuzzy clustering: the non-unique partitioning of the data into a collection of clusters. The FCM algorithm is based on minimization of the following objective function: $\begin{equation*} J_{m}=\sum \nolimits _{i=1}^{N} \sum \nolimits _{j=1}^{C} u_{ij}^{m}\left \|{x_{i}-c_{j}}\right \|^{2}\tag{5}\end{equation*}$ where $m$ is any real number greater than 1, $u_{ij}$ is the degree of membership of $x_{i}$ in the cluster $j$, $x_{i}$ is the $i$-th of the $d$-dimensional measured data, and $c_{j}$ is the $d$-dimensional center of the cluster.
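
A minimal sketch of the FCM iteration is given below, assuming the standard distance-weighted form of the objective, $J_m=\sum_i\sum_j u_{ij}^m\|x_i-c_j\|^2$, and the usual alternating updates of memberships and centers that follow from it. The function names, the fixed iteration count and the one-dimensional toy data are illustrative choices, not part of the reviewed works.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def memberships(points, centers, m):
    """u[i][j]: degree of membership of points[i] in cluster j (rows sum to 1)."""
    power = 1.0 / (m - 1.0)
    u = []
    for x in points:
        # Squared distances, floored to avoid division by zero at a center.
        d2 = [max(sq_dist(x, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d2[j] / d2[k]) ** power for k in range(len(centers)))
                  for j in range(len(centers))])
    return u


def update_centers(points, u, m):
    """Each center is the mean of the points weighted by u_ij ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / total
                        for k in range(dim)])
    return centers


def objective(points, centers, u, m):
    """Assumed objective: sum_i sum_j u_ij^m * ||x_i - c_j||^2."""
    return sum(u[i][j] ** m * sq_dist(points[i], centers[j])
               for i in range(len(points)) for j in range(len(centers)))


def fcm(points, centers, m=2.0, iters=50):
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return memberships(points, centers, m), centers


# Toy data: two well-separated groups of one-dimensional "load levels".
points = [[0.0], [0.1], [0.2], [4.9], [5.0], [5.1]]
initial = [[1.0], [4.0]]
j_start = objective(points, initial, memberships(points, initial, 2.0), 2.0)
u, centers = fcm(points, initial)
j_end = objective(points, centers, u, 2.0)
```

Each point keeps a nonzero membership in both clusters, which is what distinguishes this scheme from hard clustering such as K-means.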

The Radial Basis Function (RBF) network is superior to the BP network because its ability to approximate nonlinear functions and its convergence speed are better than those of the BP network model [55]. Zhu and He [56] overcame the slow convergence speed and local minima problem of the BP network algorithm by applying an FCM-based RBF model to the short-term load forecasting problem. The proposed algorithm was tested on actual power load data, and the results showed a good average percentage error of 4.04%. Yi et al. [57] proposed an ultra-short-term forecasting method based on the FCM algorithm to increase the flexible process capability of power systems. The FCM algorithm was used to calculate the daily load change rate. In the practical application of a power grid company, this method gave more accurate and predictable results, with an error analysis index considerably better than that of the traditional prediction algorithm. Clustering is equally used for LTLF [58] and residential load forecasting [59] due to its improved fluctuation management.

2) Fuzzy Rule Base System (FRBS)

The fuzzy rule-base system (FRBS) is one of the major applications of fuzzy sets and fuzzy logic. Fuzzy logic systems address the imprecision of the input and output variables by defining fuzzy numbers and fuzzy sets that can be expressed in semantic variables. The fuzzy rule-based approach relies on verbally formulated rules that overlap throughout the parameter space and uses numerical interpolation to handle complex non-linear relationships. Fuzzy rules are linguistic IF-THEN constructions that have the general form “IF $A$ THEN $B$”, where $A$ and $B$ are (collections of) propositions containing linguistic variables.

$A$ is called the premise and $B$ is the consequence of the rule. In effect, the use of linguistic variables and fuzzy IF-THEN rules exploits the tolerance for imprecision and uncertainty. In this respect, fuzzy logic mimics the crucial ability of the human mind to summarize data and focus on decision-relevant information. As shown in Fig. 8, the raw input, called the crisp input, is passed to a “Fuzzifier”, which fuzzifies the crisp values to generate the fuzzy input. The fuzzy rule base is supplied by the user to the fuzzy inference engine or software according to the requirements of the forecasting system. From the fuzzy input, the inference engine follows these rules to generate the fuzzy output, which is then converted to a crisp output by a “Defuzzifier” for real-world application.
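
The fuzzification, rule evaluation and defuzzification stages can be sketched with a toy single-input system. Everything specific here is hypothetical and chosen only for illustration: the triangular membership functions, the temperature breakpoints, the three rules and their crisp load levels (in arbitrary units), and the weighted-average defuzzifier.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def forecast_load(temp_c):
    # Fuzzifier: crisp temperature -> degrees of membership in linguistic terms.
    cold = tri(temp_c, -10.0, 0.0, 15.0)
    mild = tri(temp_c, 5.0, 18.0, 30.0)
    hot = tri(temp_c, 22.0, 35.0, 50.0)
    # Rule base (hypothetical): IF temperature is cold THEN load is 900,
    # IF mild THEN load is 500, IF hot THEN load is 850.
    rules = [(cold, 900.0), (mild, 500.0), (hot, 850.0)]
    # Defuzzifier: weighted average of the rule outputs gives a crisp value.
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return None  # no rule fires outside the covered temperature range
    return sum(strength * load for strength, load in rules) / total


mild_day = forecast_load(18.0)   # only the "mild" rule fires, giving 500.0
warm_day = forecast_load(26.0)   # "mild" and "hot" overlap, result in between
```

Because neighbouring membership functions overlap, the crisp output changes smoothly between rule consequents, which is the interpolation behaviour described for fuzzy rule-based approaches.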

FIGURE 8.

Algorithmic structure of fuzzy rule base system, showing the conversion of crisp to fuzzy values through fuzzification and defuzzification process.

FRBS combines two ideas: knowledge and reasoning. Approximate reasoning builds on multi-valued logic systems, which were introduced by Lukasiewicz in the 1930s and developed into fuzzy set theory by Zadeh in the 1960s [54]. Ranaweera et al. [60] carried out a comprehensive exploratory investigation of the application of a fuzzy logic model to the STLF problem. The proposed model produced good forecasts, with a MAPE below 2.3%. One of the major problems faced when designing a fuzzy model is finding an optimized fuzzy rule base. Generally this is done by trial and error, which is time-consuming and arduous. Kang et al. [61], however, presented an approach to the evolutionary design of an optimum fuzzy rule base for modeling and control, in which evolutionary programming is used to simultaneously evolve the structure and the parameters of the fuzzy rules. The capability of the method to handle quantitative and qualitative information and uncertainties simultaneously has been discussed by Khosravi et al. [62], who proposed the application of interval type-2 fuzzy logic systems (IT2FLS) to STLF. Later, they used Takagi-Sugeno-Kang fuzzy inference systems (TSK-FIS) to develop an IT2-TSK-FLS hybrid algorithm for load forecasting. The experiment conducted on actual load data shows closely approximated future load demands with acceptable accuracy. Hassan et al. [63] presented an IT2FLS method for load forecasting in which an extreme learning machine (ELM) is used to tune the parameters of the IT2FLS for chaotic and nonlinear data. This method used multiple inputs, with proper input selection along with data pre-processing. A partial autocorrelation analysis was used to select the influential inputs: the time delays of the dataset that have significant coefficients are selected as inputs to the model. Ali et al. [64] proposed a fuzzy logic model for LTLF based on weather parameters (temperature and humidity) and historical load data. This model comprises the steps of data collection, constructing the fuzzy interface, fuzzifying input and output, assigning membership functions, setting up the fuzzy rule base, building fuzzy logic models and simulations, and analyzing errors. The fuzzy logic model forecast a year-ahead load with a MAPE of 6.9% and an efficiency of 93.1%. Another TSK-based method used a fuzzy model across the entire domain, improving MAPE by 0.13% compared to seasonal autoregressive integrated moving average (SARIMA) and Gustafson-Kessel clustering [65].

3) Fuzzy Regression (FR)

Regression-based methods such as autoregressive (AR) models, autoregressive moving average (ARMA) models, and autoregressive integrated moving average (ARIMA) models are commonly applied to short-term load forecasting [66]. Fuzzy regression was introduced to overcome some of the limitations of linear regression, such as a vague relationship between the dependent variable and the independent variables, insufficient numbers of observations, and hard-to-verify error distributions. The fundamental difference between the assumptions of the two techniques relates to the deviations between the observed and estimated values: linear regression assumes that these deviations are errors in measurement or observation, while fuzzy regression assumes that they are due to the indefiniteness of the system structure [18]. Many interesting models based on fuzzy regression for electric load forecasting are already described in the available literature. Song et al. [67] developed a forecasting model for short-term prediction using Tw-based fuzzy arithmetic operations. In [68], the forecasting performance of several fuzzy linear regression (FLR) models was compared on electric load data. FLR also performed very well for oil consumption forecasting [69].

A more complex model for time series forecasting with several pre-processing and post-processing procedures, including fuzzy regression, was proposed in [70]. For holiday load forecasting, a fuzzy polynomial regression method has been presented with data selection based on the Mahalanobis distance, incorporating a dominant weather feature [71]. Selecting past weekday data relevant to a given holiday is critical for improving the accuracy of holiday load forecasting. A data selection process incorporating a dominant weather feature is therefore also proposed in order to improve the accuracy of the fuzzy polynomial regression method. The dominant weather feature for the selection of historical data is identified by evaluating the mutual information between various weather features and loads from season to season [71].

An application-oriented study proposes a fuzzy interaction regression approach to STLF. Through comparisons with three models without interaction effects (two fuzzy regression models and one multiple linear regression model), the proposed approach shows superior performance over its counterparts [72].

A possibilistic linear function can be defined as: $\begin{equation*} Y=A_{1}x_{1}+A_{2}x_{2}+\ldots +A_{n}x_{n}=Ax\tag{6}\end{equation*}$ where $x_{i}$ is non-fuzzy and $A_{i}$ is a symmetric fuzzy number denoted by $(\alpha _{i}, c_{i})_{L}$, with $\alpha _{i}$ as the center and $c_{i}$ as the spread. The reference function is assumed to be $L(x)=\max (0, 1-|x|)$. The fuzzy parameter $A_{i}$ is then a symmetrical triangular fuzzy number $\begin{equation*} \mu _{A_{i}}(a_{i})=L((a_{i}-\alpha _{i})/c_{i})\tag{7}\end{equation*}$ where $c_{i}>0$. The possibilistic linear function $Y=Ax$ is obtained by the following membership function: $\begin{align*} \mu _{Y}(y)=\begin{cases} L\left ({(y-x^{T}\alpha )/c^{T}|x|}\right),& x\ne 0 \\ 1,& x=0,\;y=0 \\ 0,& x=0,\;y\ne 0 \\ \end{cases}\tag{8}\end{align*}$ where $|x|=(|x_{1}|, |x_{2}|, \ldots, |x_{n}|)^{T}$.
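
To make the membership function of the fuzzy output concrete, the sketch below evaluates it for the triangular reference function $L(x)=\max(0, 1-|x|)$. It reads Eq. (8) as $L\big((y-x^{T}\alpha)/c^{T}|x|\big)$ for $x\ne 0$, with membership 1 at $y=0$ and 0 elsewhere when $x=0$; the input vector, centers and spreads are hypothetical numbers.

```python
def reference_l(z):
    """Reference function L(z) = max(0, 1 - |z|) (symmetric triangle)."""
    return max(0.0, 1.0 - abs(z))


def mu_y(y, x, alpha, c):
    """Membership of y in the fuzzy output Y = A x, with A_i = (alpha_i, c_i)_L."""
    if all(xi == 0 for xi in x):
        return 1.0 if y == 0 else 0.0
    center = sum(ai * xi for ai, xi in zip(alpha, x))       # x^T alpha
    spread = sum(ci * abs(xi) for ci, xi in zip(c, x))      # c^T |x|, > 0 as c_i > 0
    return reference_l((y - center) / spread)


# Hypothetical example: two inputs, centers alpha and spreads c.
x = [1.0, 2.0]
alpha = [10.0, 3.0]
c = [1.0, 0.5]
# The output is a triangle centered at 10*1 + 3*2 = 16 with spread 1*1 + 0.5*2 = 2.
at_center = mu_y(16.0, x, alpha, c)     # 1.0
half_way = mu_y(17.0, x, alpha, c)      # 0.5
at_edge = mu_y(18.0, x, alpha, c)       # 0.0
```

The spread of the output grows with $|x|$, so predictions for larger inputs carry wider fuzzy intervals around the same linear center.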

Identification of the parameters of the fuzzy linear regression model can be formulated as a linear programming problem: $\begin{align*}&\min _{\alpha,c}{J\left ({c }\right)}=c^{T}\left |{ x }\right | \\&\mathrm {s.t.}~y_{i}\le \left |{ L^{-1}\left ({h }\right) }\right |c^{T}\left |{ x_{i} }\right |+x_{i}^{T}\alpha, \\&\hphantom {\mathrm {s.t.}~}-y_{i}\le \left |{ L^{-1}\left ({h }\right) }\right |c^{T}\left |{ x_{i} }\right |-x_{i}^{T}\alpha, \\&\hphantom {\mathrm {s.t.}~}c\ge 0, \\&\hphantom {\mathrm {s.t.}~}i=1,2,\ldots,N,\tag{9}\end{align*}$ where $h$ is the threshold that controls the width of the spread, and $0\le h < 1$.

Vantuch and Prílepok [73] proposed an algorithm called the ensemble of fuzzy linear regression (EFLR), which is based on fuzzy linear regression combined with a boosting mechanism. The fuzzy linear regression is optimized using multi-objective optimization. Original electric load pattern data are used to develop and evaluate a load forecasting model as an experimental application of EFLR. The comparison of EFLR with basic fuzzy linear regression revealed an improvement of more than 2% in all measures, which supports the need for an ensemble-based approach in fuzzy linear regression. Luy et al. [66] addressed the problem caused by a growing knowledge base and improved the load forecasting performance of fuzzy models through nature-inspired methods. The proposed models were optimized using ant colony optimization and genetic algorithm (GA) techniques. The training and testing of the proposed systems were performed on historical hourly load consumption and temperature data collected between 2011 and 2014. The results show that the proposed models can sufficiently improve the performance of hourly short-term load forecasting. The lowest monthly mean absolute percentage error (MAPE) of the forecasting model is 3.9% (February 2014). The results also show that the proposed methods make it possible to work with large-scale rule bases in a more flexible estimation environment. Although fuzzy regression has been tried for STLF for about a decade, most research work is still focused at the theoretical level, leaving little value for practical applications. A primary reason is that inadequate attention has been paid to the improvement of the underlying linear model [72].

C. Metaheuristic Methods

1) Artificial Bee Colony (ABC)

Artificial Bee Colony (ABC) is a swarm-based metaheuristic algorithm first developed by Karaboga [74] in 2005, based on the model proposed by Tereshko and Loengarov [75] for the foraging behavior of honey bee colonies. The minimal model of forage selection that leads to the emergence of collective intelligence of honey bee swarms consists of three essential components: food sources, employed foragers, and unemployed foragers. ABC is an optimization tool derived from the intelligent behavior of honey bees, in which candidate solutions, called food positions, are modified by the artificial bees over time; the aim of the bees is to discover food sources with a high nectar amount and, finally, the one with the highest nectar. A honey bee colony combines employed bees, onlookers and scouts, where the number of employed bees is the same as the number of food sources. The employed bees first calculate the nectar amount and pass this information to the onlooker bees. The onlooker bees find the best food source, and the scout bees then go to collect the honey [74]. The ABC algorithm was developed from this basic working principle of honey bees, with the first half of the swarm treated as employed bees and the remaining half as onlooker bees. Employed bees perform the exploitation process in the search space, while the scouts control the exploration process [76]. The algorithm has a strong ability to escape local minima. Fig. 9 shows the flow of the ABC algorithm. After the parameters are initialized, the employed, onlooker and scout bees are updated to remember the best food source. The search iterates until the termination criteria are met, and the best solution is then chosen. One major limitation of the algorithm is that its convergence to a local minimum is slow [77]. Safamehr and Rahimi-Kian [78] developed a cost-efficient method for solving a two-objective optimization problem of a microgrid, using ABC in an intelligent demand-response program. Çevik et al. [79] proposed swarm intelligence algorithms, viz. ABC and PSO, for load forecasting on smart grids. They used four years of data (2009-2012, Turkey) for training and validation, with three years of data used to build the model and one year used for validation. The load profile was divided into four seasonal parts: winter, spring, summer and autumn.
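
The employed, onlooker and scout phases described above can be sketched as a small minimizer. This is a common textbook form of ABC rather than the exact procedure of [74]: here the scouts replace abandoned sources with random new ones, and the neighbour move, the `limit` abandonment threshold, the fitness weighting `1 / (1 + cost)` (which assumes non-negative costs) and the sphere test function are assumptions for illustration.

```python
import random


def abc_minimize(f, dim, lo, hi, n_sources=10, limit=20, iters=200, seed=0):
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources            # failed improvement attempts per source
    best_i = min(range(n_sources), key=costs.__getitem__)
    best_x, best_cost = sources[best_i][:], costs[best_i]

    def try_neighbour(i):
        # Perturb one coordinate relative to a randomly chosen other source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        cand[d] = min(hi, max(lo, cand[d]))
        cost = f(cand)
        if cost < costs[i]:             # greedy selection keeps the better one
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(n_sources):      # employed bees: one per food source
            try_neighbour(i)
        weights = [1.0 / (1.0 + c) for c in costs]   # assumes costs >= 0
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbour(i)            # onlookers favour richer sources
        i = min(range(n_sources), key=costs.__getitem__)
        if costs[i] < best_cost:        # remember the best source so far
            best_x, best_cost = sources[i][:], costs[i]
        for i in range(n_sources):      # scouts: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i], trials[i] = f(sources[i]), 0
    return best_x, best_cost


def sphere(x):
    """Toy cost function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = abc_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
```

In a forecasting setting the cost function would typically be a validation error such as MAPE, with each food source encoding a set of model parameters.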

FIGURE 9.

Algorithmic flowchart of artificial bee colony. The employed, onlooker and scout bees are updated to save the best food source until the termination criteria are satisfied.

2) Artificial Immune System (AIS)

The Artificial Immune System (AIS) was first introduced in the mid-1980s in articles authored by Farmer et al. [80] and later developed by Bersini and Varela [81] on immune networks. According to Rowe [82], the immune system (IS) can function as a “second brain”, as it can generate responses to new and novel networks. An Artificial Immune System is a mathematical and computer model of immune systems, or the abstraction of immunology-related principles into algorithms to address systematic goals. The IS has an adaptive response that enables it to learn the protein structures that characterize the pathogens it encounters, and to remember those structures [83]. This helps the IS respond to the same pathogens quickly and efficiently in the future [84]. The development of AIS can be seen as having two target domains: the provision of solutions to engineering problems through the adoption of immune-system-inspired concepts, and the provision of models and simulations with which to study immune system theories [85]. In the mid-1990s, AIS became a field in its own right, with initial articles published by Forrest et al. [86] and Kephart [87]. Dasgupta [88] also presented an extensive overview of Artificial Immune Systems and their applications. As shown in Fig. 10, the algorithm initializes an antibody population and then performs clonal proliferation. The affinity of each mutated clone is evaluated. The tournament for further proliferation is selected based on the aging operation. The iteration proceeds until the stopping rule is met. Yong et al. [89] proposed a load forecasting method using AIS that combines Immune Network Regulation and Immune Programming.

FIGURE 10.

Algorithmic flowchart of artificial immune system. After the antibody population is initialized and its affinity evaluated, the clonal operation is repeated until the stopping criterion is met.


The AIS algorithm is used to sustain the diversity of the population and to overcome premature convergence, which improves both the search speed and the precision of the optimization. The model achieved a MAPE of 2.038%. Load records from a 10 kV distribution network in Beijing were used as the example, and the proposed method was validated on this historical load data.

Dudek [90] proposed an electricity load forecasting method to reduce the gap between demand and generation, in which an AIS is trained to recognize antigens. It provides one-day-ahead power system load forecasts. The learning process of the antibodies produces overlapping clusters of similar antigens, where the problem to be solved is treated as the antigen and its solution as the antibody.
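The loop summarized in Fig. 10 (initialize antibodies, clone, mutate, evaluate affinity, apply aging, select) can be sketched on a toy one-dimensional problem. This is a generic, simplified clonal-selection sketch and not the forecasting method of [89] or [90]; the function name, the parameter values and the fixed Gaussian mutation width are all illustrative assumptions.

```python
import random


def clonal_selection(objective, bounds, pop_size=20, clones_per_antibody=5,
                     max_age=5, generations=60, seed=0):
    """Minimise `objective` on [lo, hi] with a simplified clonal-selection loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each antibody is a (position, age) pair; low objective means high affinity.
    population = [(rng.uniform(lo, hi), 0) for _ in range(pop_size)]
    best = min((x for x, _ in population), key=objective)
    for _ in range(generations):
        candidates = []
        for x, age in population:
            candidates.append((x, age + 1))            # parent grows older
            for _ in range(clones_per_antibody):       # clonal proliferation
                step = rng.gauss(0.0, 0.1 * (hi - lo))  # assumed mutation width
                clone = min(hi, max(lo, x + step))      # mutate and clip to bounds
                candidates.append((clone, 0))
        # Aging operator: discard antibodies older than max_age.
        candidates = [c for c in candidates if c[1] <= max_age]
        # Selection: keep the pop_size candidates with the highest affinity.
        candidates.sort(key=lambda c: objective(c[0]))
        population = candidates[:pop_size]
        if objective(population[0][0]) < objective(best):
            best = population[0][0]
    return best


if __name__ == "__main__":
    print(clonal_selection(lambda x: (x - 3.0) ** 2, bounds=(-10.0, 10.0)))
```

In a forecasting setting the objective would typically be a validation error of the forecaster rather than this toy quadratic.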

3) Genetic Algorithm (GA)

Inspired by Charles Darwin’s theory of natural evolution, Genetic Algorithms (GAs) are adaptive heuristic search algorithms used for solving both constrained and unconstrained optimization problems. GAs are frequently used to produce high-quality solutions to optimization and search problems based on bio-inspired operators such as mutation, crossover and selection. The method was first introduced by Holland [91] of the University of Michigan. At each step, a GA uses three main types of rules to create the next generation from the current population:

Selection rules: select the individuals known as parents which contribute to the population of the next generation.
Crossover rules: combine two parents to form children for the next generation.
Mutation rules: apply random changes to individual parents to form children.

Fig. 11 presents the order of the algorithm. After initialization, the population proceeds to evaluation, where specific individuals are selected for crossover to generate a new population that combines characteristics of the previous one. After crossover and mutation, the new population is fed back for evaluation, and the loop repeats until the evaluation meets the desired result.
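The selection, crossover and mutation rules, together with the evaluation loop of Fig. 11, can be illustrated with a minimal real-coded GA. The sketch below assumes tournament selection, one-point crossover, random-reset mutation and elitism; these particular operator choices, the function name and the parameter values are assumptions made for illustration, not the configuration of any cited work.

```python
import random


def genetic_algorithm(fitness, n_genes, pop_size=30, generations=80,
                      crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Maximise `fitness` over real-valued chromosomes in [0, 1]^n_genes."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def tournament():
        # Selection rule: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)   # crossover rule (one point)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            for g in range(n_genes):              # mutation rule (random reset)
                if rng.random() < mutation_rate:
                    child[g] = rng.random()
            children.append(child)
        population = children                     # fed back for evaluation
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    target = lambda c: -sum((g - 0.5) ** 2 for g in c)
    print(genetic_algorithm(target, n_genes=3))
```

Because of elitism, the best fitness never decreases from one generation to the next, which makes the stopping test on the evaluation step straightforward.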

FIGURE 11.

Algorithmic flowchart of genetic algorithm, showing the order of operation to generate a new population by selection, crossover and mutation to be fed back for evaluation.


Ghareeb and El Saadany [92] presented a new variant of genetic programming, namely Multi-Gene Genetic Programming (MGGP), for STLF. In MGGP, each individual of the population is a weighted linear combination of sparse trees, and each tree in this combination can be considered a gene. This model outperformed other models such as standard GP and the RBF network, with the lowest average MAPE value of 1.5716%.

Table 3 shows the comparative study between the single load forecasting techniques.

TABLE 3 Comparative Study Among Single Load Forecasting Method

SECTION VI.

Hybrid Models With Two Methods

Single methods for load forecasting often come with several disadvantages, including low computing efficiency, high computational complexity, and a high error percentage. Over the years, researchers have been building hybrid load forecasting methods and models to achieve better accuracy with a minimum error rate. Hybrid models are generally a combination of two or more single methods, where each method helps to make the forecast more accurate and efficient. Each single method in a hybrid model is chosen for the particular advantages it contributes to the forecast. SVM and ANN are the two methods most often hybridized with other single methods to obtain a load forecasting model with minimal error percentage. SVM works well with unstructured and semi-structured data. The kernel trick is the real strength of the SVM model: with an appropriate kernel function, SVM can solve complex non-linear problems. Because its optimization problem is convex, SVM produces a unique solution. This is a fundamental difference between SVM and neural networks, which can produce multiple solutions associated with local minima and are therefore less reliable across different samples. On the other hand, an ANN can learn and generalize from training data, so there is no need for extensive hand-written programming. Like the “graceful degradation” found in biological systems, ANNs are particularly fault tolerant. Overall, their adaptive learning capability based on the training data has made ANNs one of the most popular methods for load forecasting. This section reviews and compares the hybrid models built on SVM and ANN. Hybrid models with more than two single methods are discussed and compared in Section VII.

A. Support Vector Machine (SVM)

Support Vector Machine is one of the popular machine learning algorithms and was initially developed by Cortes and Vapnik [93] in 1995. Support Vector Regression (SVR) is the regression form of SVM, in which a function is estimated from observed data; it is used for many machine learning tasks, including time series prediction and regression analysis. Consider a training data set of $l$ samples: $\begin{equation*} \left \{{\left ({X_{1},Y_{1} }\right),\ldots,\left ({X_{i},Y_{i} }\right),\ldots,\left ({X_{l},Y_{l} }\right) }\right \}\subset R^{n}\times R\tag{10}\end{equation*}$

In SVM for linear regression (SVR), the following function is to be estimated: $\begin{equation*} f\left ({x }\right)=\langle w,x\rangle +b,\quad w,x\in R^{n},~b\in R\tag{11}\end{equation*}$

Here the regularized risk function $R$ is minimized: $\begin{equation*} R\left ({w,b }\right)=C\frac {1}{l}\sum \nolimits _{i=1}^{l} L\left ({Y_{i},f\left ({X_{i} }\right) }\right)+\frac {1}{2}\left \|{ w }\right \|^{2}\tag{12}\end{equation*}$ where the former term is the empirical error measured by the loss function $L$, and $C$ is the regularization constant, which sets the trade-off between the empirical error and the flatness of $f$. The latter term is the regularizer $\frac {1}{2}\left \|{ w }\right \|^{2}$, which makes the function as flat as possible in order to control its capacity.
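As a rough illustration of Eq. (12), the sketch below evaluates the regularized risk of a linear model. The text does not specify the loss $L$; the $\varepsilon$-insensitive loss commonly used in SVR is assumed here, and the function names and default values are illustrative only.

```python
def eps_insensitive_loss(y, y_hat, eps=0.1):
    """Assumed loss L: deviations smaller than eps cost nothing."""
    return max(0.0, abs(y - y_hat) - eps)


def linear_model(w, b, x):
    """f(x) = <w, x> + b, as in Eq. (11)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def regularized_risk(w, b, X, Y, C=1.0, eps=0.1):
    """R(w, b) = C * (1/l) * sum_i L(Y_i, f(X_i)) + 0.5 * ||w||^2."""
    l = len(X)
    empirical = sum(eps_insensitive_loss(y, linear_model(w, b, x), eps)
                    for x, y in zip(X, Y)) / l
    flatness = 0.5 * sum(wi * wi for wi in w)
    return C * empirical + flatness


if __name__ == "__main__":
    X, Y = [[1.0], [2.0]], [2.0, 4.0]
    print(regularized_risk([2.0], 0.0, X, Y))  # perfect fit: only the flatness term
    print(regularized_risk([0.0], 0.0, X, Y))  # flat model: only the empirical term
```

A larger $C$ weights the empirical error more heavily relative to the flatness term, which is why $C$ is one of the parameters that the hybrid methods below tune.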

SVM has been a popular choice among machine learning practitioners since the 1990s. It is defined by a convex optimization problem (no local minima) for which efficient solvers exist (e.g. Sequential Minimal Optimization). Because the optimization problem is convex, SVM delivers a unique solution. This is an advantage over NNs, which have multiple solutions associated with local minima. A common drawback of SVM is that it is computationally expensive and slow to run. Since it reaches the global minimum of its training problem, SVM is a promising learning tool for load forecasting. Its performance depends on the user-selected kernel function and on the SVM parameters, both of which can be optimized. Some of the most popular hybrid methods integrating SVM with other single methods are:

SVM and Broyden-Fletcher-Goldfarb-Shanno Firefly Algorithm
SVM and Harmony Search Algorithm
SVM and Fruit Fly Optimization Algorithm
SVM and Genetic Algorithm
SVM and Particle Swarm Optimization
SVM and Artificial Bee Colony
SVM and Simulated Annealing Algorithm

Fig. 12 presents the commonly used hybrid two-method models built on SVM. These methods are presented in the following subsections along with their uses in load forecasting, and their performance is then compared on the basis of accuracy.

FIGURE 12.

Pictorial representation of notable hybrid models with two methods based on support vector machine that are explained in the subsection.


1) SVM and Broyden–Fletcher–Goldfarb–Shanno Firefly Algorithm (SVM-BFGSFA)

Firefly Algorithm (FA) is a recent swarm intelligence method developed by Yang [94] in 2008. It is a stochastic, nature-inspired, meta-heuristic algorithm that can be applied to hard optimization problems, including load forecasting. Its ability to search for global and local optima concurrently has made it a popular algorithm for forecasting. The algorithm is based on the flashing light of fireflies, which is produced by the biochemical process of bioluminescence and helps to characterize them. Kavousi-Fard et al. [95] combined a Modified Firefly Algorithm (MFA) with the SVR model to enhance both the search ability and the convergence of FA. As a case study, this work used half-hourly electrical power consumption data of New South Wales, Victoria and Queensland in Australia to validate the predictability of the proposed combined model. To overcome the weaknesses of FA, namely slow convergence speed and imprecise convergence, Xiao et al. [96] proposed a combined model that integrates multiple seasonal patterns, several neural networks, non-positive constraint theory and the Broyden–Fletcher–Goldfarb–Shanno Firefly Algorithm (BFGSFA), which showed strong results compared to the single models. To retain the advantages of FA, such as its powerful global exploration and exploitation abilities, while overcoming its slow and imprecise convergence in the later period of optimization, BFGS is applied when FA updates solutions in an iteration to find a local optimal solution, which enhances the local optimization ability and the local convergence speed of the whole algorithm. This combined model can be useful for electric load forecasting and power scheduling, and can help to avoid power grid collapse and to reduce the spinning reserve capacity of thermal power plants.
The average MAPE values of the combined model were 0.7138%, 1.0281%, 4.8394%, 0.9239%, 9.6316% and 7.3367% lower than those of the Back-Propagation Neural Network, GA-based ANN, Wavelet-based NN, GRNN, ARIMA and Random Walk models, respectively.
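For orientation, the basic firefly update (a dimmer firefly moves toward a brighter one with an attractiveness that decays with distance, plus a small random step) can be sketched as follows. This is the plain FA only; it does not include the BFGS local search of [96] or the modifications of [95], and the parameter names `beta0`, `gamma` and `alpha` and their values are assumptions chosen for a toy problem.

```python
import math
import random


def firefly_minimize(objective, dim, bounds, n_fireflies=15, iterations=100,
                     beta0=1.0, gamma=0.1, alpha=0.2, seed=0):
    """Minimise `objective` with a basic firefly algorithm (lower cost = brighter)."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
        alpha *= 0.97  # gradually shrink the random step
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best]


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(firefly_minimize(sphere, dim=2, bounds=(-2.0, 2.0)))
```

In this plain form the currently brightest firefly never moves, so late-stage refinement relies on the shrinking random steps of the others; this is the stage at which a gradient-based local search such as BFGS is introduced in the combined model described above.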

2) SVM and Harmony Search Algorithm (SVM-HS)

Harmony Search (HS) is a meta-heuristic optimization algorithm inspired by a musician’s effort to find a better harmony [97]. It is called during the training phase to select the optimum parameters. To improve the process of determining the parameters of the STLF model, Zeng et al. [98] applied harmony search to LS-SVM. Their results showed an improvement in solution quality and a higher training speed. The MAPE value of the proposed method is 0.77% lower than that of PSO, while the training time and speed are 36.7% and 2.59% higher, respectively. Their study concludes that analyzing the load-influencing factors with GRNN can improve the accuracy of the forecasting model (HS-LS-SVM). Because both economic and non-economic factors affect the STLF, the key factors were first selected by GRNN. The proposed method achieves higher precision and faster speed than BPNN, LS-SVM and PSO, and its correctness and efficacy were also verified.

3) SVM and Fruit Fly Optimization Algorithm (SVM-FOA)

Pan [99] first proposed the Fruit Fly Optimization Algorithm (FOA), based on the food-finding behavior of the fruit fly. Its short program code has made it popular compared to other optimization algorithms. FOA has been applied to various forecasting applications, including load forecasting, as it can reach the global optimal solution quickly. Li et al. [100] proposed a hybrid SVM-FOA model. Compared to other heuristic optimization algorithms such as GA and Simulated Annealing (SA), SVM-FOA needed the least search time to locate the global optimum and achieved a decent forecasting error (~3%) for annual load. In another work, Cao and Wu [101] proposed a novel hybrid approach to forecast monthly electricity consumption, combining SVM with FOA. The study used the monthly electricity consumption of China from January 2010 to December 2015. This 72-month dataset was divided into a training set (the first 60 months) and a testing set (the last 12 months). The results show that FOA outperformed the PSO algorithm in some cases.

4) SVM and Genetic Algorithm (SVM-GA)

GAs are well suited to the concurrent manipulation of models with varying resolutions and structures, since they can search non-linear solution spaces without requiring gradient information or a priori knowledge of model characteristics [102]. It provides an optimization approach for finding a computer program of unspecified size and shape that solves, or approximately solves, a problem. GA can search through a space to find a nearly optimal solution. A fitness function is evaluated to score each candidate solution to the given problem [103]. One advantage of GA is that it searches in parallel from a population of points; therefore, it can avoid being trapped in a local optimum, unlike traditional methods such as regression. It uses probabilistic selection rules rather than deterministic ones. GA works on the chromosome, which is an encoded version of the potential solution’s parameters, rather than on the parameters themselves. It uses a fitness score obtained from the objective function, without derivative or other auxiliary information. Lee et al. [104] showed that a Genetic Programming (GP) model outperforms regression methods on the LTLF problem.

GA has been hybridized with SVM to optimize the SVR parameter values in many studies. Hsu et al. [105] first proposed a GA-SVR model to address the problem of selecting SVR parameters. SVR has also been integrated with an improved adaptive genetic algorithm (IAGA) [106] to optimize the ratio values of meteorological factors and electricity cost, outperforming the state of the art. A separate hybrid genetic-based SVR model (HGASVR) was proposed by Wu et al. [102]. The dataset used in this study was downloaded from the EUNITE network. It consists of half-hourly electricity values and average temperature values for 1997 and 1998. The holidays of these two years are also included in the dataset. To achieve better forecasting accuracy, the proposed model automatically optimizes the SVR parameters by integrating the real-valued genetic algorithm and the integer genetic algorithm. The study suggests that the Poly kernel function could be an appropriate choice of SVR kernel function for forecasting daily electricity load, and that it might outperform the RBF kernel function in a non-linear electricity load forecasting problem. The optimal MAPE, RMSE and maximum error obtained by HGASVR are 0.76, 7.73 and 20.88, respectively.

5) SVM and Particle Swarm Optimization (SVM-PSO)

PSO was initially developed by Kennedy and Eberhart in 1995 [107]. It is similar to GA in the sense that the system is initialized with a population of random solutions [108]. A distinctive feature is that it stores the good solutions found by all the particles. For non-linear problems, only a few parameters need to be adjusted. Unlike GAs and SA, it has a memory storage function. Hong [109] proposed a hybrid SVM-PSO algorithm to locate the optimal parameters of an SVR model for load forecasting. It showed better results than SVM-GA and SVM-SA. One of the biggest drawbacks of PSO is that, unlike GA and SA, it cannot escape local minima efficiently. Jiang et al. [110] designed an SVR-based forecaster with a two-step hybrid parameter optimization scheme, using the Grid Traverse Algorithm for global-to-local parameterization and PSO for determining the best local parameters. With a MAPE value of 2.53%, their method performed better than ARIMA (11.21%), SVM-GA (5.27%) and ANN (6.62%).
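The memory mechanism mentioned above (each particle remembers its own best position, and the swarm remembers the best position found by any particle) is visible in a minimal global-best PSO. The sketch below is generic and is not the scheme of [109] or [110]; the inertia and acceleration values are commonly used defaults assumed here, and the toy objective stands in for a validation error over SVR parameters.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, iterations=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal-best memory
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # swarm-best memory
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0)))
```

Since every particle is pulled toward the same swarm-best position, the swarm can collapse onto a local minimum, which is consistent with the drawback noted in the text.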

6) SVM and Artificial Bee Colony (SVM-ABC)

ABC was applied to search for the global minimum of three well-known test functions (Sphere function, Rosenbrock valley and Rastrigin function) [74]. It strikes a good balance between global and local search by conducting both in each iteration, instead of initiating the global search at the beginning and the local search at the end stage as PSO does [111]. For better parameter determination, Duat et al. [112] proposed a modified ABC algorithm that improves the accuracy of an electric load forecasting model based on the Least Square Support Vector Machine (LS-SVM).

7) SVM and Simulated Annealing Algorithm (SVM-SA)

Simulated annealing (SA) is an optimization technique analogous to the annealing process in materials physics. SVMs have been employed to solve non-linear regression and time series problems. Pai and Hong [113] applied the SA algorithm to choose the parameters of an SVM model. The SVM-SA model performs structural risk minimization rather than minimizing the training errors. This method outperforms the ARIMA and GRNN models in generalization performance. Wang et al. [114] proposed a new optimization model, the Simulated Annealing Particle Swarm Optimization Algorithm, which combines the advantages of the PSO and SA algorithms. Numerical experiments showed that the model enhances accuracy, improves convergence and reduces operation time.
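The annealing analogy can be made concrete with a short sketch: candidate moves that lower the cost are always accepted, while worse moves are accepted with a probability that shrinks as the temperature falls. The geometric cooling schedule, the step size and the toy objective (standing in for the validation error of an SVM as a function of its parameters) are assumptions for illustration and are not taken from [113] or [114].

```python
import math
import random


def simulated_annealing(objective, x0, step=0.5, t_start=1.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=20, seed=0):
    """Minimise `objective` starting from x0 with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_cost = list(x0), objective(x0)
    best, best_cost = current[:], current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = [v + rng.uniform(-step, step) for v in current]
            cand_cost = objective(candidate)
            delta = cand_cost - current_cost
            # Metropolis rule: accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
        t *= cooling  # lower the temperature
    return best, best_cost


if __name__ == "__main__":
    toy = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
    print(simulated_annealing(toy, [4.0, 4.0]))
```

Accepting occasional uphill moves at high temperature is what allows SA to leave a local minimum, in contrast to the PSO behavior discussed earlier.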

Table 4 presents a comparative study of the advantages and disadvantages of the SVM-based hybrid methods. A comparative analysis of the MAPE of these methods is presented in Table 5. The values of these metrics differ across datasets and parameters, so it is difficult to compare the results of different techniques directly. Moreover, no study has evaluated all of these methods on a single dataset. In this work, the best reported accuracy of each discussed method is tabulated.
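For reference, the three error metrics used throughout the comparison tables can be computed as in the following sketch. The function names are illustrative, and the MAPE helper assumes that no actual load value is zero.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error, in the units of the load."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error, in the units of the load."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    actual, forecast = [100.0, 200.0], [110.0, 180.0]
    print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

MAE and RMSE carry the units of the load, so they depend on the size of the system being forecast, whereas MAPE is scale-free; this is one reason why values reported on different datasets are hard to compare directly.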

TABLE 4 Comparative Study of SVM Based Hybrid Methods

TABLE 5 Comparative MAPE, RMSE Values and Factors of SVM Based Hybrid Model

B. Artificial Neural Network (ANN)

ANNs are processing devices (algorithms or actual hardware) that are loosely modeled after the neuronal structure of the mammalian cerebral cortex, but on much smaller scales. A large ANN might have hundreds or thousands of processor units, whereas a mammalian brain has billions of neurons. ANNs provide an analytical alternative to conventional techniques, which are often limited by strict assumptions of normality, linearity, variable independence, etc. Because an ANN can capture many kinds of relationships, it allows the user to model, quickly and relatively easily, phenomena that may otherwise have been very difficult or impossible to explain. ANN is a popular choice for load forecasting. A recurrent ANN proposed by Baek [118] achieved a MAPE value of 1.57% for MTLF in South Korea. The dataset of that study includes daily load consumption, real-time temperature, weather and day type from July 2011, with the 168 h of load data from the preceding week as the training set and the 24 h of July 8 of the same year as the test set. Power consumption and temperature were recorded at 15-minute intervals, whereas the day type and weather type were recorded once a day.

MLP, another ANN architecture, was employed by Park et al. [119] for STLF. A major disadvantage of this method is that it needs a large MLP structure to train on real data, which also raises redundancy issues. Some of the most popular hybrid methods integrating ANN with other single methods are:

ANN and Fruit Fly Optimization Algorithm (ANN-FOA)
ANN and Firefly Algorithm (ANN-FA)
ANN and Clustering Techniques (ANN-CT)
ANN and Neural Fuzzy Inference System (ANN-NFIS)
ANN and Artificial Immune System (ANN-AIS)
ANN and Wavelet Transform (ANN-WT)
ANN and Particle Swarm Optimization (ANN-PSO)
ANN and Genetic Algorithm (ANN-GA).

Fig. 13 presents the commonly used hybrid two-method models built on ANN. These methods are presented in the following subsections along with their uses in load forecasting, and their performance is then compared on the basis of accuracy.

FIGURE 13.

Pictorial representation of notable hybrid models with two methods based on artificial neural network that are explained in the subsection.


1) ANN and Fruit Fly Optimization Algorithm (ANN-FOA)

Pan [99] discussed the fundamentals of the FOA algorithm, which helps to find the maximal and minimal values of a function. The search capability of FOA is helpful for the parameter selection of a neural network. Li et al. [120] combined GRNN and FOA for annual power load forecasting, where FOA is used to determine the spread parameter value of the GRNN model. The MAPE and MSE values of their proposed model were 1.149% and 1.421, respectively. The comparison covered five forecasting models, namely FOA-GRNN, GRNN, PSO-GRNN, Sparse Approximation Scheme for Least Square SVM, and Ordinary Least Squares-Likelihood Ratio, and FOA-GRNN showed the best performance.

2) ANN and Firefly Algorithm (ANN-FA)

In the ANN-FA method, FA is used to create the nonlinear mapping and the ANN provides the learning ability. This helps to develop a relatively efficient and accurate forecasting model, but it shows poor performance with respect to RMSE. Kavousi-Fard et al. [121] combined a modified FA (MFA) with an ANN in this way. In their hybrid MFA-ANN model, each firefly encodes the network parameters as $\begin{align*} X_{k}=&{[w_{i,1},w_{i,2},\ldots,w_{i,M_{w}},b_{i,1},b_{i,2},\ldots,b_{i,M_{b}}]}_{(1,M)} \\ M=&M_{w}+M_{b}\tag{13}\end{align*}$ where $X_{k}$ is the randomly selected firefly from a set of fireflies, $w_{i}$ is the adjusting coefficient for the ANN arranged in an $M$-dimensional row vector, and $M_{w}$ and $M_{b}$ are the numbers of weighting and biasing factors, respectively. A comparison of the ANN-MFA method with other methods showed that it yields a low forecasting error value of 2.0322, which is a more suitable result than the others. In another study, Xiao et al. [96] proposed a combined model of FA and neural networks with the aim of reducing the load forecasting error.
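The encoding in Eq. (13), where all weights and biases of the network are concatenated into a single row vector that serves as a firefly position, can be sketched as follows. The helper names and the single linear neuron used to score a position are hypothetical simplifications; the network of [121] is not reproduced here.

```python
def encode(weights, biases):
    """Concatenate M_w weights and M_b biases into one position of length M."""
    return list(weights) + list(biases)


def decode(position, n_weights):
    """Split a position vector back into (weights, biases)."""
    return position[:n_weights], position[n_weights:]


def neuron_mse(position, n_weights, inputs, targets):
    """Fitness of one position: MSE of a single linear neuron (toy stand-in)."""
    w, b = decode(position, n_weights)
    error = 0.0
    for x, y in zip(inputs, targets):
        y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b[0]
        error += (y - y_hat) ** 2
    return error / len(inputs)


if __name__ == "__main__":
    x_k = encode([2.0, -1.0], [0.5])  # M_w = 2, M_b = 1, so M = 3
    print(x_k, neuron_mse(x_k, 2, [[1.0, 1.0]], [1.5]))
```

With this encoding, any population-based optimizer that works on real vectors can train the network by minimizing the fitness over positions.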

3) ANN and Clustering Technique (ANN-CT)

Clustering Techniques (CTs) were developed by Driver and Kroeber in 1932. The major advantage of CT is its suitability for datasets with compact, well-separated spherical clusters. Hernández et al. [122] discussed STLF for a microgrid using two different data sets, amalgamating the K-means clustering algorithm with MLP. This model shows MAPE values of 15.34% for data Set A and 16.69% for data Set B. Quilumba et al. [123] used CT to group customers based on similarities in consumption behavior. Increasing the number of clusters from one to four reduced the average MAPE value by 1.7% and showed good forecasting accuracy. Fahiman et al. [124] combined DL and clustering methods to improve the accuracy of load forecasting, where clustering is used to divide customers into sub-populations based on similar demand profiles. This model uses $k$ clusters $C=\{c_{1},c_{2},\ldots,c_{k}\}$ to train a neural network with $k$ prediction models $F_{C}=\{F_{c_{1}},F_{c_{2}},\ldots,F_{c_{k}}\}$, which are aggregated as: $\begin{equation*} {F}_{aggr}=\sum \nolimits _{i=1}^{k} F_{c_{i}}\tag{14}\end{equation*}$

For real-time forecasting, the DNN-based K-clusters forecasting model shows the best accuracy among the four proposed methods discussed by the authors. The authors suggest using dynamic clustering methods for real-time forecasting.
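The aggregation in Eq. (14) amounts to summing the per-cluster forecasts step by step over the forecast horizon. A minimal sketch, assuming each cluster model has already produced a forecast of the same length (the function name and the sample numbers are illustrative):

```python
def aggregate_forecast(cluster_forecasts):
    """F_aggr = sum_i F_{c_i}, evaluated at each step of the horizon."""
    horizons = {len(f) for f in cluster_forecasts}
    if len(horizons) != 1:
        raise ValueError("all cluster forecasts must cover the same horizon")
    return [sum(values) for values in zip(*cluster_forecasts)]


if __name__ == "__main__":
    # Three hypothetical clusters, each with a two-step-ahead forecast.
    print(aggregate_forecast([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```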

4) ANN and Neural Fuzzy Inference System (ANN-NFIS)

This forecasting method is a combination of ANN with the neural fuzzy inference system (NFIS). ANN is a method to obtain an approximate solution through Artificial Intelligence techniques, based on imitating the functioning of the brain [125]. ANNs are organized in layers, i.e. an input layer, hidden layers and an output layer. Fuzzy inference uses FL to map given input variables to an output variable. Using fuzzy logic (IF-THEN) rules, the variables are matched and the response is acquired through fuzzy implications [126]. Researchers combine both techniques to enrich performance in decision-making control and data analysis. To find the best parameters, a fuzzy classification technique can help the neural network by grouping the rule base. This is extremely helpful for forecasting with irregular datasets, which are common in forecasting. Neuro-fuzzy modeling also makes it easy to incorporate human expertise about the target system directly into the modeling process [127], [128]. Khotanzad et al. [129] combined FL with ANNs to develop a two-stage fuzzy logic forecaster, whose MAPE was lower than that of the Artificial Neural Network based Short Term Load Forecaster.

5) ANN and Artificial Immune System (ANN-AIS)

The advantage of the immune system is its powerful information processing capability, which makes the system highly parallel. The approach is inspired by the immunology of the human body. Yong et al. [89] applied an Immune Algorithm (IA) to design a back propagation neural network (BPNN); the result was named the Artificial Immune Network (AIN). The MAPE value of this model is 2.52%, whereas the AIN MAPE is 2.038%. Hamid and Rahman [130] developed a new model in which the ANN is trained by the AIS. It is capable of providing forecasts comparable to those of an ANN trained with BP.

6) ANN and Wavelet Transform (ANN-WT)

Wavelet analysis is a complement to the classical Fourier decomposition method. Wavelet transforms (WT) allow the components of a non-stationary signal to be analyzed and filters to be constructed for stationary and non-stationary signals. Various works use WT to decompose loads into multiple frequency components. Jawerth and Sweldens [131] focused on the multi-resolution analysis of WT. Bashir and El-Hawary [132] built a wavelet neural network structure trained with the back propagation algorithm for short-term load forecasting. This technique showed a lower average percentage error than the individual ANN. Zhang and Dong [133] combined artificial neural models with wavelet transformation to show that useful information can be captured on various time scales. The method was tested on the Australian electricity market to predict short-term electricity demand and showed promising results. Guan et al. [134] integrated wavelet neural networks (WNN) with data pre-filtering. Li et al. [135] used wavelet decomposition to reduce the non-stationarity of the load sequence and performed an augmented Dickey-Fuller test to determine the stationary components of the decomposition. The wavelet features were forecast using a second-order gray neural network for STLF, with a MAPE value of 2.41% for 9 layers. Wavelet-based decomposition is also used to construct probabilistic load forecasting models, combined with tree-based Random Forest and quantile regression forests, as proposed by Alfieri and De Falco [136]. El-Hendawi and Wang [137] forecast the load of Ontario, Canada, and reduced the MAPE value by 20% with an ensemble ANN-WT method in comparison to a conventional NN.
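To make the idea of decomposing a load series into frequency components concrete, the sketch below applies one level of the Haar wavelet transform, the simplest wavelet. The cited works do not necessarily use the Haar wavelet; it is chosen here only because it fits in a few lines, and the function names and sample values are illustrative.

```python
import math


def haar_decompose(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


if __name__ == "__main__":
    load = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    a1, d1 = haar_decompose(load)
    print(a1, d1, haar_reconstruct(a1, d1))
```

The approximation coefficients carry the slow trend and the detail coefficients carry the fast fluctuations; applying `haar_decompose` again to the approximation yields the next, coarser level. In a wavelet-ANN hybrid, each component is typically forecast separately and the forecasts are recombined.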

7) ANN and Particle Swarm Optimization (ANN-PSO)

El-Telbany and El-Karmi [138] combined PSO with ANN, using PSO to train a feed-forward network for load forecasting of the Jordanian electricity system. The authors found that the PSO algorithm performed better than the BP algorithm; the network trained by BP performed poorly because of overtraining and complexity. Another application of ANN-PSO was made by Liu et al. [139], who applied PSO to optimize the parameters of the ANN. The strengths of PSO are that it can find the best element from a set of possible alternatives, is computationally inexpensive and easy to implement, and requires only the values of the objective function rather than its gradient. Zhang and Ma [140] integrated the PSO algorithm with the RBF neural network, where PSO optimizes the weights and the RBF neural network performs the learning. This model showed better performance than the typical RBF network.

8) ANN and Genetic Algorithm (ANN-GA)

Ling et al. [141] proposed a novel neural network tuned by a GA. The network’s parameters can be tuned by a GA with arithmetic crossover and non-uniform mutation. The model has two activation functions, namely a static activation function (SAF) and a dynamic activation function (DAF). If $v_{ij}$ is the synaptic connection weight from the $i$-th input node $x_{i}$ to the $j$-th neuron, $n_{h}$ is the number of nodes in the hidden layer, ${net}_{s}^{j}(\cdot)$ is the $j$-th SAF, ${net}_{d}^{j}(\cdot)$ is the $j$-th DAF, ${net}_{o}^{l}(\cdot)$ is the activation function of the $l$-th output neuron, and $m_{d}^{j}$ and $\sigma _{d}^{j}$ are the dynamic mean and dynamic standard deviation of the $j$-th DAF, their model for daily electric load forecasting is: $\begin{equation*} y_{l}(t)={net}_{o}^{l}\left ({\sum \limits _{j=1}^{n_{h}} {net}_{d}^{j}\left ({{net}_{s}^{j}\left ({\sum \limits _{i=1}^{24} x_{i}v_{ij} }\right),m_{d}^{j},\sigma _{d}^{j} }\right)w_{jl} }\right),\quad l=1,2,\ldots,24.\tag{15}\end{equation*}$

This model achieved a lower MAPE value than the traditional neural model and also provided better training information. Azadeh et al. [142] performed an integration study to develop a logarithmic-linear model, which was then applied to the Iranian agriculture sector. The MAPE value of the authors’ ANN model was much better than that of the time series model. Ventura et al. [143] studied which parameters are best for implementing the integration between GA and ANN. The previously mentioned PSO and ANFIS have also been integrated with GA and Gaussian process regression to optimize the input of ANFIS-based PV power forecasting for plants [144].

Table 6 presents a comparative study of the advantages, disadvantages and computational complexities of the ANN-based hybrid methods. A comparative analysis of the MAPE factors of the ANN-based hybrid methods is presented in Table 7.

TABLE 6 Comparative Study of ANN Based Hybrid Methods

TABLE 7 Comparative MAPE, MSE/RMSE Values and Factors of ANN Based Hybrid Methods

SECTION VII.

Hybrid Models With More Than Two Methods

In recent times, scientists and researchers have increasingly been developing hybrid models with more than two single methods, where each single method serves its own purpose. Some of these hybrid models are discussed below:

A. Generalized Neural Network-Wavelet Transform-Genetic Algorithm With Fuzzy Logic (GNN-WT-GAF)

To overcome the major limitations of single and two-method hybrid models, Chaturvedi et al. [150] proposed a technique in which a Generalized Neural Network (GNN) is integrated with WT and trained by a genetic algorithm with fuzzy system (GAF), namely the GNN-WT-GAF model. The proposed model helps to limit the drawbacks of ANN, improve the convergence of GA and enhance the performance of fuzzy concepts. The method was tested using datasets from a 15 MVA, 33/11 kV substation at Dayalbagh Educational Institute (D.E.I.), Agra, India. A wavelet is a mathematical function used to divide a given function or continuous-time signal into different scale components; here the data are divided into four wavelet components, namely one approximate component ($a_{3}$) and three detailed components ($d_{1},d_{2},d_{3}$). These components are used to train the GNN model with the load pattern at times $t,t_{1},t_{2}$ as input and at $t+1$ as output. In the final stage, the GNN for each wavelet component is trained by an adaptive genetic algorithm using a fuzzy system (GAF). The comparison covered ANN-BP, GNN-BP, GNN-GAF, GNN-WT-BP and GNN-WT-GAF. The proposed method shows an RMSE value of 0.0486 kW, which is a more suitable result than the others.

B. Empirical Mode Decomposition-Particle Swarm Optimization-Support Vector Regression (EMD-PSO-SVR) or Adaptive Neuro Fuzzy Inference System (EMD-PSO-ANFIS)

To enhance the accuracy of load forecasting, EMD-PSO approaches are gaining popularity because EMD can decompose complicated load data into different intrinsic mode functions (IMFs), whose forecasting models can then be optimized by PSO. Wang and Wang [151] developed a load forecasting method where EMD-PSO is combined with SVR. Semero et al. [152] proposed a similar approach by integrating EMD-PSO with ANFIS. Their models work in three stages. First, EMD divides the load data into a number of IMF components and one residue. Second, SVR or ANFIS forecasts each IMF and the residue separately. Third, PSO selects the parameters of SVR or ANFIS automatically. The component forecasts are then aggregated to produce the final prediction. To test the proposed models, the authors used the actual daily peak load data of State Grid Handan Electric Power Company in China for SVR and of a microgrid in Beijing for ANFIS. The proposed EMD-PSO-SVR model was compared with models such as EMD-SVR, PSO-SVR, and SVR, and yielded very low error values, i.e. a MAPE value of 2.7510, an RMSE of 0.0595, and an MAE of 0.0414 for an alternate case study.
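The parameter selection role of PSO can be sketched as follows. This is a minimal, generic particle swarm optimizer, not the configuration used in [151] or [152]. The function `stand_in_validation_error` is a hypothetical placeholder for the validation error of an SVR or ANFIS model as a function of two hyperparameters, and the swarm settings (inertia 0.7, acceleration coefficients 1.5) are assumed typical values.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def stand_in_validation_error(params):
    """Hypothetical error surface with its minimum at (1.0, -2.0)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_error = pso_minimize(
    stand_in_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

In a pipeline of the kind described above, the objective would instead train the forecaster on one decomposed component with the candidate parameters and return its validation error.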

C. Ensemble Empirical Mode Decomposition-Extreme Learning Machine-Grasshopper Optimization Algorithm (EEMD-ELM-GOA)

Wu et al. [153] combined three single methods for short term load forecasting to propose a new hybrid model, assembled from Ensemble Empirical Mode Decomposition (EEMD), ELM, and the grasshopper optimization algorithm (GOA). In this model, GOA is used to find suitable weight and threshold values for ELM. The model was examined using datasets from five main states in Australia: New South Wales, Queensland, Tasmania, South Australia and Victoria. In the training phase, the original electrical data set was decomposed by EEMD, and suitable parameters for ELM were selected by GOA. A non-linear model was then constructed by ELM. The proposed EEMD-ELM-GOA model was compared with three other types of models, i.e. Ensemble Empirical Mode Decomposition – Extreme Learning Machine – Dragonfly Algorithm (EEMD-ELM-DA), Ensemble Empirical Mode Decomposition – Extreme Learning Machine – Grey Wolf Optimizer (EEMD-ELM-GWO), and EEMD-ELM-PSO, and showed the best performance among them.

D. Global Harmony Search Algorithm-Fuzzy Time Series-Least Squares-Support Vector Machine (GHSA-FTS-LS-SVM)

Chen et al. [115] proposed a hybrid GHSA-FTS-LS-SVM electrical load forecasting model based on the Global Harmony Search Algorithm (GHSA), Least Squares Support Vector Machines (LS-SVM), and Fuzzy Time Series (FTS). In this model, FTS calculates the center of each cluster, and LS-SVM, optimized by GHSA, is used to model the resultant series. Datasets from the Guangdong Province Industrial Development were used to test the accuracy of the proposed model, which was also compared with other models such as ARIMA and other algorithms hybridized with LS-SVM, including PSO, HS and GA. The proposed GHSA-FTS-LS-SVM technique shows MAPE, MAE, and RMSE values of 3.709, 14.358 and 18.180 respectively, which is a better result than the others.

E. T-Copula-Improved Empirical Mode Decomposition-Deep Belief Network (T-Copula-IEMD-DBN)

The hybrid technique proposed by Haq and Ni [14] is designed for STLF and was tested on real time data from Australia and the United States of America. The dataset included weather data, time categorical data, social data, and energy load demand for a particular sampling time. After the load demand time series is decomposed into low frequency components with improved Empirical Mode Decomposition (IEMD), the effects of the loss components are retrieved using T-Copula based correlation analysis. Once the peak load indicative binary variables are derived from value at risk (VaR), the combined data are fed to the DBN for STLF. For five locations in Australia, the proposed method is compared with the standard NN, DBN and EMD-DBN using data derived from [154], and for Texas, USA, it is compared with the standard NN, DBN and Copula-DBN [27]. The model outperformed the others with respect to MAPE and RMSE in most of the cases. On average, the values for the hybrid model are reduced by 21.19% and 16.93% in the case of Australia, and by 15.27% and 13.86% in the case of the USA.

F. Genetic Algorithm-Non Linear Auto Regressive With Exogenous Input-Neural Network (GA-NARX-NN)

Jawad et al. [155] combined a GA based training method with NARX-NN for 168 hours ahead STLF and 1 month ahead MTLF problems. A wind speed pattern recognition was also designed with the model. The preliminary model was designed using elitist GA, an advanced form of GA that improves convergence in real-number coding. The chromosome population of the GA is provided to NARX-NN as the initial weights to assess the fitness function at each iteration. The final performance of the proposed model was compared with different single methods including auto-regressive (AR) with exogenous input (ARX), AR moving average with exogenous input (ARMAX), NARX, regression tree (RT), Levenberg-Marquardt (LM)-ANN and LM-NARX-NN. Experimentation showed a regression value greater than 0.99 for load forecasting, with values of 1.12% MAPE and 1.39% RMSE (with 0.00036 variance) for MTLF tested on ERCOT data for the month of December.
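The elitist, real-coded GA idea can be sketched as below. This is a generic illustration under stated assumptions, not the training procedure of [155]. The fitness function `stand_in_fitness` and the vector `hidden_target` are hypothetical stand-ins for the error of a network evaluated with a candidate weight vector, and the operators (tournament selection, blend crossover, Gaussian mutation) are assumed choices.

```python
import random

def elitist_ga(fitness, dim, pop_size=30, generations=60, n_elite=2,
               mutation_scale=0.1, bounds=(-1.0, 1.0), seed=0):
    """Real-coded GA that copies the n_elite best individuals unchanged."""
    rng = random.Random(seed)
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in range(dim)]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)            # lower fitness is better
        history.append(fitness(population[0]))
        next_gen = [ind[:] for ind in population[:n_elite]]  # elitism
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Blend crossover on real-valued genes.
            alpha = rng.random()
            child = [alpha * a + (1.0 - alpha) * b for a, b in zip(p1, p2)]
            # Gaussian mutation, clamped to the allowed range.
            child = [min(max(g + rng.gauss(0.0, mutation_scale), lo), hi)
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return best, fitness(best), history

# Hypothetical target weights; a real fitness would run the network instead.
hidden_target = [0.3, -0.5, 0.8, 0.1]

def stand_in_fitness(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, hidden_target))

best_weights, best_fitness, history = elitist_ga(stand_in_fitness, dim=4)
```

Because the elite individuals are carried over unchanged, the best fitness recorded in `history` never gets worse from one generation to the next, which is the convergence property that elitism is meant to provide.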

G. Fuzzy Combination Weight-Empirical Mode Decomposition and Kalman Filtering-Bat Algorithm-Support Vector Machine (FCW-EMD and KF-BA-SVM)

A hybrid model based on multiple methods is proposed by Liu et al. [156], which takes multiple factors into consideration (such as preprocessing, same day selection, sequence decomposition, selection and optimization). Fuzzy Combination Weight (FCW) assists the selection of dense days, reducing the data needed for forecasting. After EMD decomposes the components into a series of IMFs, the Bat Algorithm (BA) is used to optimize the SVM parameters, and the forecasted values are fine-tuned by the Kalman Filter (KF). SVM is at the core of the complete model. The model was tested with transformer substation data from South China for specific days with varying load profiles. The model outperformed the KF-BA-SVM model and the EMD and KF-BA-SVM model, showing a MAPE value of 1.10, which was 1.77 and 0.96 lower than those methods respectively. In comparison with GA-SVM, PSO-SVM and EMD-SVM, which showed MAPE values of 13.5115%, 12.932% and 4.2519% respectively for a holiday in 2015, the proposed combination showed 1.9052%, outperforming all the aforementioned models.
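The fine-tuning role of the Kalman filter can be illustrated with a scalar filter applied to a sequence of values. The sketch assumes a random-walk state model and hypothetical noise variances. How the filter is configured in [156] is not described here, so the function name `kalman_smooth` and all parameter values are illustrative only.

```python
def kalman_smooth(measurements, process_var=1e-2, measurement_var=1.0,
                  initial_estimate=None, initial_var=1.0):
    """Scalar Kalman filter with a random-walk state model."""
    estimate = measurements[0] if initial_estimate is None else initial_estimate
    var = initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        var += process_var
        # Update: blend the prediction with the new value using the gain.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered

# Hypothetical raw forecast values (illustrative only).
raw_forecast = [10.0, 12.0, 8.0, 11.0, 9.0]
smoothed = kalman_smooth(raw_forecast)
flat = kalman_smooth([5.0, 5.0, 5.0, 5.0])
```

A larger `measurement_var` relative to `process_var` makes the filter trust its running estimate more and the incoming values less, which gives stronger smoothing.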

H. Genetic Algorithm-Particle Swarm Optimization-Adaptive Neuro Fuzzy Inference System (GA-PSO-ANFIS)

Semero et al. [157] proposed a hybrid model for VSTLF in microgrids integrating GA, PSO, and ANFIS. The binary genetic algorithm selects, from a number of input variables, the important predictors that notably influence the load pattern, whereas the PSO algorithm is used to optimize an ANFIS-based model for VSTLF. The proposed model was compared with three other models: the BP neural network, ARIMA, and persistence models. The proposed method shows a 40% reduction in average execution time for one-step-ahead load forecasting.

Apart from these notable hybrid methods, novel combined and complex algorithms are being analyzed, such as Bidirectional Gated Recurrent Unit (Bi-GRU) based EGA-STLF [158], a two layer hybrid neural network and three stage based enhanced ELITE (E-ELITE) framework and consensus-based mixed integer PSO assisted TRUST-TECH (CMPSOATT) method [159], an EMD and fuzzy RBF-AR method for energy time series forecasting [160], and multiple scale Variational Mode Decomposition (VMD), K-Means Clustering and LSTM for short term wind power forecasting [161]. These complex algorithms work for specific cases, and are being researched for general applications of load forecasting over different time horizons. Table 8 summarizes the advantages, disadvantages, computational complexity and difficulties in designing the notable models for load forecasting.

TABLE 8 Comparative Study of Hybrid Methods (More Than Two Model)

SECTION VIII.

Outcomes

The major findings of the comparative study of different state-of-the-art predictive models are listed as follows:

When designing a particular load forecaster, the forecasting time, the weather of the location, and the economy of the system have a significant effect on the accuracy.
Previous load data can help to improve the accuracy.
The values from STLF can be approximated to MTLF or LTLF by adding econometric variables or load data, or by performing statistical operations.
The values of MAE, RMSE and MAPE can be utilized to evaluate the accuracy of any single or hybrid predictive model.
Even though single predictive models can forecast loads with substantial precision, different models are integrated to improve the quality of the forecast.
Among the single methods mentioned, ELM provides fast training, FCM provides a better error analysis index, FRBS has the universal approximation capability, and MLP has the highest classification accuracy.
SVM or ANN based double predictive models have shown improved results compared to combinations of other single predictive methods for load forecasting.
More than two methods are designed in cascade to perform sequentially and improve the accuracy of each individual output.
Hybrid models with more than two methods, each having its own merits, perform better than combinations of two methods, as the amalgamated methods contribute to the fine tuning of the parameters.
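The three evaluation measures named in the findings above can be computed as in the following sketch. It uses the standard definitions of MAE, RMSE and MAPE, with MAPE expressed in percent. The sample values are hypothetical, and the function names are chosen for illustration.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be nonzero."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical load values and forecasts (illustrative only).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
```

MAE and RMSE carry the unit of the load, so they are comparable only between models tested on the same dataset, whereas MAPE is scale-free but undefined whenever an actual value is zero.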

SECTION IX.

Conclusion

As electrical load prediction requires precision and frequent checks of the varying parameters, researchers have worked on numerous ML models to enhance the performance of the existing forecasting technologies. This paper has presented a comprehensive study of notable single methods and delineated the state of the art in constructing hybrid models. Different load forecasting technologies have been described with respect to time horizons, on the basis of which various prediction models have been listed for an extensive literature study and comparative analysis. The literature shows that two or more predictive models can be combined to construct hybrid models that reach the required accuracy. SVM, ANN, and their related models have proven fruitful, as these schemes offer excellent opportunities for achieving a well-organized power utility where the demand load can be predicted with a minimum error percentage.

Statistical measures have been used to compare the percentage error and to assist in the selection of the most appropriate method for a particular forecasting routine. The outcomes of the research have been listed to help readers acquire a detailed knowledge of contemporary predictive models, which can prove useful in selecting a particular model when designing new hybrid forecasters according to the features of the individual methods. As hybrid models with more than two methods have a considerable advantage over single or double models, future research will focus on the design and comparison of hybrid models with three or more methods that address the disadvantages of the current forecasters.
