
A Granular Computing-Based Hybrid Hierarchical Method for Construction of Long-Term Prediction Intervals for Gaseous System of Steel Industry


The framework of the proposed Granular Computing-based hybrid hierarchical method for construction of long-term Prediction Intervals for gaseous system of steel industry.


Abstract:

Byproduct gaseous energy is crucial to the iron-steel manufacturing process, where the tendencies of its generation and consumption can be deemed as a significant reference for scheduling production and decision-making. Besides the requirements imposed on numeric prediction, practical applications also demand that the result be represented in terms of intervals expressing the reliability of prediction outcomes. Meanwhile, prediction intervals should cover a long period of time for delivering more information on future long-term trends. Bearing this in mind, in this study, a Granular Computing-based hybrid hierarchical method is proposed for constructing long-term Prediction Intervals (PIs), in which the horizontal modelling gives rise to long periods of prediction, and the vertical one extends them to the interval-valued format. Information granules are hierarchically distributed over single data and then on industrial features-based segments. Considering the criteria of coverage and specificity as sound performance indexes of the model, a suite of optimization problems is formulated and solved by involving Particle Swarm Optimization (PSO). Experimental studies demonstrate that the proposed approach exhibits better performance when compared with the performance reported for other commonly encountered methods.
Published in: IEEE Access ( Volume: 8)
Page(s): 63538 - 63550
Date of Publication: 26 March 2020
Electronic ISSN: 2169-3536



CCBY - IEEE is not the copyright holder of this material. The full text is distributed under the terms of https://creativecommons.org/licenses/by/4.0/.
SECTION I.

Introduction

As the essential medium of the iron-steel making process, byproduct gaseous energy plays a pivotal role in real-world production. It is of significant importance to reasonably schedule the generation and consumption of this energy so as to sustain secure and stable manufacturing conditions. Without prediction information, the scheduling scheme can be developed on the basis of experience only, which naturally leads to a lack of efficiency and accuracy. Therefore, identifying future tendencies of the byproduct gas system becomes a primary demand stemming from the practical field, one that will be beneficial for the scheduling work.

Regarding prediction in the steel industry, data-driven approaches are commonly deployed as the solution, such as nonlinear state-space modeling [1], Relevance Vector Machine [2], Gaussian Process (GP) [3], Neural Networks (NNs) [4]–​[5], Kalman filtering [6]–​[8], etc. However, due to the iteration mechanism involved, the models reported in these studies can only produce a short period of forecasts, i.e., 30–60 points. Compared with methods that focus on single data points and numeric prediction, Granular Computing (GrC) [9]–​[11] concentrates on processing information granules such as fuzzy sets [12]–​[13], rough sets [14]–​[15], interval sets [16]–​[17], etc. As such, the prediction horizon becomes extended, which offers the possibility of realizing long-term prediction. For instance, considering the semantics of industrial processes, a GrC-based long-term prediction model was proposed in [18] with prediction lengths of 480, 720, and 1440 points. Adaptive data granulation, coming with intuitively perceived characteristics and combined with a collaborative conditional clustering method [19], led to a scale-varying prediction model whose prediction horizon ranges from 60 to 180 points. The study in [20] involved the structural characteristics of the energy network and established a hybrid collaborative fuzzy clustering method for predicting the storage amount of Linz-Donawitz converter Gas (LDG), realized for long horizons, i.e., 720 and 1,440 points. Although the studies above successfully produced practical models with extended prediction horizons, the final results were restricted to numeric values. With the development of various applications, the reliability of prediction results has become highly demanded. As a result, it is necessary to propose an effective approach for the construction of prediction intervals (PIs).

The common way of constructing PIs is based on statistical vehicles, such as Mean Variance Estimation (MVE) [21]. However, the accuracy of the MVE-based method greatly depends on whether the related prediction model can precisely estimate the mean value of the target. The study [22] reported an instance of NN-based construction of PIs incorporating fuzzy membership. Considering that the modeling in this case is specialized to the field of renewable energy, the method cannot be directly applied to the gaseous system of the steel industry. Concentrating on electricity consumption forecasting in power systems, [23] proposed a long-term probability forecasting model based on fuzzy Bayesian theory and expert prediction. However, owing to the requirement of constructing accurate fuzzy membership functions, it can hardly be deployed in the energy system of the steel industry with its complex pipeline network. Moreover, that study primarily examined the data relationships among years rather than minutes. Furthermore, kernel-based dynamic Bayesian networks and Long Short-Term Memory (LSTM) networks were also utilized to construct PIs; refer to [24]–​[26]. Although these approaches can generate interval-like results, they fail to satisfy the practical requirement of a long prediction horizon.

Aiming at providing accurate long-term results along with a reasonable reliability measure, we propose a GrC-based hybrid hierarchical method to construct PIs for the byproduct gaseous energy system in the steel industry. The contributions of this study can be summarized as follows.

  1. On the one hand, the prediction horizon is successfully extended; this is accomplished by modeling at the level of information granules instead of single numeric data. On the other hand, the PIs are constructed by vertically extending the numeric values to intervals; this offers a novel granulation method that brings bi-directional considerations.

  2. The established hierarchical structure, involving three layers of processing and taking the industrial periodic features of the process into consideration, exhibits a substantially positive impact on both efficiency and accuracy. The corresponding programming models are established at each layer of the architecture to optimize the PIs.

To further clarify the importance of this paper, a comparison with other existing long-term PIs construction methods is given in Table 1. Besides, the experimental study is concerned with the relationships between the different parameters of the model and the performance of the proposed approach. Subsequently, the optimized PIs are formed and compared with the ones constructed by traditional methods.

TABLE 1 Comparison Between the Proposed and Other Existing PIs Construction Methods

The paper is organized as follows. Section II covers some preliminaries, i.e., the description of the industrial problem and the traditional GrC-based long-term prediction model. Section III provides a detailed elaboration on the proposed GrC-based hybrid hierarchical PIs construction method. Then experimental studies are reported for real-world data coming from the steel industry; Section IV covers studies on both the parameters and the PIs construction to demonstrate the accuracy and efficiency of the proposed model. Finally, some conclusions and possible future topics are included in Section V.

SECTION II.

Preliminaries

A. Industrial Problem Description

A typical structure of the byproduct gaseous system used in the steel industry in China is illustrated in Fig. 1. It consists of three types of gas, i.e., Blast Furnace Gas (BFG), Coke Oven Gas (COG), and Linz-Donawitz converter Gas (LDG). Coke ovens, Blast Furnaces (BFs), and converters (marked as LD in the figure) are the generators that produce the gases during iron-steel production. The users include hot/cold rolling mills, the plate plant, etc. Besides these units, some tanks are also involved in the network for temporarily storing the gases. The pipeline, along with press/mixture stations, is organized for gas distribution across the entire system.

FIGURE 1. Structure of the byproduct gaseous system for steel industry.

For a variety of reasons, imbalances always arise, and the operating crews need to find a solution for filling the gap between generation and consumption. Without any estimation completed ahead of time, the energy scheduling work would be blind, inaccurate, and of low efficiency. Owing to the complicated pipeline network and the large number of gas units, it is usually difficult for the staff to accurately estimate the future trend of gas flows so as to make a reasonable adjustment. As such, prediction of gas flows becomes necessary for the steel industry.

Considering that the accumulated process data can be sufficiently utilized, data-driven methods have become a common solution to this complicated industrial problem. The target variables for PIs construction in this study are the generators and consumption units.


B. Traditional GRC-Based Long-Term Prediction Model

Assume a time series is divided into a collection of data segments $\mathbf{S}=\{\boldsymbol{s}_{1},\boldsymbol{s}_{2},\cdots,\boldsymbol{s}_{N}\}$, $\boldsymbol{s}_{i}\in R^{n}$. As an unsupervised learning vehicle, the Fuzzy C-Means (FCM) [27]–​[29] clustering algorithm requires little knowledge of the mechanism of the gaseous energy system. Besides, it also provides the foundation for constructing long-term prediction intervals. Therefore, the traditional GrC-based long-term prediction model starts with the use of FCM clustering to form the prototypes $\mathbf{V}=\{\boldsymbol{v}_{1},\boldsymbol{v}_{2},\cdots,\boldsymbol{v}_{c}\}$, $\boldsymbol{v}_{i}\in R^{n}$, and the related fuzzy partition matrix $\mathbf{U}=\{\boldsymbol{u}_{1},\boldsymbol{u}_{2},\cdots,\boldsymbol{u}_{N}\}$, $\boldsymbol{u}_{i}\in R^{c}$, denoting the fuzzy membership grades of each segment towards the clusters. Then mappings are established between information granules, in particular the prototypes. The vector of membership degrees $\boldsymbol{u}_{k}$ is predicted on the basis of the historical data, i.e., $(\boldsymbol{u}_{k-n_{I}},\cdots,\boldsymbol{u}_{k-2},\boldsymbol{u}_{k-1})$, where $n_{I}$ denotes the number of granules viewed as inputs. We construct a function $f$ expressed as follows \begin{equation*} \hat{\boldsymbol{u}}_{k}=f\left(\boldsymbol{u}_{k-n_{I}},\cdots,\boldsymbol{u}_{k-2},\boldsymbol{u}_{k-1}\right)\tag{1}\end{equation*}

For building this mapping, various approaches could be considered, such as polynomial fitting [30], neural networks [31], etc. Here the numeric long-term prediction result is expressed in the form \begin{equation*} \hat{\boldsymbol{s}}_{k}=\frac{\sum_{i=1}^{c}\hat{u}_{ki}\boldsymbol{v}_{i}}{\sum_{i=1}^{c}\hat{u}_{ki}}\tag{2}\end{equation*} where $\hat{u}_{ki}$ is the $i$-th element of $\hat{\boldsymbol{u}}_{k}$.
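The centroid decoding of Eq. (2) can be sketched in a few lines of Python; this is an illustrative reconstruction, not the authors' code, and the toy prototypes are invented for demonstration.

```python
import numpy as np

def centroid_decode(u_hat, prototypes):
    """Decode a predicted membership vector into a numeric segment,
    following Eq. (2): a membership-weighted average of the prototypes."""
    u_hat = np.asarray(u_hat, dtype=float)
    return (u_hat @ prototypes) / u_hat.sum()

# toy setting: c = 2 prototypes describing segments of length n = 3
V = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])
u_hat = np.array([0.75, 0.25])      # predicted membership grades
s_hat = centroid_decode(u_hat, V)   # reconstructed numeric segment
```

Since the membership grades already sum to one here, the division is a no-op; it matters when the predicted grades are not normalized.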

Obviously, the format of the final results $\hat{\boldsymbol{s}}_{k}$ is directly related to that of the granules, i.e., the prototypes $\boldsymbol{v}_{i}$. Bearing this in mind, a straightforward way to construct PIs is to extend $\mathbf{V}$ to an interval format \begin{equation*} \left[\mathbf{V}^{-},\mathbf{V}^{+}\right]=[\mathbf{V}-\boldsymbol{\varepsilon},\mathbf{V}+\boldsymbol{\varepsilon}]\tag{3}\end{equation*} where $\boldsymbol{\varepsilon}\in R^{n}$ denotes a level of information granularity to be optimized. Once the training of the mappings has been done and after completing the defuzzification (decoding) for both the upper and lower bounds, a series of PIs can be obtained. However, there are two drawbacks associated with this approach:

  1. The dimensionality of $\boldsymbol{\varepsilon}$ equals that of the final prediction results, which typically exceeds 480 samples when considering a long prediction horizon. Optimization at such a scale is comparatively inefficient, and sometimes even infeasible, which makes this approach unsuitable for this real-world application.

  2. The determination of the values of $\boldsymbol{\varepsilon}$ requires a well-structured optimization mechanism. It should take into account not only the practical aspects of the data, but also the coverage and the specificity of the information granules, which is noticeably beneficial for achieving an accurate construction of PIs.
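The naive vertical extension of Eq. (3), and the dimensionality problem named in drawback 1, can be made concrete with a short sketch (the sizes are hypothetical; 480 is the horizon quoted above):

```python
import numpy as np

def extend_prototypes(V, eps):
    """Eq. (3): turn numeric prototypes into interval-valued ones by
    subtracting/adding a per-position granularity vector eps."""
    eps = np.asarray(eps, dtype=float)
    return V - eps, V + eps

# for a horizon of n = 480 points, eps alone already carries 480
# decision variables -- the scale that makes direct optimization
# inefficient or even infeasible
n, c = 480, 5
V = np.ones((c, n))
V_lower, V_upper = extend_prototypes(V, np.full(n, 0.1))
```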

SECTION III.

GRC-Based Hybrid Hierarchical Method for PIs Construction

To overcome the shortcomings of the existing approach, a GrC-based hybrid hierarchical method is proposed for the PIs construction of the gaseous energy system in the steel industry. The practical requirements can be summarized in two aspects, i.e., a long prediction horizon and interval-valued results. The long-term prediction is directly related to modelling along the x-axis, i.e., the horizontal direction, in which the numeric forecasts are generated by a probability-based approach and serve as the basis of the subsequent PIs. The construction of intervals concerns the vertical direction, applying the granulation technique again along the y-axis. An illustration of this bi-directional hybrid construct is displayed in Fig. 2.

FIGURE 2. Hybrid granulation realized in horizontal and vertical direction.

In this section, the modeling process is first illustrated from the perspectives of horizontal and vertical modeling. Then the detailed computing procedures are briefly summarized.

A. Horizontal – GRC-Based Probabilistic Approach for Numeric Long-Term Prediction

In order to extend the prediction horizon, granulation carried out with respect to the $x$-axis is required for horizontal modeling. As mentioned in Section II, models for establishing mappings, e.g., neural networks, bring about additional parameters and computing time, whereas the practical setting calls for a simpler approach. Therefore, we consider a supervised probability-based method serving as a quick and convenient modeling alternative.

Some definitions are given to deliver the required prerequisites.

Definition 1:

Probability of a specific prototype $\boldsymbol{v}_{i}$ $(i=1,2,\cdots,c)$ is defined as \begin{equation*} p\left(\boldsymbol{v}_{i}\right)\doteq\frac{\sum_{j=1}^{N}\mathbb{I}_{j}\{u_{kj}=\max(\boldsymbol{u}_{k})\}}{N}\tag{4}\end{equation*} where $u_{kj}$ is the $j$-th element of the vector $\boldsymbol{u}_{k}$ $(k=1,2,\cdots,c)$. $\mathbb{I}_{j}\{u_{kj}=\max(\boldsymbol{u}_{k})\}$ is an indicator function denoting whether segment $\boldsymbol{s}_{j}$ has its maximum fuzzy membership grade towards the specific prototype $\boldsymbol{v}_{i}$, which is expressed as \begin{equation*} \mathbb{I}_{j}\left\{u_{kj}=\max\left(\boldsymbol{u}_{k}\right)\right\}=\begin{cases} 1 & k=i \\ 0 & k\ne i \end{cases}\tag{5}\end{equation*}

Definition 2:

Probability of a data segment $\boldsymbol{s}_{i}$ $(i=1,2,\cdots,N)$ is a $c\times 1$ vector defined as \begin{equation*} \boldsymbol{p}\left(\boldsymbol{s}_{i}\right)\doteq\left[p\left(\boldsymbol{v}_{1}\right),p\left(\boldsymbol{v}_{2}\right),\cdots,p\left(\boldsymbol{v}_{c}\right)\right]^{T}\tag{6}\end{equation*}

Definition 3:

A $c\times c$ co-occurrence matrix $\mathbb{P}$ describing the relationship between consecutive segments $\boldsymbol{s}_{k}$ and $\boldsymbol{s}_{k-1}$ $(k=2,3,\cdots,N)$ is defined as \begin{equation*} \mathbb{P}\doteq\{p(\boldsymbol{v}_{i}\vert\boldsymbol{v}_{j})\,\vert\, i=1,2,\cdots,c;\;j=1,2,\cdots,c\}\tag{7}\end{equation*} where $p(\boldsymbol{v}_{i}\vert\boldsymbol{v}_{j})$ denotes the conditional probability of '$\boldsymbol{s}_{k}$ having the maximal membership degree on $\boldsymbol{v}_{i}$' given '$\boldsymbol{s}_{k-1}$ having the maximal membership degree on $\boldsymbol{v}_{j}$'. It should be noted that the co-occurrence matrix is nothing but a first-order Markov transition matrix of a symbolic dynamics, where the states correspond to the different clusters. Bearing this in mind, the proposed method has the possibility of connecting to process control, reinforcement learning, and other Markov Decision Process related directions.

Generally, the model aims at estimating the corresponding probabilities. Based on the above definitions, the mechanism of prediction can be expressed in the form \begin{equation*} \hat{\boldsymbol{p}}\left(\boldsymbol{s}_{k}\right)=\boldsymbol{p}\left(\boldsymbol{s}_{k-1}\right)\circ\mathbb{P}\tag{8}\end{equation*} where '$\circ$' is a composition operation, such as multiplication. $\mathbb{P}$ can be built on the basis of the available training data. The prediction result $\hat{\boldsymbol{s}}_{k}$ is obtained by utilizing the centroid technique \begin{equation*} \hat{\boldsymbol{s}}_{k}=\frac{\sum_{i=1}^{c}\hat{p}_{ki}\boldsymbol{v}_{i}}{\sum_{i=1}^{c}\hat{p}_{ki}}\tag{9}\end{equation*} where $\hat{p}_{ki}$ is the $i$-th element of $\hat{\boldsymbol{p}}\left(\boldsymbol{s}_{k}\right)$.
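Since the co-occurrence matrix of Definition 3 is a first-order Markov transition matrix, Eq. (8) with multiplication as the composition operator can be sketched as follows. This is a minimal illustration with invented cluster labels; in practice the matrix would be estimated from the winning clusters of the training segments.

```python
import numpy as np

def cooccurrence_matrix(labels, c):
    """Estimate the first-order Markov transition matrix of Definition 3.

    labels : winning-cluster index of each consecutive segment s_1..s_N
    c      : number of clusters
    P[j, i] approximates p(v_i | v_j) from consecutive segment pairs.
    """
    P = np.zeros((c, c))
    for prev, curr in zip(labels[:-1], labels[1:]):
        P[prev, curr] += 1
    row_sums = P.sum(axis=1, keepdims=True)
    return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)

def predict_probability(p_prev, P):
    """One prediction step of Eq. (8), with matrix multiplication
    standing in for the composition operator 'o'."""
    return p_prev @ P

labels = [0, 1, 0, 1, 1, 0, 1]          # hypothetical winning clusters
P = cooccurrence_matrix(labels, c=2)
p_next = predict_probability(np.array([1.0, 0.0]), P)  # start in cluster 0
```

The predicted probability vector `p_next` would then be decoded into a numeric segment with the centroid technique of Eq. (9).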

Such an approach extends the prediction horizon. It is worth noting that it involves far fewer parameters than neural networks, polynomial fitting, etc., which makes it more suitable for real-world applications.

B. Vertical – Hierarchical Structure for Optimization on the Information Granularities

Intervals can be constructed by augmenting the numeric prototypes, building intervals around their specific numeric values. In such a way, every prediction result comes with a corresponding upper and lower boundary. However, a large number of parameters would also be generated, leading to a reduction in computational efficiency.

It should be noted that the data of the byproduct gaseous system in this study always exhibit industrial features, which in practice reflect production procedures. Take the generation of #3 BFG, shown in Fig. 3, as an example; the periodic patterns can be clearly distinguished. Two typical phases are involved in each pattern, i.e., air blasting (the amplitude ranges from 400 km³/h to 550 km³/h over approximately 30 minutes) and temporary suspension (the amplitude ranges from 250 km³/h to 450 km³/h over approximately 30 minutes). Based on this type of prior knowledge, the levels of information granularity can be hierarchically optimized. This arrangement results in higher computing efficiency.

FIGURE 3. Periodic patterns of generation of #3 BFG for horizontal granulation.

By taking into consideration the concerns of efficiency and the PIs' performance, we develop a hierarchical structure as shown in Fig. 4. The numeric data are generalized as intervals located at the 1st layer, with $\varepsilon_{i,j}$ $(i=1,2,\cdots,n_{2};\;j=1,2,\cdots,n_{1})$ being the levels of information granularity, where $n_{2}$ and $n_{1}$ refer to the numbers of nodes forming the 2nd and 1st layers, respectively. Then, based on the industrial periodic features identified in Fig. 3, the 2nd layer is established, in which the related levels are $\varepsilon_{i,0}$ $(i=1,2,\cdots,n_{2})$. The PIs are constructed using the probability-based approach and form the node of the last layer.

FIGURE 4. Hierarchical structure for vertical information granulation.

The levels of information granularity are sequentially allocated starting from the bottom layer to the top one. In contrast, the optimization of the parameters is carried out in the opposite direction, which means that the $\varepsilon_{i,0}$ are determined first and then the $\varepsilon_{i,j}$ are optimized. With regard to the objective function, two criteria describing information granules, i.e., coverage and specificity, are considered.

Coverage: The coverage measure counts the number of cases in which the predicted intervals "cover" the original experimental data. As such, the Mean Coverage Error (MCE), which refers to the absolute difference between the Empirical Coverage Probability (ECP) and the Nominal Coverage Probability (NCP), is employed; the MCE takes the form \begin{equation*} MCE=\left|ECP-NCP\right|\tag{10}\end{equation*} where ECP and NCP are defined as \begin{align*} ECP=&\frac{1}{T}\sum_{i=1}^{T}c_{i}\tag{11}\\ NCP=&\left(1-\alpha\right)\times 100\%\tag{12}\end{align*} where $T$ is the number of experimental data. $c_{i}$ denotes an indicator that equals 1 when the PI covers the target and 0 otherwise. $\alpha\in[0,1]$ is the level of significance. As the result of optimization, the value of the MCE is minimized.

Specificity: In order to quantify the specificity of the levels of information granularity, the Mean Prediction Interval Width (MPIW) is considered here \begin{equation*} MPIW=\frac{1}{T}\sum_{i=1}^{T}\left|\bar{s}_{i}-\underline{s}_{i}\right|\tag{13}\end{equation*} where $\bar{s}_{i}$ and $\underline{s}_{i}$ denote the upper and lower bounds of the PIs, respectively. MPIW also needs to be minimized.
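Both criteria are straightforward to compute; the sketch below works with fractions rather than the ×100% scaling of Eq. (12), and the interval bounds and targets are hypothetical:

```python
import numpy as np

def mce(lower, upper, targets, alpha):
    """Mean Coverage Error, Eqs. (10)-(12), with ECP and NCP as fractions."""
    covered = (targets >= lower) & (targets <= upper)   # the indicators c_i
    ecp = covered.mean()
    ncp = 1.0 - alpha
    return abs(ecp - ncp)

def mpiw(lower, upper):
    """Mean Prediction Interval Width, Eq. (13)."""
    return np.mean(np.abs(upper - lower))

# hypothetical PIs of width 1; one of four targets escapes coverage
lower = np.zeros(4)
upper = np.ones(4)
targets = np.array([0.5, 0.5, 2.0, 0.5])
coverage_error = mce(lower, upper, targets, alpha=0.05)  # |0.75 - 0.95|
mean_width = mpiw(lower, upper)
```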

Based on the formulation expressed above, we take the two measures into account and consider their product, which is regarded as an objective function. The optimization problem is subject to the constraints imposed on the levels of information granularity. The details are presented as follows

  1. Second layer – industrial feature-based segments \begin{align*}\min\;&\left|\frac{1}{n_{2}}\sum_{i=1}^{n_{2}}c_{i,0}-\left(1-\alpha\right)\times 100\%\right|\times\left(\frac{1}{n_{2}}\sum_{i=1}^{n_{2}}\left|\overline{s}_{i,0}-\underline{s}_{i,0}\right|\right)\\ \text{s.t.}\;&\frac{1}{n_{2}}\sum_{i=1}^{n_{2}}\varepsilon_{i,0}=\varepsilon\\ &\alpha_{min}\varepsilon\le\varepsilon_{i,0}\le\alpha_{max}\varepsilon\\ &\underline{s}_{i,0}\ge 0\tag{14}\end{align*} where $\varepsilon$ is the prescribed overall level of information granularity, and $\varepsilon_{i,0}$ is the specific level for each granule in the 2nd layer. $\alpha_{min}$ and $\alpha_{max}$ are deployed to constrain $\varepsilon_{i,0}$ from assuming excessively high or low values far away from the predetermined level $\varepsilon$. $c_{i,0}$ is the indicator defined as $c_{i}$ in (11). $\overline{s}_{i,0}$ and $\underline{s}_{i,0}$ refer to the upper and lower bounds of the PIs constructed with $\varepsilon_{i,0}$, respectively.

  2. First layer – specific data points

The optimization completed for the 1st layer can be formulated as a set of $n_{2}$ problems, each of which is as follows. \begin{align*}\min\;&\left|\frac{1}{n_{1}}\sum_{j=1}^{n_{1}}c_{i,j}-\left(1-\alpha\right)\times 100\%\right|\times\left(\frac{1}{n_{1}}\sum_{j=1}^{n_{1}}\left|\overline{s}_{i,j}-\underline{s}_{i,j}\right|\right)\\ \text{s.t.}\;&\frac{1}{n_{1}}\sum_{j=1}^{n_{1}}\varepsilon_{i,j}=\varepsilon_{i,0}\\ &\beta_{min}\varepsilon_{i,0}\le\varepsilon_{i,j}\le\beta_{max}\varepsilon_{i,0}\\ &\underline{s}_{i,j}\ge 0\\ &i=1,2,\cdots,n_{2}\tag{15}\end{align*} where $\overline{s}_{i,j}$ and $\underline{s}_{i,j}$ refer to the upper and lower bounds of the PIs constructed with $\varepsilon_{i,j}$, respectively. The parameters $\varepsilon_{i,j}$ have a meaning similar to those of the 2nd layer. It should be noted that the overall level of information granularity at this layer, i.e., $\varepsilon_{i,0}$, is nothing but the optimized result of the 2nd layer obtained by solving (14). Regarding the convergence rate and applicability, Particle Swarm Optimization (PSO) [32]–​[33] is utilized here as the optimization method to solve (14) and (15). Before applying PSO, (14) and (15) are transformed into constraint-free optimization problems by using the penalty function technique.
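As a rough sketch of how the penalty-function technique turns a problem like (14) into a constraint-free one for PSO, consider the following. The PSO loop is a generic textbook variant, not the paper's exact implementation, and the base objective is a placeholder rather than the coverage-specificity product; the box constraint is handled by clipping positions to the bounds.

```python
import numpy as np

def pso(objective, dim, lb, ub, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO minimizer over the box [lb, ub]^dim (a sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)           # keep particles inside the box
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

def penalized(eps_vec, eps_total, rho=1e3):
    """Penalty-function version of the equality constraint in (14):
    the mean of the granularity levels must equal the overall level.
    np.var is only a stand-in for the coverage-specificity objective."""
    violation = (eps_vec.mean() - eps_total) ** 2
    return np.var(eps_vec) + rho * violation

best, best_f = pso(lambda e: penalized(e, eps_total=1.0),
                   dim=3, lb=0.5, ub=1.5)
```

The quadratic penalty (weight `rho`) drives the mean of the candidate granularity levels toward the prescribed overall level, mimicking the equality constraint without explicit constraint handling inside PSO.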

In view of the independence among the 1st-layer problems, the $\varepsilon_{i,j}$ can be optimized simultaneously by means of parallel computing. As a result, the computational cost of the proposed hierarchical structure is $O(c+n_{1})$. Given that $n_{1}$ is typically 1/10 to 1/6 of $n$, this is at least 5 times lower than the $O(n)$ cost of the traditional GrC-based method. Therefore, based on the above hierarchical structure concerning the vertical direction, the efficiency is remarkably improved so as to provide real-time PI results.

C. Computing Procedures

To clarify the computing procedures of the proposed GrC based hybrid hierarchical method, Fig. 5 outlines an overall framework.

FIGURE 5. The framework of the proposed method.

The detailed processing is outlined in the form:

  • Step 1:

    Divide the raw data into segments $\mathbf{S}=\{\boldsymbol{s}_{1},\boldsymbol{s}_{2},\cdots,\boldsymbol{s}_{N}\}$, $\boldsymbol{s}_{i}\in R^{n}$, concerning industrial features. Deploy the unsupervised clustering method, i.e., FCM, to acquire the numeric prototypes $\mathbf{V}=\{\boldsymbol{v}_{1},\boldsymbol{v}_{2},\cdots,\boldsymbol{v}_{c}\}$, $\boldsymbol{v}_{i}\in R^{n}$, and the related fuzzy membership degrees $\mathbf{U}=\{\boldsymbol{u}_{1},\boldsymbol{u}_{2},\cdots,\boldsymbol{u}_{N}\}$, $\boldsymbol{u}_{i}\in R^{c}$, as the results of horizontal modeling.

  • Step 2:

    Vertically extend the prototypes as $[v_{ij}-\varepsilon_{i,0},\,v_{ij}+\varepsilon_{i,0}]$, where $\varepsilon_{i,0}$, $i=1,2,\cdots,n_{2}$, denotes the levels of information granularity in the 2nd layer.

  • Step 3:

    Solve the optimization problem (14) with the PSO algorithm. This process involves the establishment of the co-occurrence matrix $\mathbb{P}$ for both the upper and lower boundaries according to Definition 3, the calculation of the corresponding forecasted probabilities by (8), and the construction of PIs by (9), which can be regarded as a supervised learning procedure.

  • Step 4:

    Further distribute the information granularities $\varepsilon_{i,j}$, $i=1,2,\cdots,n_{2}$; $j=1,2,\cdots,n_{1}$, over the prototypes, which now read $[v_{ij}-\varepsilon_{i,j}-\varepsilon_{i,0},\,v_{ij}+\varepsilon_{i,j}+\varepsilon_{i,0}]$. Then optimize them via the set of problems (15) in a parallel strategy.

  • Step 5:

    Based on the optimized $\varepsilon_{i,j}$ and $\varepsilon_{i,0}$, the final PIs can be obtained via the centroid decoding technique (9).

SECTION IV.

Experimental Studies

The experiments reported in this study begin with PIs construction for the Mackey-Glass time series. Then they are also implemented using real-world data collected from February 1, 2017 to June 1, 2017, at a collection rate of 1 point/minute. Three significant objects, i.e., #3 BFG generation, #1 COG generation, and the BFG consumption of #2 hot blast stove, are selected as representatives for the experimental analysis. To retain the continuity of the data, the training and validation sets are constructed from consecutive data, and the following results are all based on the testing set. The Parallel Computing Toolbox in MATLAB R2017a is used for the optimization of the 1st layer. The development environment is listed as follows.

Hardware:

CPU: Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz

Memory: Corsair Vengeance DDR4 3000 MHz 16 GB × 4

GPU: Nvidia Titan Xp × 2

Motherboard: Asus ROG Rampage V Edition 10

Hard drive: Samsung NVMe SSD 960 PRO 512 GB

First, parametric studies of the prescribed settings of the proposed method are given to discuss the determination of the parameter values. Then the PIs construction experiments are conducted, in which the comparative methods are MVE, Bootstrap, Gaussian Process (GP), and the traditional GrC approach. The numeric values for MVE are predicted with the aid of the LSSVM method (parameters: $\sigma$, width of the Gaussian kernel; $\gamma$, penalty coefficient), and the ones for Bootstrap are acquired by an Echo State Network (ESN) (parameters: size of the reservoir; $\tau$, sparseness of the weights; $r$, spectral radius of the weights). The GP model employs the Squared Exponential Automatic Relevance Determination (SEARD) kernel function, along with the conjugate gradient method for optimization (parameters: $n_{emb}$, embedded dimension; $n_{sam}$, sample size). These methods are compared to expose the shortcomings of the iteration mechanism. The mapping for the traditional GrC approach is modeled by a Back Propagation Neural Network (BPNN) (parameters: $n_{in}$, $n_{hid}$, $n_{out}$, the numbers of neurons located in the input, hidden, and output layers). The traditional GrC approach is compared to demonstrate the drawback of a simple one-layer construction of the PIs. The hyperparameters of the comparative methods as well as of the proposed model are obtained by using trial-and-error to determine a feasible range first, and then PSO to acquire the exact optimal values.
Considering the measurement of reliability and sharpness of the PIs, three typical criteria, namely Prediction Intervals Coverage Probability (PICP) [34], Prediction Intervals Normalized Average Width (PINAW) [35] and the Interval Score L(\overline{s}_{i,0},\underline{s}_{i,0},t_{i}) [36], are employed for error statistics. They are defined as
\begin{align*}
PICP &= \frac{1}{n_{test}}\sum\nolimits_{i=1}^{n_{test}} c_{i} \times 100\% \tag{16}\\
PINAW &= \frac{1}{n_{test}\left(t_{max}-t_{min}\right)}\sum\nolimits_{i=1}^{n_{test}} \left|\overline{s}_{i,0}-\underline{s}_{i,0}\right| \tag{17}\\
L\left(\overline{s}_{i,0},\underline{s}_{i,0},t_{i}\right) &= \frac{1}{n_{test}}\sum\nolimits_{i=1}^{n_{test}} l_{i} \times 100\% \\
l_{i} &= \begin{cases}
-2\alpha\left(\overline{s}_{i,0}-\underline{s}_{i,0}\right)-4(\underline{s}_{i,0}-t_{i}), & \text{if } t_{i} < \underline{s}_{i,0} \\
-2\alpha\left(\overline{s}_{i,0}-\underline{s}_{i,0}\right), & \text{if } \underline{s}_{i,0}\le t_{i}\le \overline{s}_{i,0} \\
-2\alpha\left(\overline{s}_{i,0}-\underline{s}_{i,0}\right)-4(t_{i}-\overline{s}_{i,0}), & \text{if } t_{i}>\overline{s}_{i,0}
\end{cases}\tag{18}
\end{align*}
where n_{test} is the size of the testing set; c_{i} equals 1 when the target falls inside the prediction interval and 0 otherwise; t_{max} and t_{min} denote the maximum and minimum values of the testing set; \overline{s}_{i,0} and \underline{s}_{i,0} are the upper and lower boundaries of the PIs respectively; and t_{i} is a data point of the testing set. Besides, the Computing Time (CT) is also given to assess the efficiency of the different methods.
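As a minimal sketch, the three criteria of Eqs. (16)-(18) can be computed as follows; the function name and array-based interface are illustrative, not taken from the paper.

```python
import numpy as np

def pi_criteria(upper, lower, targets, alpha=0.1):
    """Evaluate constructed PIs with PICP, PINAW and the Interval Score.

    `upper`, `lower`, `targets` are equal-length 1-D arrays over the
    testing set; `alpha` is the nominal miscoverage level (0.1 here,
    as in the tables). Follows Eqs. (16)-(18).
    """
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # PICP (Eq. 16): percentage of targets falling inside the interval.
    covered = (targets >= lower) & (targets <= upper)
    picp = covered.mean() * 100.0

    # PINAW (Eq. 17): mean width normalized by the target range.
    pinaw = np.abs(upper - lower).mean() / (targets.max() - targets.min())

    # Interval Score (Eq. 18): width penalty plus a miss penalty on
    # whichever side the target escapes the interval.
    width_term = -2.0 * alpha * (upper - lower)
    below = 4.0 * np.where(targets < lower, lower - targets, 0.0)
    above = 4.0 * np.where(targets > upper, targets - upper, 0.0)
    score = (width_term - below - above).mean() * 100.0

    return picp, pinaw, score
```

Note that the Interval Score is negatively oriented in this form: a wide interval and a missed target both make it more negative, so higher (closer to zero) is better.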

A. Parametric Study

To carry out the parametric study, we select #3 BFG generation as a representative. In order to synthetically articulate the relationship between performance and each parameter, we integrate PICP, PINAW and Interval Score into one index I , defined as
\begin{equation*} I=PICP\times \left(1-\overline{PINAW}\right)\times \overline{L\left(\overline{s}_{i,0},\underline{s}_{i,0},t_{i}\right)}\tag{19}\end{equation*}
where \overline{PINAW} and \overline{L\left(\overline{s}_{i,0},\underline{s}_{i,0},t_{i}\right)} refer to the normalized PINAW and L\left(\overline{s}_{i,0},\underline{s}_{i,0},t_{i}\right) respectively. PICP and L\left(\overline{s}_{i,0},\underline{s}_{i,0},t_{i}\right) should be as high as possible so that the PIs perform satisfactorily on reliability and sharpness, while PINAW needs to be lower for the PIs to be more specific; this motivates the factor \left(1-\overline{PINAW}\right) in the integrated index.
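Eq. (19) can be sketched as below. The paper does not spell out the normalization behind \overline{PINAW} and \overline{L} , so min-max scaling over the set of candidate parameter settings is assumed here.

```python
import numpy as np

def _minmax(x):
    """Min-max scale an array to [0, 1] (assumed normalization)."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def integrated_index(picp, pinaw, score):
    """Integrated index I of Eq. (19), over candidate parameter settings.

    Higher PICP and Interval Score are better, lower PINAW is better,
    hence the (1 - normalized PINAW) factor.
    """
    picp = np.asarray(picp, dtype=float)
    pinaw_n = _minmax(np.asarray(pinaw, dtype=float))
    score_n = _minmax(np.asarray(score, dtype=float))
    return picp * (1.0 - pinaw_n) * score_n
```

With this convention, the candidate setting maximizing I is the one selected in the parametric study.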

Among the parameters of the proposed method, the most important one is the overall level of information granularity \varepsilon , which has a notable impact on the performance of the constructed PIs. A candidate range is determined first by expert knowledge and trial-and-error, such as [32, 50] in this example. The result of studying I for different values of \varepsilon is shown in Fig. 6, indicating that \varepsilon _{opt}=34 . This illustrates how \varepsilon can be determined in real-world applications.
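The selection procedure above amounts to a one-dimensional grid search. A minimal sketch follows, where `index_of` is a hypothetical helper standing in for evaluating the full pipeline at one candidate \varepsilon ; the toy index peaking at 34 merely mimics the shape of Fig. 6.

```python
import numpy as np

def select_granularity(eps_grid, index_of):
    """Pick the overall level of information granularity by grid search.

    `index_of` (hypothetical helper) maps one candidate epsilon to its
    integrated index I on the validation data.
    """
    scores = np.array([index_of(e) for e in eps_grid])
    return int(eps_grid[np.argmax(scores)])

# Toy index peaking at 34, mimicking the shape of Fig. 6.
eps_grid = np.arange(32, 51)
best = select_granularity(eps_grid, lambda e: -(e - 34.0) ** 2)
```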

FIGURE 6. Integrated index of the constructed PIs with different \varepsilon .

Another parameter that needs to be discussed is the number of prototypes c . Similarly, Fig. 7 displays the values of I with respect to c , which helps to set c=10 . It should also be noted that the variation of the integrated index here is only 0.2, i.e., substantially smaller than the variation observed for the overall level of information granularity. Besides, results for the three criteria with varying values of c are further given in Fig. 8, in which the differences between the maximum and minimum values of PICP, Interval Score and PINAW equal only 0.02, 0.003 and 0.2, respectively. Given these limited differences, we can conclude that the proposed method is not sensitive to the value of this FCM parameter. This can be regarded as an advantage of the approach, since we only need to concentrate on the value of \varepsilon when coping with real-world applications. Note, however, that this result is limited to the particular case with the chosen value of \varepsilon ; under different circumstances and applications it may not hold in general.
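The sensitivity check above boils down to the max-minus-min spread of each criterion across candidate values of c ; a small sketch, with made-up PICP values used purely for illustration:

```python
import numpy as np

def sensitivity_spread(metric_by_c):
    """Max-minus-min spread of one criterion across prototype counts c.

    A small spread, as reported for PICP, Interval Score and PINAW in
    Fig. 8, indicates insensitivity to the FCM prototype count.
    """
    values = np.asarray(list(metric_by_c.values()), dtype=float)
    return float(values.max() - values.min())

# Made-up PICP values over candidate c (illustration only).
picp_by_c = {6: 0.95, 8: 0.96, 10: 0.97, 12: 0.96, 14: 0.95}
spread = sensitivity_spread(picp_by_c)  # about 0.02
```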

FIGURE 7. Integrated index of the constructed PIs with different c .

FIGURE 8. Three indices of the constructed PIs with different c .

B. Optimized PIs Results

1) Mackey-Glass

As a typical chaotic time series benchmark, Mackey-Glass is deployed here to validate the proposed method. The detailed comparison in Fig. 9 shows that MVE, Bootstrap and GP give unsatisfactory results owing to iterative error. Also, a considerable number of points fall outside the PI constructed by the traditional GrC-based model. In contrast, the proposed hybrid hierarchical model exhibits superiority not only in accuracy, measured by PICP, PINAW and Interval Score, but also in computing efficiency, which can also be concluded from the statistics in Table 2.
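The benchmark can be reproduced from the standard Mackey-Glass delay differential equation dx/dt = a*x(t-\tau)/(1+x(t-\tau)^{10}) - b*x(t). A simple Euler-integration sketch is given below; the paper does not state its generation settings, so the usual chaotic regime a=0.2, b=0.1, \tau=17 is assumed.

```python
import numpy as np

def mackey_glass(n, tau=17.0, a=0.2, b=0.1, dt=0.1, x0=1.2):
    """Generate a Mackey-Glass series by Euler integration.

    dx/dt = a*x(t - tau) / (1 + x(t - tau)**10) - b*x(t); tau = 17
    gives the classic chaotic regime used as a forecasting benchmark.
    Assumed settings: the paper does not state its generation scheme.
    """
    d = int(round(tau / dt))      # delay expressed in integration steps
    x = np.empty(n + d)
    x[:d] = x0                    # constant history before t = 0
    for t in range(d, n + d):
        x_tau = x[t - d]
        x[t] = x[t - 1] + dt * (a * x_tau / (1.0 + x_tau**10) - b * x[t - 1])
    return x[d:]
```

A finer `dt` (or a Runge-Kutta scheme) improves accuracy; the Euler step is kept here for brevity.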

TABLE 2. Statistics and Parameters for Mackey-Glass ( \alpha=0.1 ).
FIGURE 9. PIs construction results for the Mackey-Glass time series: (a) MVE (b) Bootstrap (c) GP (d) Traditional GrC (e) Proposed method.

2) #3 BFG Generation

The generation amount of BFG plays a significant role in the whole gaseous energy network, since it is directly related to iron production. Experiments with the different methods are shown in Fig. 10. Due to the iteration mechanism, MVE performs well only in roughly the first half hour; the error grows with the length of the prediction horizon, so the constructed PI begins to oscillate. Although Bootstrap and GP both reproduce a series of periodic features, they fail to accurately estimate the time span and amplitude of each period. The traditional one-layer GrC-based model exhibits an improved performance and roughly covers the experimental data; however, some data points still fall outside the PIs. Compared with these approaches, the proposed method clearly behaves much better with regard to coverage, and the constructed PIs also exhibit significant specificity.

FIGURE 10. PIs construction results for #3 BFG generation: (a) MVE (b) Bootstrap (c) GP (d) Traditional GrC (e) Proposed method.

To further demonstrate the performance of the methods, error statistics are given in Table 3 along with the parameter settings. Although MVE exhibits some advantage on PINAW, its PICP and Interval Score are far worse than those of the other methods. The proposed method not only guarantees the accuracy of the PIs, but also requires less computing time, forming the most efficient solution.

TABLE 3. Statistics and Parameters of the Methods for #3 BFG Generation ( \alpha=0.1 ).

3) #1 COG Generation

Another essential byproduct gaseous energy of the iron-steel making process, COG, is addressed here, with its generation amount deployed as an example. Due to the continuous switching of burners in practice, this data exhibits frequent variation, which makes it hard to predict accurately. Fig. 11 presents the detailed comparison of the constructed PIs, and Table 4 lists the error statistics and parameters. Due to the huge number of parameters to be determined in a single layer, the traditional GrC exhibits inconsistent PI widths, so its statistics are not satisfactory. The results clearly indicate the superiority of the proposed method over the others on PICP, PINAW and Interval Score. It is also noticeable that the computational cost is reduced to a great extent because of the hierarchical structure and the parallel strategy being used.

TABLE 4. Statistics and Parameters for #1 COG Generation ( \alpha=0.1 ).
FIGURE 11. PIs construction results for #1 COG generation: (a) MVE (b) Bootstrap (c) GP (d) Traditional GrC (e) Proposed method.

4) BFG Consumption of #3 LD Converter

The last part of the experimental study concerns a consumption unit from the LDG network, for which the results are shown in Fig. 12. The PIs constructed by MVE and GP are far from satisfactory, since these methods accumulate iterative errors and also ignore the industrial features during the modeling process. Bootstrap performs well only on the first 30 points and then exhibits low specificity. Although the traditional one-layer approach successfully forecasts the general trend of this data, its coverage is still sometimes poor. Also, given that its computing time is about 1 minute, which could exceed the required prediction frequency, it is not suitable for a real-world production scenario. In contrast, the proposed method exhibits superior performance, which can also be demonstrated based on the statistics shown in Table 5.

TABLE 5. Statistics and Parameters for BFG Consumption of #3 Hot Blast Stove ( \alpha=0.1 ).
FIGURE 12. PIs construction results for BFG consumption of #3 hot blast stove: (a) MVE (b) Bootstrap (c) GP (d) Traditional GrC (e) Proposed method.

C. Practical Running Instance

On the basis of the proposed hierarchical GrC method for PI construction, a software system has been developed and deployed in the same steel plant in China. The coding language is C++ for the algorithm and C# for the GUI, both based on the Microsoft .NET Framework 4.0. The data are collected by the on-site industrial SCADA system and then stored in an IBM DB2 9.0 database.

A practical running instance is depicted in Fig. 13. The prediction objects are listed in the boxes at the top and the statistics beneath them; a plot of the constructed PIs is given at the bottom. Based on the prediction results, a warning is activated and displayed on top, stating 'Current imbalance system: BFG'. With such reliable guidance, the operating staff can decide to either start scheduling work or cancel the alarm.

FIGURE 13. A practical running instance of the developed software system.

After six months of on-site operation of this system, the PICP, PINAW and Interval Score have improved by around 7%, 10% and 9%, respectively. The constructed PIs successfully provide helpful information for the gaseous scheduling operation.

SECTION V.

Conclusion

Taking into account the practical requirements imposed on long-term PI construction for the byproduct gaseous energy system of the steel industry, a GrC-based hybrid hierarchical method has been proposed, in which information granules are allocated at each layer of the architecture. The related optimization problems were then formulated with consideration of both the coverage and specificity criteria. The employment of industrial semantics in data granulation substantially improves the accuracy of the proposed method, while the parallel strategy effectively guarantees its efficiency. The experimental study involved various error statistics and produced the optimized PIs. It can be concluded that the proposed method delivers better performance, which is helpful for the energy scheduling and optimization work of the steel industry.

Some topics are worth pursuing in the future. On the one hand, the optimization problem can be formulated as a multi-objective one; under such circumstances, a suitable method is required to solve it with both satisfactory accuracy and a reasonable connection with industrial practice. On the other hand, the forecasted probability here relates to only one step ahead, as shown in (8); this mechanism can also be extended to an (n-1)-tuple, which may lead to complex matrix operations. Besides, PSO can be replaced by other optimization alternatives, such as reinforcement learning, so as to avoid the complexity of determining its parameters.
