Nomenclature
Abbreviation | Expansion |
AC | Actor-Critic |
CNN | Convolutional Neural Network |
DL | Deep Learning |
DNN | Deep Neural Networks |
D&PL | Distributed and Parallel Learning |
DBN | Deep Belief Networks |
DDPG | Deep Deterministic Policy Gradient |
DOE | Design of Experiments |
DQN | Deep Q-Network |
DT | Decision Tree |
GMDH | Group Method of Data Handling |
GA | Genetic Algorithm |
GPR | Gaussian Process Regression |
IL | Incremental Learning |
JK | JackKnife |
KNN | K-Nearest Neighbor |
LDA | Linear Discriminant Analysis |
MAB | Multi-Armed Bandit |
ML | Machine Learning |
MCS | Monte Carlo Simulation |
NN | Neural Networks |
NSGA | Nondominated Sorting Genetic Algorithm |
PCA | Principal Component Analysis |
PDF | Probability Density Function |
PCA | Principal Component Analysis k-Means |
PG | Policy Gradient |
QDA | Quadratic Discriminant Analysis |
ResNet | Residual Network |
RNN | Recurrent Neural Network |
RF | Random Forest |
RL | Reinforcement Learning |
RVM | Relevance Vector Machine |
SARSA | State-Action-Reward-State-Action |
SVD | Singular Value Decomposition |
SOM | Self Organised Maps |
SVM | Support Vector Machine |
SVR | Support Vector Regression |
TL | Transfer Learning |
UKF | Unscented Kalman Filter |
Introduction
Current key challenges of manufacturing processes can be summarized as (i) adoption of advanced manufacturing technologies, (ii) the growing importance of manufacturing high value-added products, (iii) increasing process complexity, uncertainty and dynamism, and (iv) utilization of advanced knowledge, data science, and AI systems [1]–[3]. Among these challenges, data collection can be particularly demanding because it is costly, time consuming and compute-intensive. As a result, the amount of data available to build accurate models is often limited. System identification, decision making, and predictive analytics based on limited data may reduce production yields, increase production costs or decrease enterprise competitiveness. In the case of limited data, the size of the dataset is not enough to train a reliable model. For example, when the size of the training dataset is smaller than the number of unknown parameters of the model, e.g. the unknown weights of an artificial neural network architecture, we face the limited data challenge and conventional learning methods might not work properly. In such cases, one still needs to develop appropriate data models with small forecasting-error variance and good accuracy based on these small datasets.
On the other hand, in some cases one has to deal with big data, where the data produced by the system has large volume, variety, veracity and velocity. Big data is an ambiguous term, generally used for datasets whose size makes them difficult to acquire, store, manage, observe, process and analyze using prevalent database tools. The underlying processes are often too complex, with highly dynamic uncertainties, and demand heavy computational effort to find even a simple model. Machine Learning (ML) offers effective solutions to challenging issues in various industrial applications [4]. ML includes computer algorithms and statistical methods required for data-driven control, estimation, prediction, classification, or clustering. Although ML is effective in many ways, some existing ML techniques have limitations, such as over-fitting or under-fitting due to the nature of the data, poor generalizability and poor long-term prediction ability [5], [6]. Hybrid ML techniques can potentially capture more characteristics of complex systems to overcome these limitations. Hybrid ML works by developing algorithms that couple model-based and data-driven learning systems. Despite research in hybrid data-driven ML techniques, they are not yet widely used, mainly because of their high computational complexity [7], [8]. This manuscript aims to provide a review of common ML-based data modeling techniques for limited and big data, with a list of their advantages and disadvantages. It also provides an overview of some recent research works on data modeling techniques under limited or big data constraints for various industrial applications, as illustrated in Fig. 1.
Based on the advantages and limitations of ML, the paper introduces an intelligent hybrid data-driven algorithm that is robust against limited and big data constraints, in order to reduce the computational cost and increase the estimation accuracy. This manuscript has two main parts. The first part provides a brief review of some well-known ML techniques. Then, we introduce a hybrid algorithm and apply it to a modeling task with limited data constraints.
This manuscript is organized as follows. Sections 2 and 3 present modeling techniques for limited and big data scenarios. Section 4 introduces a novel intelligent algorithm for robust modeling of nonlinear processes. Section 5 shows an industrial case study that can be modeled by the proposed intelligent algorithm. Concluding remarks are provided in Section 6.
Machine Learning Techniques to Model Limited or Big Data
One of the distinctions of the Industry 4.0 concept is data-informed decision making. In some applications, data collection can be a challenging and expensive task, leading to a limited data problem. The attainable level of modeling accuracy depends on the sample size. To make a reliable statistical test under small sample size, Design of Experiments (DOE) methods have been proposed, such as Response Surface, Taguchi and Factorial [9]. Development of DOE based on Taguchi methods to reduce industrial experimental tests requires reliable data modeling [10], [11]. In some other industrial applications, large-scale data is produced, and sophisticated machine learning needs to be employed to process the data. In the following sections, we review a number of machine learning techniques, from conventional to recent, that are often used in industrial applications.
To process, analyze, predict and support decision-making based on limited or big data, Machine Learning (ML) techniques can be used. ML techniques are able to observe, store and model data with high nonlinearity and uncertainty. ML can readily identify trends and patterns associated with black-box (or gray-box) models of complex systems. There are different techniques in ML, and the choice of method depends on factors such as the nature of the dataset, the scope of the problem, and the desired outcomes. Usual ML tasks include regression, modeling, prediction, classification and clustering. Generally, there are three major ML categories: supervised, unsupervised and reinforcement learning. Recently, semi-supervised learning methods have also been increasingly developed and used in many applications.
Supervised learning trains a model on labeled training data until the model achieves a desired level of accuracy on that data. It is usually applied when the final values of the output variable or the class labels are known, so that one can define an error function between the output of the model and that of the system. Supervised learning is used in classification (predicting a label) or regression (predicting a quantity). Some supervised learning techniques are: rule-based systems, regularization, Bayesian methods, ensembles, Neural Networks (NNs), instance-based methods, decision trees and explicit regression. Many of these supervised learning techniques are used for both regression and classification in industrial cases.
Unsupervised learning is used to find patterns or hidden structures in datasets that have not been categorized or labeled. Unsupervised learning typically focuses on exploratory analysis, dimensionality reduction, and feature extraction. Examples of unsupervised learning include: Principal Component Analysis (PCA), k-Means, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), and Self Organized Maps (SOM). Reinforcement Learning (RL) differs from supervised and unsupervised learning in that it works with data from a dynamic environment. The goal of RL is not to cluster or label data but to find the best sequence of actions, i.e. the one that generates the optimal outcome when controlling the system. RL solves this problem by allowing a piece of software called the agent to explore, interact with, and learn from the environment (system). A central idea is the trade-off between exploration and exploitation. Within the agent, there is a function that takes in state observations (the inputs) and maps them to actions (the outputs). This single function takes the place of all the individual subcomponents of the control system. In RL nomenclature, this function is called the policy. Given a set of observations, the policy decides which action to take. RL algorithms include PG, SARSA, DQN, AC, DDPG and MAB, and are commonly divided into on-policy methods (e.g., PG and SARSA) and off-policy methods (e.g., DQN and DDPG).
Some popular techniques for clustering, classification, regression and advanced machine learning are discussed in the following.
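The exploration-exploitation trade-off described above can be illustrated with a minimal multi-armed bandit (MAB) sketch. The epsilon-greedy rule, the Gaussian reward model and the arm means below are assumptions chosen for illustration; they are not taken from the reviewed works.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance reward
        counts[arm] += 1
        # Incremental update of the mean reward of the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical arm means; the agent does not know them in advance.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.5])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, counts)
```

Here the policy is the arm-selection rule: with probability `epsilon` it explores a random arm, otherwise it exploits the arm with the highest estimated reward.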
A. Classification
Classification is one of the most popular methods in ML, with many potential applications in industrial settings. Classification is a data mining technique that predicts categorical class labels based on observations. Appropriate models need to be constructed to describe the important data classes. In order to build a classifier, one often divides the data into three parts: training, test and validation data. The training dataset is used to find the unknown parameters of the model, which is verified using the test data and validated using the validation data.
The classifier can assign an appropriate label to all unlabeled patterns, i.e. allocate them to the most appropriate class. In order to improve classifier robustness and generalizability, a number of approaches have been developed. Dimensionality reduction through feature selection is an effective technique to improve the generalizability of classification tasks. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are two frequently used feature extraction methods [12], [13]. There are many classification approaches, each with its own pros and cons. One frequently used, simple yet powerful classification method is the K-Nearest Neighbor (KNN) classifier [14]. Support Vector Machine (SVM) is a supervised machine learning method that can be used for both classification and regression problems. SVM builds a separating hyperplane that maximizes the margin between the two classes [15]. SVM offers high prediction accuracy, is non-parametric and robust to outliers, and has low prediction time [5]. However, SVM has some limitations: high computational cost, poor ability to manage uncertainty, and the need for a cross-validation procedure to determine hyper-parameters [5]. Relevance Vector Machine (RVM) employs a Bayesian framework to infer the weights, with which the Probability Density Functions (PDFs) of the outputs, instead of point estimates, can be obtained. RVM provides performance comparable to SVM, while utilizing arbitrary kernel functions with high sparsity and offering probabilistic predictions [16]. High sparsity means that a significant number of weights are zero, leading to more computationally efficient models. Advantages of RVM include the ability to generate PDFs directly, being non-parametric, realizing high sparsity and avoiding the cross-validation process. It has, however, some limitations: large volumes of data are required for modeling, considerable time and memory are consumed during training, and it can easily fall into a local optimum and potentially cause over-fitting [6].
A Decision Tree (DT) has a tree (flowchart-like) structure with numerous nodes and branches. It is a fast and easy method with decent performance in many classification tasks. Bayesian classification is a statistical approach that learns the distribution of instances to predict class membership probabilities.
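As a concrete instance of one of the classifiers reviewed in this subsection, the following is a minimal sketch of a K-Nearest Neighbor classifier. The Euclidean distance, the value of k and the toy data points are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set with two features per sample
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

label_near_a = knn_predict(train_X, train_y, (1.1, 1.0))
label_near_b = knn_predict(train_X, train_y, (4.0, 4.0))
print(label_near_a, label_near_b)
```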
B. Regression
The regression technique is very close to the classification technique; the difference is that regression seeks a pattern that determines numerical values rather than class labels. Regression modeling tries to find the relationship between a response (dependent) variable and one or more explanatory (independent) variables.
C. Clustering
Clustering techniques segment data into groups based on data similarity. Clustering is also used to identify outliers, and the resulting groups may themselves be the matter of interest. Clustering can be achieved by various algorithms and is an iterative process (involving trial and error). Some popular clustering techniques are: K-means, Fuzzy K-means, Hierarchical, NN, and Gaussian Mixture. K-means is a partitioning method that partitions data into K exclusive clusters. Each cluster has a centroid (or center), and the sum of distances from all objects to their cluster center is minimized. Example neural network architectures for clustering are: (i) self-organizing maps, (ii) competitive layers. Gaussian Mixture models are well suited to clusters that have different sizes and are correlated; they assume that the data are drawn from a fixed number K of normal distributions. In general, for clustering techniques: (i) no method is perfect for data modeling (the best choice depends on the data), (ii) the process is iterative, so different algorithms should be explored, and (iii) one should beware of local minima (global optimization can help).
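The K-means procedure described above can be sketched as follows. This is a minimal illustration assuming Euclidean distance, random initialization from the data points and a small hypothetical dataset; like any K-means run, it may converge to a local minimum on less well-separated data.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Partition `points` into k exclusive clusters (Lloyd's iterations)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial centroids drawn from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```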
D. Ensemble Techniques
Ensemble techniques combine numerous models to create a learning method with better performance than the individual classifiers. In this technique, the unseen data (test data) is passed to each individual classifier, which returns a vote. The ensemble returns the final class prediction based on the majority of the individual models' votes (Fig. 2). This technique is appropriate when there is not enough data available to represent the data distribution. It is also a decent option when there is uncertainty in selecting the computational model. Ensemble techniques are also used when a single classifier is not able to solve complex problems. The technique is used in many industrial applications, such as intrusion detection, malware and fraud detection, remote sensing, and speech and identity recognition. Random Forest (RF) is a well-known ensemble technique consisting of a collection of many decision trees, each of which is built from a sample drawn with replacement [24], [25]. Another widely used ensemble technique is the Adaptive Boosting (AdaBoost) algorithm, which can be used for regression and/or classification problems in industry.
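The voting scheme described above can be sketched in a few lines. The three threshold rules standing in for trained classifiers, and the feature values, are hypothetical and serve only to show how individual votes are combined into a majority decision.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the class predicted by most of the individual classifiers."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical weak classifiers acting on a two-feature sample
classifiers = [
    lambda x: "positive" if x[0] > 0.5 else "negative",
    lambda x: "positive" if x[1] > 0.5 else "negative",
    lambda x: "positive" if x[0] + x[1] > 1.0 else "negative",
]

first = majority_vote(classifiers, (0.9, 0.2))   # votes: positive, negative, positive
second = majority_vote(classifiers, (0.1, 0.8))  # votes: negative, positive, negative
print(first, second)
```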
E. Resampling Techniques
Resampling techniques are very prevalent because of their accuracy, robustness, simplicity and high generalizability.
Resampling generates new data using different methods, without relying on a theoretical distribution. It is used when the data are very limited or their distribution is unknown [8]. Resampling repeatedly generates new data, with or without replacement. Bootstrapping and randomization methods are examples of resampling with and without replacement, respectively. Some popular resampling techniques are JackKnife (JK), Monte Carlo Simulation (MCS) and exact test methods. MCS is repeated random sampling over many possible scenarios to obtain numerical results and estimates.
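To make the distinction between these resampling schemes concrete, the sketch below applies bootstrap resampling (with replacement) and the jackknife (leave-one-out) to the sample mean. The six measurements are hypothetical, and the choice of the mean as the statistic is an assumption for illustration.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Means of resamples drawn with replacement, each the size of `data`."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    ]


def jackknife_means(data):
    """Means of the leave-one-out subsamples."""
    return [statistics.fmean(data[:i] + data[i + 1:]) for i in range(len(data))]


# Hypothetical small sample of measurements
data = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2]
boot = bootstrap_means(data)
jack = jackknife_means(data)

# The spread of the bootstrap means estimates the standard error of the mean
boot_se = statistics.stdev(boot)
print(round(statistics.fmean(data), 3), boot_se)
```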
F. Representation Learning
Representation learning is a technique for predicting or classifying unstructured data and is useful in various ML tasks, such as dimensionality reduction. It determines a lower-dimensional representation that captures the input configurations of the original dataset. It can offer a solution for big data by enabling significant improvements in statistical and computational efficiency. The hidden representations aim to retain the core information needed by ML statistical processes to model the full properties of the data, such as vertex content and topological structure. Modeling and analytics tasks can then be carried out readily with conventional vector-based machine learning algorithms.
G. Deep Learning
Deep Learning (DL) can automatically find hierarchical representations of datasets, working on deep architectures with supervised and/or unsupervised learning strategies. Using these strategies, DL has the potential to capture complicated patterns in highly nonlinear big data with large feature spaces. Different DL architectures, such as Deep Neural Networks (DNNs) and Recurrent Neural Networks (RNNs), stack multiple layers of otherwise shallow algorithms to form numerous data processing layers [27].
A DNN requires many more parameters than traditional systems, which incurs a large cost during online evaluation. One approach to reducing the DNN model size while keeping the accuracy improvements is to use Singular Value Decomposition (SVD). SVD is applied to the weight matrices of the DNN, and the model is then restructured based on the inherent sparseness of the original matrices. After restructuring, the DNN model size can be reduced significantly with negligible accuracy loss [28].
Recent architectures proposed for DL, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs) and Group Method of Data Handling-type Neural Networks (GMDH-type NN), can handle difficult learning problems in classification and regression tasks. Improvements in CNN architecture can be categorized as parameter optimization, regularization, and structural reformulation. Depending upon the type of architectural modification, CNNs can be broadly categorized into seven classes, namely (i) spatial exploitation, (ii) depth, (iii) multi-path, (iv) width, (v) feature map exploitation, (vi) channel boosting, and (vii) attention, as shown in Fig. 3 [29]. Other DL architectures include Inception [30], deep Residual Network (ResNet) [31], and VGG16 [32]. Szegedy et al. [30] proposed Inception-v4, which combines the Inception architecture with residual connections to accelerate network training. The core idea of ResNet is the introduction of a so-called “identity shortcut connection” that skips one or more layers [31]. Moreover, VGG16 is a convolutional neural network model proposed by Simonyan and Zisserman [32] that improves on AlexNet by replacing large kernel-sized filters with multiple smaller kernel-sized filters one after another. GMDH is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models. GMDH-type NN is a way of using self-organizing networks and has been shown to result in successful applications in a broad range of areas [33].
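The SVD-based restructuring described above can be sketched for a single weight matrix. The example assumes NumPy is available and uses a hypothetical 256 x 128 matrix that is approximately of rank 8; it shows a generic truncated-SVD factorization, not necessarily the exact procedure of [28].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix: approximately rank 8 plus small noise
W = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 128))
W += 0.01 * rng.standard_normal(W.shape)

# Thin SVD: W = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(W, full_matrices=False)

k = 8                      # number of singular values kept (assumed)
A = U[:, :k] * s[:k]       # 256 x k factor (singular values folded in)
B = Vt[:k, :]              # k x 128 factor
W_approx = A @ B           # one layer W replaced by two thinner layers A, B

params_before = W.size
params_after = A.size + B.size
rel_error = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(params_before, params_after, rel_error)
```

Replacing one dense layer by the two factors reduces the parameter count whenever k is well below both matrix dimensions; how much accuracy is lost depends on how quickly the singular values decay.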
H. Distributed and Parallel Learning
Distributed and Parallel Learning (D&PL) uses parallel processing over a distributed network, allocating the learning processes so that ML algorithms run efficiently. D&PL addresses a technical limitation of classical ML, which typically needs the whole dataset to fit within local memory. D&PL techniques divide the data either vertically (by features within the training set) or horizontally (by instances within the training set). In most cases the division is horizontal, as it is the most natural choice for the applications. Although D&PL architectures can be leveraged in both classification and regression tasks, classification has found more widespread application because it is simpler to implement.
I. Transfer Learning
Transfer Learning (TL) deals with the heterogeneity of the dataset, which is typical for big data processing and analytics. This heterogeneity is driven by the velocity characteristics of big data, and training new models can consume significant resources and time. A wide range of models can be trained using TL in a more efficient manner through classifying a group of domains. This approach can be used for regression, classification or clustering tasks. Fig. 4 shows typical training performance curves for ML models with and without TL. The model with TL not only starts at a higher performance but also converges faster towards an optimal solution. This can be attributed to the old model being closer to the solution as a starting point than a random initialization, so that when updating via gradient descent through back-propagation, or other applicable algorithms, fewer iterations are required to converge to the solution for the new task.
J. Active Learning
Active Learning (AL) seeks to shift ML strategies for big data from relying on huge volumes of unlabelled data to making use of labelled data. AL can be used in a broad range of supervised and semi-supervised ML scenarios. Obtaining labelled data is time consuming, and AL can reduce the associated cost by labelling only a subset of points in the original data distribution. Fig. 5 provides an intuitive example diagram of AL. The AL process identifies key points in a dataset which, if labelled, make it easier to classify the remaining points. Similar approaches can be taken for regression modeling.
Active learning technique: (a) a single labelled instance belonging to class 1 (red), with the other points depicted in grey. (b) By using active learning, two points (white circles) are identified and labelled as class 1 (red) and class 2 (blue). (c) A machine learning technique such as k-Means is then used to classify the other points more easily.
K. Kernel-Based Learning
Kernel-based learning techniques use nonlinear kernel functions, along with well-known methods such as SVMs, to map the input space of a dataset to a higher-dimensional feature space. This allows for greater expressive power and a better chance of performing various analyses with the same dataset [34]. However, this often requires significantly higher computational complexity than cases with linear kernels. It may appear counterintuitive to create a higher-dimensional representation of a big dataset; however, these methods have been shown to be successful in many applications. One often must trade off computational efficiency against the impact of increasing the size of the dataset. Fig. 6 illustrates how the kernel mapping process leads to simple prediction models.
L. Multi-Objective Learning
Learning algorithms in ML can be divided into three categories: single-objective learning, scalarized multi-objective learning, and Pareto-based multi-objective learning [35]. There are two main weaknesses of scalarized multi-objective learning: (i) the determination of an appropriate hyperparameter
M. Hybrid Data-Driven Learning
As stated in previous sections, ML techniques have some limitations, and a hybrid ML technique can potentially capture more characteristics of complex systems to overcome these limitations. Recent studies of hybrid models combining different ML techniques have shown promising results [22], [37]. There are various types of frameworks for developing hybrid models, and it is not known which hybrid model performs best in data-driven learning. Some recent hybrid data-driven methods are: a combination of NN and EKF, the GPR technique with an ARD kernel [21], SVM and SVR models optimized by GA [43], and RVM with incremental learning [44]. ANNs have become popular practical solutions to engineering problems. In the following, we introduce a novel algorithm of this type and apply it to an ML task under limited data constraints.
A Novel Algorithm for Robust Modeling of Nonlinear Processes
In Industry 4.0 applications, an offline approximate model needs to be developed that can express the relationship between the inputs and outputs of the industrial process. In fact, offline modeling provides an initial identification model of the complex nonlinear system. This model is then updated using limited data obtained from the online process. The online adaptation requires real-time algorithms, while algorithms with high runtime, such as evolutionary optimization methods, can only be used for offline modeling [15]. To address the needs of both the offline and online parts, we propose a new algorithm that builds a robust model from the input-output data using an offline deterministic modeling part and an online updating part.
In the first part, the structure (topology) of the NN and the initial values of its coefficients are extracted based on deterministic input-output data. In other words, the uncertainties included in the observed data are not considered and the modeling is done based on nominal values of input-output data. It is obvious that there are many sources of uncertainties, such as human error, laboratory equipment errors, or errors due to changes in environmental factors, which can affect the data. In order to obtain a robust model, all these sources of uncertainties must be considered as a percentage of the variation around the reported nominal value [45]. In the second part, the coefficients of the obtained model are updated using MCS in combination with UKF to achieve a robust model. In the following, we give details of these steps.
A. Off-Line Modeling Based on Deterministic Data
In the off-line deterministic modeling part, the modified Taguchi DOE is first used to make a reliable statistical test under small sample size [16]. Then, a polynomial model is trained to explain the relationship between the inputs and outputs of the industrial process. To this end, one can use GMDH-type NNs with multi-objective optimization and SVD to overcome both overfitting and singularity. In the proposed algorithm, we used the fuzzy adaptive mutation proposed in [46] to reach global optimum solutions. As shown in Fig. 7, in the first part (left), a primary model is extracted, and the derived model is used as an input to the second part, which builds a more robust model. This model is not sensitive to the properties of the data and can have reasonable performance with both limited and big data.
A novel algorithm to create a robust model for industrial processes with available input-output data.
As shown in Fig. 7, the input-output data are divided into two sets: training and prediction. The training set, which consists of 60% of all input–output data pairs, is used for training the neural network model. The prediction set consists of the remaining 40% of the data, which are unseen input–output samples during the training process and are used for testing how well the trained model captures the relationship between the inputs and outputs of the process. For an acceptable performance of the GMDH-type NN, the topology and the polynomial coefficients of each neuron should be properly determined. Here, SVD is used to determine the polynomial coefficients of each neuron. A multi-objective Genetic Algorithm (GA) is used to find the optimal topology of the GMDH-type NN. In multi-objective optimization, both modeling and prediction errors are simultaneously considered as objectives. Using the Nondominated Sorting Genetic Algorithm (NSGA)-II [15], Pareto optimum non-dominated models are obtained with respect to these two objective functions. An alphabetical chromosome is used to code the structure and topology of the general structure of GMDH (GS-GMDH). In the conventional GMDH, each neuron is built using a combination of two neurons in the adjacent layer, while in GS-GMDH, all neurons in all previous layers can be used to build a new neuron. The alphabetical coding of such a GS-GMDH is shown in Fig. 8. In a GS-GMDH neural network, neuron ac in the first hidden layer is connected to the output layer by going directly through the second hidden layer. Therefore, it is easy to notice that the name of the output neuron (the network's output) includes ac twice, as acac. In other words, a virtual neuron named ac has been constructed in the second hidden layer and used with abac in the same layer to make the output neuron abacacac, as shown in Fig. 8.
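The coefficient estimation for a single neuron can be sketched as below. The text does not spell out the neuron polynomial, so the bivariate quadratic form commonly used in GMDH is assumed here; the data are synthetic and noise-free, and NumPy's `pinv`, which computes the pseudo-inverse through an SVD, stands in for the SVD step.

```python
import numpy as np


def design_matrix(xi, xj):
    """Columns 1, xi, xj, xi^2, xj^2, xi*xj of an assumed quadratic neuron."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])


def fit_neuron(xi, xj, y):
    """Least-squares coefficients via the SVD-based pseudo-inverse."""
    return np.linalg.pinv(design_matrix(xi, xj)) @ y


# Synthetic data (hypothetical): 30 samples of two inputs
rng = np.random.default_rng(0)
xi = rng.uniform(1.0, 2.0, 30)
xj = rng.uniform(1.0, 2.0, 30)
y = 0.5 + 1.0 * xi - 2.0 * xj + 0.3 * xi * xj

# 60% of the pairs for training, the remaining 40% for prediction
n_train = round(0.6 * len(y))
coeffs = fit_neuron(xi[:n_train], xj[:n_train], y[:n_train])

pred = design_matrix(xi[n_train:], xj[n_train:]) @ coeffs
prediction_error = np.mean((pred - y[n_train:]) ** 2)
print(coeffs, prediction_error)
```

Using the pseudo-inverse rather than solving the normal equations directly keeps the fit well defined even when the design matrix is close to singular.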
The evolutionary process starts by randomly generating an initial population of alphabetical chromosomes, each a candidate solution. Then, using crossover, mutation and tournament selection, the entire population of symbolic strings improves gradually based on training and prediction errors.
Indeed, in order to achieve high modeling accuracy, the polynomial degree is usually increased, which reduces the generalizability (or prediction capability) of the model, often due to overfitting. Finally, the designer chooses the trade-off obtained from Pareto non-dominated solutions by compromising between these two objective functions. A polynomial model among the inputs and outputs of the industrial process is developed by using the selected GMDH structure. The derived topology of NN is used as the input of the second part of the proposed algorithm.
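The selection of Pareto non-dominated solutions with respect to the two objectives can be sketched as follows. The (training error, prediction error) pairs are hypothetical, and this simple filter only illustrates the dominance test; it is not an implementation of NSGA-II.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other (both errors minimized)."""
    front = []
    for i, (te_i, pe_i) in enumerate(candidates):
        # j dominates i if it is no worse in both objectives and better in one
        dominated = any(
            te_j <= te_i and pe_j <= pe_i and (te_j < te_i or pe_j < pe_i)
            for j, (te_j, pe_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((te_i, pe_i))
    return front


# Hypothetical (training error, prediction error) pairs of candidate models
models = [(0.10, 0.90), (0.12, 0.40), (0.20, 0.35), (0.25, 0.50), (0.15, 0.95)]
front = pareto_front(models)
print(front)
```

The designer then picks one trade-off point from `front`, for example a model that gives up a little training accuracy for a much lower prediction error.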
B. Online Update to Make the Model Robust
In the second part (right side of Fig. 7) of the proposed algorithm, the model obtained in the first part is modified to enhance its robustness. To this end, the coefficients of the derived model are updated using UKF to capture the uncertainties in the input-output data. In order to obtain a robust model, all sources of uncertainties are considered as a percentage of the variation around the nominal values of the main data table. To take into account the uncertainties,
\begin{equation*} E_{j}=\frac{\sum_{i=1}^{k}\left(y_{model}-y_{actual}\right)^{2}}{k},\quad j=1,2,\ldots,N\tag{1}\end{equation*}
The goal of UKF is to obtain the network coefficients that minimize the mean and variance of network error. Indeed, minimizing the mean of error (
\begin{equation*} F=Mean\left(E_{j}\right)+var\left(E_{j}\right)\tag{2}\end{equation*}
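A minimal sketch of how Eqs. (1) and (2) could be evaluated with Monte Carlo perturbations is given below. It assumes NumPy, a uniform perturbation of plus or minus 5% around the nominal values, and a hypothetical linear model; the distribution, the percentage and the model are illustrative assumptions, since the text only states that uncertainties are taken as a percentage of variation around the nominal values.

```python
import numpy as np


def robust_objective(model, X, y, n_tables=200, variation=0.05, seed=0):
    """Evaluate F = Mean(E_j) + var(E_j) over N perturbed copies of the data."""
    rng = np.random.default_rng(seed)
    errors = np.empty(n_tables)
    for j in range(n_tables):
        # Perturb nominal inputs and outputs (uniform noise is an assumption)
        Xj = X * (1 + rng.uniform(-variation, variation, X.shape))
        yj = y * (1 + rng.uniform(-variation, variation, y.shape))
        errors[j] = np.mean((model(Xj) - yj) ** 2)   # Eq. (1)
    return errors.mean() + errors.var()              # Eq. (2)


# Hypothetical nominal data and a model that fits them exactly
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])
w = np.array([1.0, 2.0])
y = X @ w


def model(inputs):
    return inputs @ w


F_nominal = robust_objective(model, X, y, variation=0.0)
F_perturbed = robust_objective(model, X, y, variation=0.05)
print(F_nominal, F_perturbed)
```

A model with a lower F is less sensitive to the assumed data uncertainty, both on average and in terms of spread.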
The UKF filter equations for determining the GMDH-type neural network coefficients are as follows:
(i) The weight vector and its covariance matrix in the network are initialized with:
\begin{align*} \hat{a}_{0}=&E\left[a_{0}\right] \tag{3}\\ P_{0}=&E\left[\left(a_{0}-\hat{a}_{0}\right)\left(a_{0}-\hat{a}_{0}\right)^{T}\right]\tag{4}\end{align*}
where $a=\left\{a_{1},a_{2},\ldots,a_{s}\right\}$ is the vector of coefficients of the neural network and $s$ is the number of coefficients.
(ii) The time-update equations are:
\begin{align*} \hat{a}_{k}^{-}=&\hat{a}_{k-1} \tag{5}\\ P_{a,k}^{-}=&P_{a,k-1}+R_{r,k-1}\tag{6}\end{align*}
where $k$ indicates the time.
(iii) The sigma points and the measurement update follow the basic equations of the UKF.
Algorithm 1 shows the pseudo-code of the proposed algorithm and summarizes its steps.
Algorithm 1 The Proposed Algorithm for Robust Modeling of Data
Part A: offline modeling:
A1. Enter matrix of experimental data set D[X(m,n) Y(m,1)] %
A2. T=(1:t,n) and V=D(t+1:m,n) %Making Training
A3. Pa=
A4.
Part B: on-line updating:
B1. DT =
B2.
B3.
B4.
B5. if
B5.
Industrial Case Study
To examine the performance of the algorithm proposed above, an industrial case study is considered. To this end, a modified Taguchi DOE with 27 samples (L27) (Table 1) was constructed on a carbon fiber production line. The PAN fiber inputs were chosen as the controlling parameters based on the feasible processing window of the pilot plant: temperatures of 227, 230, 233, and 236°, space velocities of 20, 25, 30 and 35 m/h, and stretching ratios of 1.0, 2.0, 3.0 and 4.0%. The measured output was a physical property. To reduce the number of experiments, the Taguchi design was modified by adding some marginal operating parameters, as listed in Table 1.
NSGA-II and SVD are employed to design the optimal topology and to find the optimal values of the polynomial coefficients of GMDH, respectively. It is clear from this limited input-output data that the inputs are not of the same order of magnitude. For example, the temperature is of the order of 230 whereas the stretching ratio lies between 1.0 and 4.0. The inputs are therefore normalized as
\begin{equation*} x_{i}^{M}=1+\frac{\left(x_{i}-x_{i}^{min}\right)}{\left(x_{i}^{max}-x_{i}^{min}\right)},\quad i=1,2,3\tag{7}\end{equation*}
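As a worked instance of Eq. (7), the short sketch below maps the four temperature levels of the case study onto the interval [1, 2]. The function name is hypothetical; the formula is the one given in Eq. (7).

```python
def normalize(values):
    """Apply Eq. (7): x_M = 1 + (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    return [1 + (v - lo) / (hi - lo) for v in values]


temperatures = [227, 230, 233, 236]   # temperature levels from the case study
normalized = normalize(temperatures)
print(normalized)
```

The smallest level maps to 1 and the largest to 2, so all three inputs end up on the same scale regardless of their original units.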
Two different data sets, one for training and one for validation, have been used to show the prediction capability of the optimized GMDH-type neural networks. The training set is composed of 18 out of 30 input–output data pairs. The validation set, consisting of the 12 input–output samples unseen during the training process, is used solely for checking the performance of the trained model. GMDH-type neural networks are employed to fit a polynomial curve to the output of the model as a function of the effective input parameters. Considering two objective functions (namely, Training Error, TE, and Prediction Error, PE), NSGA-II is used for the Pareto multi-objective optimization of the GMDH-type neural network. A population size of 80, a crossover probability of 0.9, a mutation probability of 0.1, and 300 generations were found to be suitable through a trial-and-error study. The resultant Pareto front, representing a set of non-dominated superior solutions, is shown in Fig. 9. In this figure, points A and B designate solutions with the best PE and TE, respectively. Compared with the best-TE solution, point C shows a very small increase in the TE value (about 0.7%) but a substantial improvement (about 83%) in PE. Consequently, when the two objectives are taken into consideration simultaneously, point C may be considered the best trade-off solution.
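The notion of a non-dominated set can be made concrete with a small Python sketch. It is not NSGA-II itself, only the dominance filter that defines a Pareto front when both TE and PE are minimized. The candidate (TE, PE) pairs are made up for illustration and are not the values of Table 2, and the rule used to pick a trade-off point (smallest sum of range-normalized objectives) is an assumption; the paper selects point C by inspecting the front.

```python
import numpy as np

def pareto_front(points):
    """Return indices of non-dominated points when both objectives are minimized."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # q dominates p if q is no worse in every objective and better in at least one
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Made-up (TE, PE) pairs, for illustration only
candidates = [(0.10, 0.90), (0.11, 0.20), (0.30, 0.15), (0.20, 0.50), (0.12, 0.95)]
front = pareto_front(candidates)

# Assumed trade-off heuristic: normalize each objective over the front, minimize the sum
front_pts = np.asarray(candidates)[front]
span = front_pts.max(axis=0) - front_pts.min(axis=0)
scores = ((front_pts - front_pts.min(axis=0)) / span).sum(axis=1)
trade_off = front[int(np.argmin(scores))]
```

In this toy set the selected point gives up a little TE relative to the best-TE candidate in exchange for a much lower PE, which mirrors the reasoning used to choose point C.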
The structure of the neural network corresponding to design point C, obtained by the NSGA-II genetic algorithm, is shown in Fig. 10. The reasonable behavior of the GMDH-type neural network model on the training and validation data is shown in Fig. 11. Moreover, Fig. 12 shows a correlation coefficient of 92.36% between the actual and predicted values, indicating a reasonably accurate model. Table 2 shows the values of TE and PE corresponding to points A, B, and C.
Fig. 11. Performance of the optimized GMDH model on both actual and validation data sets corresponding to optimum point C.
Fig. 12. Correlation coefficient between the actual values and predicted ones corresponding to optimum point C.
To check the robustness of the obtained model, a ±5% variation around the nominal values reported in Table 1 is considered, and 1000 random data tables are constructed using MCS. Fig. 13 shows the upper and lower bounds of the values predicted by this model. The wide spread of the predictions relative to the nominal values indicates poor robustness of the derived model. The UKF is therefore used to improve its robustness. Note that the GMDH structure shown in Fig. 10 is retained and only the coefficients of the equations are updated by the UKF. The equations governing the neurons of the model derived by the UKF are as follows: \begin{align*} y_{1}=&1.29-0.01x_{1}-0.0546x_{2}+0.0135x_{1}^{2} \\&+\,0.0156x_{2}^{2}-0.00718x_{1}x_{2} \tag{8a}\\ y_{model}=&-4.24+7.787y_{1}-0.0129x_{3}-2.71y_{1}^{2} \\&+\,0.003x_{3}^{2}+0.0022y_{1}x_{3}\tag{8b}\end{align*}
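The robustness check can be sketched in Python by evaluating Eqs. (8a) and (8b) on randomly perturbed inputs. Several points here are assumptions rather than statements from the paper: the nominal input values are hypothetical (Table 1 is not reproduced), the inputs are taken to be already scaled by Eq. (7), and the ±5% variation is drawn from a uniform distribution around a single nominal point rather than around the full data table.

```python
import numpy as np

def gmdh_model(x1, x2, x3):
    """Two-neuron GMDH model with the UKF-updated coefficients of Eqs. (8a)-(8b)."""
    y1 = (1.29 - 0.01 * x1 - 0.0546 * x2 + 0.0135 * x1**2
          + 0.0156 * x2**2 - 0.00718 * x1 * x2)            # Eq. (8a)
    return (-4.24 + 7.787 * y1 - 0.0129 * x3 - 2.71 * y1**2
            + 0.003 * x3**2 + 0.0022 * y1 * x3)            # Eq. (8b)

rng = np.random.default_rng(0)
nominal = np.array([1.5, 1.5, 1.5])      # hypothetical nominal inputs, not from Table 1

# 1000 random samples within +/-5% of the nominal values (uniform draw is an assumption)
factors = rng.uniform(0.95, 1.05, size=(1000, 3))
samples = nominal * factors

y = gmdh_model(samples[:, 0], samples[:, 1], samples[:, 2])
lower, upper = y.min(), y.max()          # bounds of the predicted values
y_nominal = gmdh_model(*nominal)
```

The gap between `upper` and `lower` plays the role of the tolerance margin: the narrower it is around `y_nominal`, the less sensitive the model is to input uncertainty.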
Fig. 13. Upper and lower bound of predicted values by deterministic model on 1000 input-output random tables.
Fig. 14 shows the upper and lower bounds of the values predicted by the robust model. Employing the UKF is seen to be effective in reducing the tolerance margins. The results of the intelligent algorithm helped the carbon fiber production line cope with the reduced number of experiments, the cost, and the limited data. The same procedure can be used on big data to find a robust model between the inputs and outputs of any nonlinear industrial process. In addition, DL methods such as CNNs, DBNs, and RNNs, mentioned in Section II, can be used for very large (terabyte-scale) data sets. One of the most immediate and impactful outcomes of technological evolution is the vast advancement in automation through data modeling. As intelligent algorithms for robust data modeling continue to advance, so will automation. Although our model was aimed at increasing the estimation accuracy, the computational effort is also reduced, by 2 ms in offline modeling and 3 ms in online modeling.
Fig. 14. Upper and lower bound of predicted values by robust model on 1000 input-output random tables.
Conclusion
In this paper, we reviewed some existing machine learning techniques that can deal with issues commonly observed in industrial applications and arising from limited-data or big-data constraints. These techniques can be used to efficiently address various challenges, such as increased complexity, uncertainty, and dynamism, related to data processing and analytics of manufacturing processes. Machine learning is a powerful tool for many industrial applications, and its importance is further enhanced by the increased use of data collection and sensor technologies as part of Industry 4.0 implementation, which generates valuable data sources. We proposed a new intelligent algorithm that can model limited industrial data and applied it to a real industrial case study with data collected from a carbon fiber production line. Pareto multi-objective optimization with two objective functions was employed to design the topology of the GMDH and to overcome overfitting. Using the proposed algorithm, a deterministic model was obtained that showed a very accurate fit to the actual data. We further used the UKF approach to improve the robustness of the model against uncertainties. The intelligent algorithm helped the carbon fiber production line cope with the reduced number of experiments, the cost, and the limited data. The proposed algorithm has the potential to be used in any other industrial setting where the aim is to obtain a reliable and robust model between inputs and outputs.