Introduction
Epilepsy is a chronic brain disease characterized by repeated seizures, which are involuntary movements involving part of the body or the whole body [1]. Detecting the disease early helps determine the cause of epilepsy. EEG (electroencephalography) is generally used to check whether a patient is having an epileptic seizure, to determine the seizure type, or to identify a trigger factor for epilepsy. However, this modality does not reveal the etiology and has low spatial resolution for detecting the brain abnormality causing epilepsy [2]. Magnetic resonance imaging (MRI) can detect microstructural changes at the source of epilepsy because it has relatively high spatial resolution. Therefore, the study in [3] recommended structural MRI as the standard investigation in epilepsy patients. Identification involving several sequences of MR images is advantageous for detecting brain abnormalities that cause epilepsy (e.g., hippocampal sclerosis, cortical dysgenesis, brain tumor, cerebrovascular lesions, and others). The HARNESS-MRI protocol shows the advantage of each MR image sequence in identifying structural (microstructural) brain abnormalities [4], and each sequence provides different benefits in identifying a given structural abnormality, as reported in [5]. Therefore, improving the performance of automatic MR image processing methods will help increase sensitivity in epilepsy identification.
Several researchers have previously reported results on the detection or classification of epilepsy based on brain structural abnormalities (e.g., temporal lobe epilepsy, focal cortical dysplasia). Most of this research addresses the detection or classification of only one abnormality type: detection or classification of abnormalities in temporal lobe epilepsy is shown in [2], [6]–[9], and focal cortical dysplasia (FCD) is reported in [10]–[12]. The study in [6] used one MR image sequence to classify microstructural abnormalities in temporal lobe epilepsy (TLE) against non-TLE. Visual assessment of two sequences, T1 and T2, has also been used for the diagnosis of hippocampal sclerosis (HS) in patients with mesial temporal lobe epilepsy (MTLE) [7]. In the case of FCD lesion detection, studies in [10] and [11] reported the use of the T1-weighted sequence as the detection input, while the use of two sequences (T1-MPRAGE and T2-FLAIR) for FCD detection is discussed in [12]. These two abnormalities account for the most significant percentage of epilepsy patients, as reported by Wellmer et al. [13]. The diagnosis of other types of brain abnormalities also relies on specific MR image sequences to obtain the best results; therefore, specific imaging protocols are required to identify each structural abnormality [13]. The initial diagnosis of whether a person has a structural brain abnormality must involve several MR image sequences. Involving many sequences in a manual diagnosis is thorough, but it is complicated and time-consuming. Reliable automated detection or classification is therefore essential. However, automated detection/classification of epilepsy involving multiple MR image sequences and multiple abnormality types simultaneously has not been investigated. As Fig. 1 shows, most previous studies used only one or two MR image sequences to identify, detect, or classify a single type of structural brain abnormality. Consequently, studies involving only one or two sequences and one abnormality type have drawbacks: they cannot identify epilepsy caused by other abnormality types at initial diagnosis, and their sensitivity can decrease.
Most of the previous studies and our research focus on classifying brain structural abnormalities that cause epilepsy.
Based on the weaknesses of the previous studies and the diagnostic protocol for each type of brain abnormality in [4], [5], and [13], the initial diagnosis needs many MR image sequences to reveal the various possible abnormalities in each sequence. Therefore, we propose a method for the two-class classification of brain structures (epilepsy, non-epilepsy) that involves several sequences (multi-sequence) in the training process. Fig. 1 illustrates the focus of our study: using multi-sequence MR images with several types of epilepsy-causing brain abnormalities in training.
The multi-sequence of MR images introduces high data variability, which greatly affects the classifier's performance in identifying and classifying brain structural abnormalities. We use a convolutional neural network (CNN), a classification method that has proven powerful for image data [14], together with a CNN model ensemble technique to improve classification performance. The CNN models in this study are built with few parameters, considering the limited learning data, while maintaining performance. These CNN models serve as base-learner models in the ensemble technique, which we use to improve classification accuracy and reduce the variability of the results [15], [16]. A meta-learner stage using machine learning is beneficial for further improving classification performance. The support vector machine (SVM) is a machine learning method that has proven reliable in classifying brain abnormalities that cause epilepsy [2], [6]. Therefore, we propose an ensemble scheme of these CNN models using SVM at the meta-learner stage based on axial multi-sequence MR images (emsCNN-SVM) to improve classification performance, and we have conducted several experiments to evaluate the proposed emsCNN-SVM. The main contributions of this research are as follows:
We propose an axial multi-sequence MR image approach to classify brain structural abnormalities causing epilepsy against non-epilepsy brain structures. The axial multi-sequence MR images involved in the learning process contain several types of structural brain abnormalities from epilepsy patients and several types of brain structures from non-epilepsy patients.
We build CNN models based on multi-sequence MR images as base-learner models, keeping the parameter count low and limiting overfitting on the small dataset, to classify brain structural abnormalities that cause epilepsy vs. non-epilepsy brain structures.
We propose an ensemble scheme of CNN models at the base-learner stage with an SVM at the meta-learner stage. It involves the outputs of the base-learner models and the combinations of their predictions, thus improving performance and reducing variability in the classification of brain structural abnormalities that cause epilepsy vs. non-epilepsy brain structures.
The remainder of this paper is structured as follows: Section II discusses a survey of relevant previous research work on the classification of brain structural abnormalities that cause epilepsy. Section III describes the dataset of the experiment and the proposed method. The experimental scenarios and results are in Section IV. Section V discusses the experimental results. Finally, Section VI states the conclusions and suggestions for future research.
Related Work
In this study, we classify brain structural abnormalities as cues for epilepsy vs. non-epilepsy subjects based on axial sequences of MR images. Therefore, this section explores the relevant current research in the literature from two perspectives: first, the classification of brain structural abnormalities using machine learning, and second, classification using CNNs.
Classification of brain structural abnormalities that cause epilepsy using machine learning is reported in [6], [10]–[12], and [17]. Del Gaizo et al. [6] used a diffusion MRI sequence to classify temporal lobe epilepsy (TLE) vs. non-TLE. They computed scalar diffusion measures from diffusion kurtosis imaging (DKI) and then used a weighted average of support vector machine models to classify TLE vs. non-TLE based on the scalar diffusion inputs. Their method yielded accuracies of 68% (fractional anisotropy), 51% (mean diffusivity), and 82% (mean kurtosis). The use of SVM was also reported by Wang et al. [17] to detect mesial temporal sclerosis (MTS) based on the T1-weighted sequence. The detection begins with segmentation of tissue (grey matter, white matter, and cerebrospinal fluid (CSF)) and of the hippocampus, followed by extraction of volume, shape, and CSF-ratio features. The experimental results showed that their technique provides promising performance for MTS. Studies on the detection of abnormalities in TLE are also reported in [7]–[9], although they do not use machine learning for detection. Another abnormality classification, FCD, was performed by Qu et al. [10] using multiple classifier fusion and optimization (MCFO) with voxel-based morphometry (VBM) features on the T1-weighted MRI sequence. Their MCFO involved several classifiers and minimized false positives using F-scores, and testing showed a decrease in false positives. A similar study was conducted by Jin et al. [11] using T1-weighted sequences produced by three different MRI scanners. They extracted morphological and intensity features as inputs to a non-linear neural network classifier. Their experiments at a threshold of 0.9 obtained an optimal sensitivity of 73.7% and a specificity of 90% in FCD detection. Mo et al. [12] also performed FCD lesion detection by combining quantitative multimodal surface features with an artificial neural network (ANN) to assess its clinical value. The testing results showed accuracy, sensitivity, and specificity of 70.5%, 70%, and 69.9%, respectively, which outperformed the unimodal classifier.
Classification of brain structural abnormalities (epilepsy) by applying deep learning is reported in [2], [18], and [19]. Huang et al. [2] identified epilepsy using DKI images. They segmented the hippocampus and used a transfer-learned VGG16 to obtain DKI image features, which were input to a support vector machine (SVM) to classify epilepsy (hippocampus) vs. normal controls. Their proposed method obtained a best classification accuracy of 90.8%. Torres-Velazquez et al. [18] used multimodal MRI to classify TLE, introducing a multi-channel deep neural network (mDNN) for TLE classification; their experiments showed the potential of the mDNN approach to combine multiple data sets for TLE classification. Another abnormality classification (juvenile myoclonic epilepsy, JME) was conducted by Si et al. [19] using CNN-based transfer learning. They used a diffusion MRI sequence to detect subtle changes in white matter. Using three CNN models, their experiments showed that Inception-ResNet-v2-based transfer learning is better than Inception-v3 and Inception-v4 in classifying JME, with a classification accuracy of 75.2%.
The previous studies above combined MR image feature extraction with machine learning to classify brain structural abnormalities as epilepsy cues. Most of these studies proposed methods to obtain representative MR image features or to combine several features [10]–[12], [17]. The researchers usually focused on one or two MR image sequences to obtain these features. In addition, they typically used one classifier [6], [11], [12] or several classifiers [10] to obtain the best classification performance. These efforts are reasonable, but the best classification performance is not necessarily obtained from features that are merely considered representative. This approach can be ineffective and time-consuming, especially in studies involving multiple MR image sequences and several abnormality types. Therefore, a reliable classifier is needed to solve this problem, such as a CNN [2], [19]. The study in [2] showed that a CNN is a robust classifier whose convolution process performs feature extraction optimally with respect to the classification loss function. The main problem we often encounter is that MR image datasets for epilepsy cases are relatively small; consequently, many researchers rarely use CNNs because of the risk of overfitting. Several techniques can be used to mitigate overfitting, such as data augmentation [20], [21], architectural designs with few parameters [16], and validation techniques during learning.
In a previous study [22], we reported a CNN model with few parameters for epilepsy classification based on EEG signals. To overcome the limited dataset in training, we divided each EEG signal into many segments (multi-segment) and converted them into spectrogram images. That study used one CNN model and decided the final classification result by majority voting over the model's per-segment predictions. Although the method yielded good performance, it does not necessarily transfer to epilepsy classification based on MR images, because signal patterns differ from MR images.
In this study, we include multi-sequence MR images in the classification of brain abnormalities (epilepsy) against non-epilepsy to increase performance (accuracy, sensitivity) and to overcome the limitations of the dataset. Involving multi-sequence MR images in a CNN leads to high variability in the results [23], so an ensemble of several CNN models is a solution to improve accuracy and reduce variability [15], [16]. Therefore, we propose an ensemble CNN that differs from existing methods in several aspects: (i) it involves multi-sequence MR images and several types of epilepsy-causing brain abnormalities, (ii) it uses several CNN models with few parameters as base-learner models, and (iii) it involves the outputs of the base-learner models and combinations of their predictions as input to the meta-learner.
Materials and Methods
A. Dataset Acquisition
We investigated several T1 and T2 sequences of 37 epilepsy patients. The patients consisted of 17 males and 20 females, including 48.6% with an additional history of epilepsy and seizures and 51.4% with an additional history of stroke, tumor, trauma, temporal lobe epilepsy, left focal epilepsy, syncope, cerebral edema, or hemianopia. The MR image sequences were obtained from Universitas Airlangga Hospital (Rumah Sakit Universitas Airlangga, RSUA), Surabaya, Indonesia, using a 1.5 T MRI scanner from 2018 to 2020. We obtained ethical clearance from the hospital's ethics committee to use this retrospective dataset for research. For the non-epilepsy dataset, we used nine healthy subjects free of neurological disease, seven tumor patients, six stroke patients, and five meningioma patients.
In this study, MRI sequences were acquired from each subject in the axial plane, including T1, T2-FLAIR, T2-FSE, DWI, T2-FLAIR PROPELLER, and T2 PROPELLER. All MRI sequences were obtained with a 2D acquisition type, a slice thickness of 5 mm, and a fixed acquisition matrix.
Each slice of each sequence from the epilepsy and non-epilepsy subjects was then converted into an MR image. Each image (frame) was selected and collected into an image dataset for the experiments. The total number of MR images used for the experiment was 4231, including 2515 epilepsy MR images and 1716 non-epilepsy MR images, as shown in Table 1.
B. Data Pre-Processing
The input images for the CNN models must have the same size. Therefore, resizing the image of each slice is an essential pre-processing step. In this work, every slice was resized to a single fixed size before being fed to the CNN models.
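As an illustration of this step, the following is a minimal pre-processing sketch in Python; OpenCV, the 128×128 target size, and the function name are assumptions for illustration only, since the exact size and tooling are fixed elsewhere in the paper.

```python
# Minimal pre-processing sketch (assumption: a hypothetical 128x128 target
# size is used for illustration; the paper fixes its own size).
import cv2
import numpy as np

def preprocess_slice(path, target_size=(128, 128)):
    """Load one MR slice (PNG), resize it, and scale intensities to [0, 1]."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # axial slice as 2-D array
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)
    img = img.astype(np.float32) / 255.0           # intensity normalization
    return img[..., np.newaxis]                    # add channel axis for CNN input
```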
C. Convolutional Neural Networks (CNNs)
A convolutional neural network is a deep learning model often applied to visual images and is proven to have high accuracy [14], [25]. There are five CNN architectures proposed in this study, each of which has several layers, namely input layer, convolutional layer, activation layer, pooling layer, fully-connected layer, and output layer. We name the five CNN architectures as msCNN1, msCNN2, msCNN3, msCNN4 and msCNN5, as shown in Fig. 3. The CNN architectures are built to classify brain structural abnormalities causing epilepsy vs. non-epilepsy brain structures based on axial multi-sequence of MR images.
Fig. 3. Proposed CNN architectures with multi-sequence MR image input: (a) msCNN1, (b) msCNN2, (c) msCNN3, (d) msCNN4, (e) msCNN5.
In this study, we built CNN architectures with different structures for epilepsy classification. In 2D/3D, areas of structural abnormality in the brain have different sizes between subjects (patients). In addition, the involvement of several types of brain abnormality in this study also increases the variability in the shape and size of the structural abnormalities. Therefore, we decided to build several CNN models with different structures to strengthen the classification of brain structural abnormalities that cause epilepsy.
1) Input Layer
In this study, the input layer feeds the MR image sequences, normalized in the pre-processing stage, into the convolution process. The input image size is the same for all five proposed CNN architectures.
2) Convolutional Layer
In this layer, a convolution is applied to the input image of each MR sequence, or to the input from the previous layer, by shifting a filter. This process produces feature maps that capture image patterns from low to high level [22]. The convolution therefore uses many feature maps to obtain the characteristics of an image [26], [27]. In this study, the convolution operation in the five proposed CNN models can be written as:\begin{equation*} Z_{i}=f\left(W_{i}X+b_{i}\right),\quad i=1,\ldots,5\tag{1}\end{equation*}
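As a literal, minimal reading of (1) for a single feature map, the following NumPy sketch applies a 2-D valid convolution followed by an activation f; the function and variable names are illustrative assumptions, not the paper's implementation.

```python
# Literal sketch of Eq. (1) for one feature map: 2-D valid convolution
# followed by the activation f (ReLU by default, anticipating Eq. (2)).
import numpy as np

def conv2d_single(X, W, b, f=lambda z: np.maximum(z, 0)):
    kh, kw = W.shape
    H, Wd = X.shape
    Z = np.zeros((H - kh + 1, Wd - kw + 1))
    for r in range(Z.shape[0]):
        for c in range(Z.shape[1]):
            Z[r, c] = np.sum(X[r:r + kh, c:c + kw] * W) + b
    return f(Z)    # Eq. (1): Z_i = f(W_i X + b_i)
```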
3) Activation Layer
In this layer, an unsaturated activation function is applied to improve the nonlinearity of the decision function. In this study, the activation function used is the rectified linear unit (ReLU) [26], and for each model, it is presented in the following equation:\begin{align*} \hat {Z}_{i}(Z_{i})=\begin{cases} \displaystyle Z_{i},& Z_{i}\ge 0\\ \displaystyle 0,& Z_{i} < 0,\\ \displaystyle \end{cases}\quad i=1,\ldots,5\tag{2}\end{align*}
4) Pooling Layer
The pooling process in this layer aims to reduce the spatial size of the representation, reduce computation, and prevent overfitting. In this study, the pooling used is max-pooling [30], with a fixed filter size in each proposed model.
5) Fully-Connected Layer
The fully-connected layer follows the convolutional and max-pooling layers. In this layer, backpropagation updates the weights and biases of the previous layers while limiting the loss of feature information. The feature matrix from the previous layer is converted into a feature vector (flattened) before the classification process. The proposed CNN architectures have different fully-connected layers. msCNN1 and msCNN2 have a fully-connected layer in which the entire feature vector (flattened) is connected to the output layer, with a 0.5 (50%) dropout added. Meanwhile, msCNN3, msCNN4, and msCNN5 each have fully-connected layer 1 with a 0.5 dropout and fully-connected layer 2, which is fully connected to the output layer. The number of neurons in the hidden layer of the msCNN3 architecture is 32 with the ReLU activation function, while msCNN4 and msCNN5 have 64 neurons with the same activation function. In this study, the dropout in the fully-connected layer is proposed to prevent overfitting.
6) Output (Classification) Layer
After the fully-connected layer, the results are forwarded to the output (classification) layer, which produces the classification results, accuracy, and loss. The loss function used in each proposed model is binary cross-entropy, while the activation function for classification is softmax. The softmax function of each proposed model can be written as:\begin{align*} y_{ik}\left(\tilde{Z}_{i}\right)=\frac{\exp\left(\tilde{Z}_{ik}\right)}{\sum\nolimits_{j=1}^{C}\exp\left(\tilde{Z}_{ij}\right)},\quad k=1,\ldots,C;~~i=1,\ldots,5\tag{3}\end{align*}
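To make the layer descriptions above concrete, the following is a hedged Keras sketch of one base-learner CNN in the spirit of msCNN3–msCNN5; the filter counts, kernel sizes, and 128×128 input are illustrative assumptions, and only the layer types, 0.5 dropout, two-way softmax, and binary cross-entropy loss follow the text.

```python
# Sketch of one base-learner CNN (illustrative filter counts and kernel sizes;
# only the layer types, 0.5 dropout, 2-way softmax, and binary cross-entropy
# follow the description in the text). Labels are expected one-hot encoded.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_base_learner(input_shape=(128, 128, 1), hidden_units=64):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu"),   # Eqs. (1)-(2): conv + ReLU
        layers.MaxPooling2D((2, 2)),                    # spatial down-sampling
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),                            # overfitting control
        layers.Dense(hidden_units, activation="relu"),  # fully-connected layer 1
        layers.Dense(2, activation="softmax"),          # Eq. (3): 2-class output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```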
In addition to the CNN architectures proposed in this study, we used three CNN architectures from the literature: the CNN in [22], VGG16 [31], and ResNet50 [32], which served as comparisons against the proposed architectures. We transferred these architectures and trained them with the dataset used in this study. The CNN in [22] has a simple architecture consisting of three convolution layers with a 2-class output (epilepsy and non-epilepsy). The VGG16 model has 16 weight layers arranged sequentially, consisting of 13 convolutional layers and three fully-connected layers; the original VGG16 architecture was designed for input images of size $224\times224\times3$.
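As a sketch of how such comparison backbones can be adapted to the two-class task (the paper's exact adaptation may differ), the following loads VGG16 or ResNet50 from keras.applications and replaces the original classification head with a two-way softmax.

```python
# Hedged sketch of adapting VGG16/ResNet50 to the 2-class task; the pooling
# head and training-from-scratch choice (weights=None) are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, ResNet50

def adapt_backbone(name="vgg16", input_shape=(224, 224, 3)):
    base = {"vgg16": VGG16, "resnet50": ResNet50}[name](
        include_top=False, weights=None, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)    # replace original head
    out = layers.Dense(2, activation="softmax")(x)      # epilepsy vs. non-epilepsy
    return models.Model(base.input, out)
```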
D. Ensemble Convolutional Neural Networks
In this study, we used ensemble learning on the classification results of each proposed CNN model to improve performance and reduce the variability of the results. One type of ensemble learning is stacking, or stacked generalization, which includes two main parts: the base-learner and the meta-learner [15], [33]. In this study, the msCNN1, msCNN2, msCNN3, msCNN4, and msCNN5 models are the base-learner models, while the support vector machine (SVM) is the meta-learner model. In our proposed scheme, between the base-learner and the meta-learner there is an ensemble of the base-learner models through a combination of predictions, carried out by combining the prediction results of the base-learner models using majority voting, weighted average, and weighted majority voting [33]. The proposed scheme involving the combination of predictions is shown in Fig. 4.
Fig. 4. Proposed scheme: the ensemble of CNN models using SVM with the CNN predictions, the softmax outputs, and the combination of predictions as input.
The process of our proposed scheme begins with training each base-learner model to obtain the class prediction $g_{i}$ of each model:\begin{align*} g_{i}=\mathop{\text{argmax}}\limits_{k}\left(y_{ik}\right),~~g_{i}\in\{0,1\},\quad i=1,\ldots,5;~~k=1,2\tag{4}\end{align*}
This study uses majority voting to obtain a prediction based on the majority vote. If the msCNN1, msCNN2, msCNN3, msCNN4, and msCNN5 models are regarded as neurologists (experts), the final decision is made by the majority, i.e., a vote exceeding 50%. If $V_{k}$ denotes the number of base-learner models that predict class $k$, the majority-voting prediction is\begin{equation*} h=\mathop{\text{argmax}}\limits_{k}\left(V_{k}\right),\quad h\in\{0,1\},~~k=1,2\tag{5}\end{equation*}
A combination of predictions with weighted majority voting is obtained by multiplying each model's prediction by a certain weight. In this study, the weights are proportional to the validation accuracy of each base-learner model in its last epoch. If $\delta_{ki}$ denotes the weighted vote of model $i$ for class $k$, the weighted-majority-voting prediction is\begin{equation*} \tilde{h}=\mathop{\text{argmax}}\limits_{k}\left(\sum\nolimits_{i=1}^{5}\delta_{ki}\right),\quad \tilde{h}\in\{0,1\},~~k=1,2\tag{6}\end{equation*}
The combination of predictions with the weighted average is obtained by averaging the softmax outputs $y_{ik}$ of the five base-learner models:\begin{equation*} \hat{h}=\mathop{\text{argmax}}\limits_{k}\left(\sum\nolimits_{i=1}^{5}y_{ik}/5\right),\quad \hat{h}\in\{0,1\},~~k=1,2\tag{7}\end{equation*}
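A minimal sketch of the three prediction combinations in (5)–(7) is given below; `probs` is assumed to hold the five base-learner softmax outputs for one slice, and `weights` the validation-accuracy-based model weights.

```python
# Minimal sketch of the prediction combinations in Eqs. (5)-(7).
# probs: array of shape (5, 2) with the softmax outputs of the five
# base-learners for one slice; weights: validation-accuracy-based weights.
import numpy as np

def combine_predictions(probs, weights):
    preds = probs.argmax(axis=1)                 # g_i, Eq. (4)
    votes = np.bincount(preds, minlength=2)
    h_mv = votes.argmax()                        # majority voting, Eq. (5)
    weighted_votes = np.zeros(2)
    for g, w in zip(preds, weights):
        weighted_votes[g] += w
    h_wmv = weighted_votes.argmax()              # weighted majority voting, Eq. (6)
    h_wa = probs.mean(axis=0).argmax()           # average of softmax outputs, Eq. (7)
    return h_mv, h_wmv, h_wa
```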
The outputs of the CNN models on the base-learner and the combination of predictions are the input to the training process in the meta-learner. We used an SVM at the meta-learner stage for training and final classification; this classifier was chosen because it requires few assumptions about the input data and offers flexibility in the choice of kernel functions [34], [35]. If $\tilde{X}$ denotes the meta-learner input, the SVM decision function in the linearly separable case is\begin{equation*} \mathcal{G}\left(\tilde{X}\right)=\mathrm{sign}(\omega^{T}\tilde{X}+\alpha)\tag{8}\end{equation*}
For the non-linearly separable case, a kernel mapping $\varphi$ to a higher-dimensional feature space is used:\begin{equation*} \mathcal{G}\left(\tilde{X}\right)=\mathrm{sign}(\omega^{T}\varphi(\tilde{X})+\alpha)\tag{9}\end{equation*}
The parameters $\omega$ and $\alpha$ are obtained by maximizing the margin,\begin{equation*} \mathop{\text{argmax}}\limits_{\omega,\alpha}\left\{\frac{1}{\left\|\omega\right\|}\mathop{\text{min}}\limits_{n}\left(t_{n}\left(\omega^{T}\varphi(\tilde{X}_{n})+\alpha\right)\right)\right\}\tag{10}\end{equation*}
which is equivalent to the constrained optimization problem\begin{align*}&\mathop{\text{argmin}}\limits_{\omega,\alpha}\left\{\frac{1}{2}\left\|\omega\right\|^{2}\right\} \\&\text{subject to}: t_{n}\left(\omega^{T}\varphi(\tilde{X}_{n})+\alpha\right)\ge 1,\quad n=1,\ldots,N\tag{11}\end{align*}
where $t_{n}\in\{-1,1\}$ is the class label of training sample $\tilde{X}_{n}$.
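A hedged sketch of the meta-learner stage follows; the meta-feature layout (stacked predictions, softmax outputs, and combined predictions) and the helper names are assumptions, and the polynomial degree shown is the one reported later for epoch = 100.

```python
# Sketch of the meta-learner stage. Assumed meta-features: flattened softmax
# outputs (n, 5, 2), per-model predictions g_i (n, 5), and the MV/WMV/WA
# combinations (n, 3). Names and layout are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def build_meta_features(probs, preds, combos):
    return np.concatenate([probs.reshape(len(probs), -1), preds, combos], axis=1)

meta_svm = SVC(kernel="poly", degree=25)   # polynomial kernel, degree = 25
# meta_svm.fit(build_meta_features(P_tr, G_tr, C_tr), y_tr)
# y_hat = meta_svm.predict(build_meta_features(P_te, G_te, C_te))
```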
E. Classification Result Evaluation
To evaluate the classification results, we adopted several measurement indicators: accuracy ($AC$), precision ($PR$), sensitivity ($SE$), and F1-score ($F1$), computed from the numbers of true positives ($TP$), true negatives ($TN$), false positives ($FP$), and false negatives ($FN$):\begin{align*} AC&=(TP+TN)/(TP+TN+FP+FN)\tag{12}\\ PR&=TP/(TP+FP)\tag{13}\\ SE&=TP/(TP+FN)\tag{14}\\ F1&=2(PR)(SE)/(PR+SE)\tag{15}\end{align*}
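Equations (12)–(15) can be computed directly from the 2×2 confusion matrix, with "epilepsy" treated as the positive class, as in the following sketch.

```python
# Direct computation of Eqs. (12)-(15) from the confusion-matrix counts
# ("epilepsy" treated as the positive class).
def evaluate(tp, tn, fp, fn):
    ac = (tp + tn) / (tp + tn + fp + fn)   # accuracy, Eq. (12)
    pr = tp / (tp + fp)                    # precision, Eq. (13)
    se = tp / (tp + fn)                    # sensitivity (recall), Eq. (14)
    f1 = 2 * pr * se / (pr + se)           # F1-score, Eq. (15)
    return ac, pr, se, f1
```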
Experiments and Results
A. Experiments
This study's total number of subjects was 64 (37 epilepsy subjects and 27 non-epilepsy subjects). We divided the subjects into 45 subjects (25 epilepsy and 20 non-epilepsy) for training and the remaining 19 (12 epilepsy and seven non-epilepsy) for testing, as shown in Table 1. We used stratified 5-fold cross-validation [37] to evaluate each method in the classification of epilepsy, with the number of frames for training, validation in each fold, and testing as shown in Table 2. In this study, correct classification of the class label "epilepsy" takes precedence because of its clinical urgency; therefore, the number of epilepsy frames for training and testing is larger than that of non-epilepsy frames. Based on this consideration, each method was evaluated using (12)-(15). So that the evaluation results of the methods are comparable, the training process uses the same index files. This study uses Google Colaboratory to implement all of these evaluations in each experimental scenario.
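A sketch of the stratified 5-fold split over the training frames is shown below; X and y are synthetic placeholders standing in for the MR image frames and their labels, and the held-out test subjects are kept separate.

```python
# Hedged sketch of the stratified 5-fold split over training frames.
# X and y are synthetic placeholders for the MR-image frames and their
# 0 = non-epilepsy / 1 = epilepsy labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 128, 128, 1)       # placeholder frames
y = np.random.randint(0, 2, size=100)      # placeholder labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    # each base-learner model is trained on (X_tr, y_tr) and
    # validated on (X_val, y_val) in this fold
```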
The main stages of the proposed method in training follow the proposed scheme shown in Fig. 4, while the processing steps at the meta-learner follow Algorithm 1. The base-learner models are trained on the same axial multi-sequence MR image input. The input shape for the base-learner models is the same in each scenario.
In this study, each MR image was saved in Portable Network Graphics (PNG) format at a fixed resolution.
We tested all methods on the test samples with the same dataset treatment in the testing phase. The test steps are shown in Algorithm 2. The parameters trained in each fold were then used to classify all frames (images) of the same test dataset. Classification performance was obtained as the average of the classification results over all folds. The testing was carried out to compare the average performance of the proposed method against the other methods.
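As a small illustration of this averaging step (Algorithm 2 itself is not reproduced here), the per-fold metrics can be averaged as follows.

```python
# Hedged sketch of the fold-averaging step: the models trained in each fold
# classify the same held-out test frames, and the reported performance is
# the mean over the five folds.
import numpy as np

def average_over_folds(fold_metrics):
    # fold_metrics: list of (AC, PR, SE, F1) tuples, one per fold
    return np.mean(np.array(fold_metrics), axis=0)
```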
B. Experimental Results
In this section, we report the results of the experiments using our proposed method, including its constituent methods. The reported results are the performance of the methods at the base-learner stage, the combination of predictions, and the meta-learner. All methods were tested in each scenario, as shown in Tables 4–8.
In the first scenario, with epoch = 50, the CNN models on the base-learner yielded an average classification accuracy of 71.64%-77.43% with a standard deviation range of 2.1-5.94. The ensemble of CNN models on the base-learner using the combination of predictions (MV, WA, and WMV) obtained an average classification accuracy of 80.38%-80.53% with a standard deviation of 1.88-2.10. The combination of predictions in this scenario thus obtained better classification accuracy than all base-learner models and lowered the variability of the accuracy. However, the proposed emsCNN-SVM still yielded a better average classification accuracy. The SVM with a polynomial kernel of degree 50 at the meta-learner improved the accuracy of the CNN models on the base-learner by 5.33%-11.11% and that of the combination of predictions by 2.23%-2.37%. In general, the proposed emsCNN-SVM showed a relatively smaller fold-to-fold deviation in classification accuracy than the others, as shown in Table 4. The proposed method also yielded better average sensitivity and F1-score than the others, even though its classification precision was lower than that of the combination of predictions.
In the scenario with epoch = 100, the base-learner models yielded an average accuracy of 68.18%-79.67% with a standard deviation of 2.92-7.28. The combination of predictions obtained an average accuracy of 83.72%-83.83% and a standard deviation of 1.94-2.10. These results show that combining predictions using MV, WA, and WMV gave better results than the base-learner models, but the proposed emsCNN-SVM yielded the best results. The SVM with a polynomial kernel (degree = 25) in the proposed emsCNN-SVM improved the accuracy of the base-learner models by 6.70%-18.19% and that of the combination of predictions by 2.54%-2.65%. Based on the standard deviation of the classification accuracy, the proposed emsCNN-SVM yielded relatively lower variability than the others.
Based on the classification sensitivity, the ensemble using our proposed emsCNN-SVM obtained the best average sensitivity, while the base-learner ensemble using the combination of predictions yielded a better average sensitivity than the individual base-learner models. The proposed emsCNN-SVM improved the average classification sensitivity by 9.68%-28.45% over the base-learner models and by 7.24%-7.41% over all combinations of predictions. In general, this method also yielded lower variability in classification sensitivity than the others.
In terms of the precision of the epilepsy classification in this scenario, emsCNN-SVM with the polynomial kernel (degree = 25) obtained lower precision than the combination of predictions (MV, WA, and WMV), which yielded the highest average precision with the lowest variability. However, in general, the proposed emsCNN-SVM yielded better average classification precision than the base-learner models with lower variability. This method also obtained the highest F1-score, improving the average F1-score by 5.35%-16.75% over the base-learner models and by 2.35%-2.45% over the combination of predictions.
In the experimental scenario with epoch = 150, the proposed emsCNN-SVM still presented better average classification performance than the CNN models on the base-learner and the combination of predictions. Although the average classification performance of the proposed emsCNN-SVM at epoch = 100 was better than at epoch = 150, the latter produced the lowest variability of all scenarios. In this scenario, the CNN models on the base-learner also showed lower variability in classification accuracy than in the other scenarios, with a standard deviation of 1.63-4.20. The same holds for sensitivity and F1-score.
The testing results with the CNN in [22], VGG16, and ResNet50 for each scenario can be seen in Table 8. At epoch = 50, 100, and 150 with the stratified 5-fold cross-validation evaluation, VGG16 obtained better average accuracy and precision than ResNet50, but still lower than the CNN in [22]. Our proposed emsCNN-SVM and emsCNN-SVM* yielded a better average accuracy than all of them. At epoch = 50, emsCNN-SVM improved the average classification accuracy by 7.67% over the CNN in [22], 10.61% over VGG16, and 14.48% over ResNet50. At epoch = (100, 150), our proposed emsCNN-SVM improved accuracy by (12.41%, 8.82%) over the CNN in [22], (12.66%, 10.52%) over VGG16, and (16.97%, 14.66%) over ResNet50. Our proposed emsCNN-SVM also obtained the best average sensitivity and F1-score.
Discussion
In this section, we discuss the performance of the CNN models on the base-learner, the ensemble of the base-learner models with the combination of predictions (MV, WA, and WMV), and the meta-learner. At the meta-learner stage, we examined the ensemble of the base-learner CNN models through meta-training with an SVM to classify axial MR image sequences of the brain into the two classes epilepsy vs. non-epilepsy. We also compared several existing CNN models against our proposed emsCNN-SVM.
The testing results showed that the CNN models on the base-learner obtained classification performance with high variation, yielding average test accuracies in the range 68.18%-79.67% with a standard deviation of 1.63-7.28. In terms of the number of parameters, msCNN2 has more than the other base-learner models, as shown in Table 3. However, a large number of model parameters does not guarantee proportionally better classification performance, especially on axial multi-sequence MR images. The average classification accuracy of each CNN model on the base-learner is still below 80%. The variability of the training and testing data is relatively high because multi-sequence MR images are involved, and this affects the performance.
If each CNN model on the base-learner is likened to a neurologist who reads axial multi-sequence MR images, then each neurologist's reading may give different results. Using the combination of predictions with majority voting, weighted majority voting, and weighted average can increase the accuracy of the epilepsy classification and reduce the variability of the results. However, the increase stops at a certain level (saturation) and is difficult to push further because it depends entirely on the predictions of the models on the base-learner. The meta-learner stage in the proposed emsCNN-SVM is a solution that improves classification accuracy beyond the combination of predictions. The improvement arises because the meta-learner stage does not depend only on the results of the base-learner models: there is also meta-learning using SVM. This learning involves not only the prediction results of the base-learner models but also the results of the combination of predictions. The proposed emsCNN-SVM accommodates the outputs of the CNN models on the base-learner and the combination of predictions (MV, WA, and WMV); accordingly, it yielded better and more stable performance in every scenario.
We realize that the best results in our proposed scheme involving SVM do not hold for all kernels in training. At epoch = 50, 100, and 150, the kernel that gives better results than the others (e.g., RBF and linear) is the polynomial kernel with degree (d) = 50, 25, and 10, respectively, as shown in Fig. 8. In this study, the degree of the polynomial kernel is chosen based on the best average classification accuracy, as shown in Fig. 5. In general, the higher the degree of the polynomial kernel, the higher the sensitivity, but the precision decreases. Selecting the polynomial degree based on the maximum sensitivity would therefore lead to low precision. Hence, the polynomial kernel degree in the proposed emsCNN-SVM is selected based on the highest accuracy, which indirectly considers precision, sensitivity, and F1-score.
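The degree-selection criterion described above can be sketched as a small grid search that keeps the degree with the highest mean cross-validated accuracy; the candidate degrees and function names are illustrative assumptions.

```python
# Sketch of the degree-selection criterion: keep the polynomial degree whose
# mean cross-validated accuracy is highest (candidate degrees are assumptions).
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_degree(meta_X, meta_y, degrees=(5, 10, 25, 50)):
    scores = {d: cross_val_score(SVC(kernel="poly", degree=d),
                                 meta_X, meta_y, cv=5, scoring="accuracy").mean()
              for d in degrees}
    best = max(scores, key=scores.get)   # highest mean accuracy wins
    return best, scores
```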
Fig. 5. Classification performance of the proposed emsCNN-SVM with the 'polynomial' kernel and different degrees.
Fig. 6. Accuracy and F1-score of the proposed emsCNN-SVM when including three and five base-learner models, with the 'polynomial' kernel and degree = 25.
Fig. 7. Accuracy of the proposed emsCNN-SVM with epoch = 100, 'polynomial' kernel of degree = 25, and different inputs to the meta-learner.
The number of models in the ensemble also influences the performance of the proposed emsCNN-SVM in classifying epilepsy against non-epilepsy. Involving five CNN models on the base-learner gives better classification performance than applying only three. Fig. 6 shows that the accuracy and F1-score in each fold are better when five models are involved than when only three base-learner models are used. The choice of inputs to the meta-learner also affects classification performance: the proposed emsCNN-SVM involving three kinds of input (the base-learner models' predictions, the combination of predictions, and the softmax outputs of the base-learner models) provides better classification accuracy than involving only two kinds or one kind of input, as shown in Fig. 7.
To assess the performance and stability of our proposed method, we also compared the results with the existing models: the CNN in [22], VGG16, and ResNet50. Testing with the same dataset treatment showed that our proposed method improved all classification performance measures, as shown in Table 8. We realize that the input image dimensions differ between these tests, which will affect the performance [39], because the proposed scheme uses a fixed input image dimension that differs from those of the comparison models.
Our study has several limitations, including the relatively small number of MR image sequences used in training and testing. At the clinical level, validation must be carried out on more data involving many institutions. On the other hand, studies involving multi-sequence MR images and different types of brain abnormalities within one epilepsy class certainly have the potential to reduce the classifier's performance. In addition, using only the axial plane may yield lower performance than involving the other planes as well: sagittal and coronal.
This study only uses five CNN models on the base-learner. We understand that more CNN models in the base-learner would enrich the decisions and strengthen the results of the combination of predictions and of the meta-learner. However, more base-learner models would increase the number of model parameters. Therefore, we decided to use five base-learner models for the ensemble process, which gave better results than three base-learner models.
Conclusion
In this study, a method has been proposed to improve performance in the classification of epilepsy based on axial multi-sequence MR images using an ensemble of several CNN models. The ensemble applies the principle of stacked generalization: the outputs of the base-learner CNN models and the combination of predictions (majority voting, weighted average, and weighted majority voting) are forwarded to an SVM at the meta-learner stage. The proposed scheme generally improves performance in classifying brain structural abnormalities that cause epilepsy vs. non-epilepsy. The testing results show that the proposed scheme has high potential to assist neurologists (clinicians) in identifying epilepsy patients based on multi-sequence MR images.
For clinical purposes, in the future, there is still potential to improve the performance of epilepsy classification based on multi-sequence of MR images by increasing the amount of training or testing data and involving all planes of MR images.