WKLD-Based Feature Extraction for Diagnosis of Epilepsy Based on EEG

This study introduces a novel feature extraction approach based on Window Kullback-Leibler Divergence (WKLD), combined with discrete wavelet analysis, for extracting feat...

Abstract:

High-performance automated epilepsy detection methods play a crucial role in clinical diagnostic support. To address the challenge of effectively extracting features from epileptic EEG signals, which are highly spontaneous and complex, a novel feature extraction approach based on Window Kullback-Leibler Divergence (WKLD), combined with discrete wavelet analysis, is proposed. A Residual Multidimensional Taylor Network (ResMTN) classifier is then applied to classify the epileptic state. Experimental results demonstrate an accuracy of 98% in classifying EEG signals from seizure and interictal periods, with both specificity and sensitivity reaching 98.18%, outperforming widely used feature extraction and classification methods.

Published in: IEEE Access (Volume: 12)

Page(s): 69276 - 69287

Date of Publication: 15 May 2024

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2024.3401568

SECTION I.

Introduction

Epilepsy is a common neurological disorder affecting approximately 1% of the world's population [1]. Epileptic seizures typically manifest as sudden, uncontrolled abnormal brain activity, which may cause convulsions and loss of consciousness and can significantly affect patients' physical, mental, and intellectual well-being. Severe seizures can be life-threatening [2]. Because the long-term management and treatment of epilepsy remain a continuous challenge, timely and accurate detection of epileptic seizures is paramount for its prevention and treatment.

Electroencephalogram (EEG) signals, a commonly used type of neurophysiological data, are typically acquired by placing electrodes on the scalp to capture the faint electrical signals generated by brain activity, which reflect the brain's state [3]. During epileptic seizures, EEG signals change significantly and show specific patterns of abnormal fluctuation. Diagnosis from EEG signals requires healthcare professionals to monitor patients' EEG over long periods and interpret it based on clinical experience, a process that is subjective, time-consuming, and prone to bias. In recent years, machine learning and deep learning methods have been widely applied to EEG analysis, and classifiers such as K-nearest neighbors (KNN), Support Vector Machines (SVM), and Multilayer Perceptrons (MLP) have been used extensively in epilepsy diagnosis [4]. Automated EEG-based epilepsy detection techniques offer a more reliable and precise means of detecting and classifying the EEG signals of epilepsy patients [5]. For instance, in [6], combining Permutation Entropy (PE) features with an SVM classifier achieved a classification accuracy of approximately 90% for epilepsy discrimination. Beyond epilepsy, Graph Neural Network (GNN) models have outperformed traditional CNN and MLP models in classifying EEG signals of Alzheimer's disease patients [7], and in EEG-based emotion classification, deep belief networks (DBN) trained on differential entropy features achieved a classification accuracy of 87.62% [8].

Effective feature extraction is paramount for high-precision disease diagnosis, and entropy-based features have been shown to be important for this purpose [9], [10], [11]. The Kullback-Leibler divergence, also known as relative entropy, measures the discrepancy between two probability distributions. In transfer learning, where these are the source-domain and target-domain distributions, it often serves as a loss function that guides the model to adjust its parameters so as to minimize the deviation between them, thereby improving performance and generalization in the target domain. Existing entropy-based feature extraction methods typically calculate complexity over the entire time series and lack a means of extracting and analyzing local features. This study therefore introduces a novel feature extraction approach that extends relative entropy by segmenting EEG signals into non-overlapping sliding windows. We propose a feature extraction method based on Window Kullback-Leibler Divergence (WKLD), combined with features extracted through the Discrete Wavelet Transform (DWT). These composite features are then fed into the Residual Multi-dimensional Taylor Network (ResMTN) classifier. ResMTN, which evolved from BP-MTN, is an advanced classifier that fuses residual networks with multi-dimensional Taylor networks and is designed specifically for EEG data processing. The WKLD feature extraction method does, however, depend on the length of the time series. When the time series is too short, the windows may fail to capture complete characteristic waveforms, introducing disturbances; when it is sufficiently long, sliding the windows over the whole series requires additional computational resources.
Comparative studies against traditional feature extraction methods (e.g., Approximate Entropy (ApEn) and fuzzy entropy) and classifiers such as SVM and KNN demonstrate significant advantages of the WKLD-based feature extraction method and the ResMTN classifier in epilepsy classification accuracy, with important implications for improving patient treatment and quality of life.

SECTION II.

Related Research

A. Electroencephalogram (EEG)

The human brain contains approximately one hundred billion neurons. When a large number of neurons discharge synchronously, significant changes in electrical potential occur, which can be detected by EEG instruments. EEG signals are collected by placing electrodes on specific locations of the scalp, following the 10-20 international system [12]. Electrodes are named using a combination of letters and numbers. For instance, F denotes the frontal region, T the temporal region, P the parietal region, and O the occipital region. Even numbers represent the right side of the head, while odd numbers represent the left side. In clinical studies, at least sixteen positions are typically used to record EEG signals.
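The naming convention above can be made concrete with a small parser. This is a minimal Python sketch for illustration only, not part of the authors' pipeline: it handles the single-letter region codes named in the text plus the frontopolar prefix `Fp`, and it assumes two conventions that the text does not state, namely that `C` denotes the central region and that the suffix `z` marks a midline electrode. The function name `parse_electrode` is hypothetical.

```python
import re

# Region prefixes: F, T, P and O come from the text; Fp (frontopolar) and C (central) are added assumptions.
REGIONS = {
    "Fp": "frontopolar",
    "F": "frontal",
    "C": "central",
    "T": "temporal",
    "P": "parietal",
    "O": "occipital",
}
LABEL_PATTERN = re.compile(r"(Fp|F|C|T|P|O)(\d+|z)")


def parse_electrode(label):
    """Map a 10-20 label such as 'F3', 'T4' or 'Cz' to (region, hemisphere)."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unsupported electrode label: {label!r}")
    prefix, suffix = match.groups()
    if suffix == "z":  # assumed convention: 'z' marks the midline
        return REGIONS[prefix], "midline"
    side = "right" if int(suffix) % 2 == 0 else "left"  # even = right, odd = left
    return REGIONS[prefix], side


for name in ["Fp1", "F4", "C3", "T3", "O2", "Pz"]:
    print(name, parse_electrode(name))
```

Labels outside this small subset (for example, reference electrodes) raise a `ValueError` rather than being guessed.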

EEG signals are non-stationary and exhibit nonlinear and bursty characteristics. As shown in Table 1, EEG can be divided into Delta, Theta, Alpha, Beta, and Gamma bands, which reflect different brain activities under various functional states [13]. During epileptic seizures, abnormal waves appear, such as spike waves with predominantly negative components and variable amplitudes, and sharp waves, which last longer than spike waves [14]. These abnormal waves indicate abnormal neuronal discharges, leading to transient brain dysfunction [15]. Effective feature extraction can therefore obtain meaningful features from complex EEG signals, aiding the identification and understanding of disease-related abnormal patterns.

TABLE 1 Category of EEG signals

B. Feature Extraction Engineering

Feature extraction for epileptic EEG mainly includes time-domain, frequency-domain, and time-frequency analysis methods. Time-domain analysis examines the original EEG time series directly, focusing on the magnitude and duration of voltage changes. Entropy-based methods are typical time-domain approaches and are commonly used in EEG analysis, because entropy describes the complexity and uncertainty of a signal. Common entropy-based methods include ApEn, Sample Entropy (SampEn), and PE. ApEn quantifies the complexity of a time series, with higher values indicating higher complexity [16]. SampEn was derived from ApEn to remove the bias that ApEn incurs by counting self-matches; it likewise reflects the complexity of EEG signals, with higher values indicating greater complexity [9]. Fuzzy entropy introduces fuzzy set theory to better handle uncertainty and fuzziness in EEG time series [11]. PE maps a time series to ordinal patterns (permutations) and uses the frequency distribution of these patterns as a feature [6]. These methods are straightforward to implement and understand, and they allow direct observation of the signal's original form, which is useful for identifying obvious abnormalities. However, time-domain features may not capture all relevant information, especially for non-stationary signals whose frequency content changes over time. They may also be more susceptible to noise, since they do not inherently incorporate signal averaging or filtering.
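Two of the measures named above, SampEn and PE, can be sketched in a few lines of NumPy. This sketch follows common textbook definitions rather than the exact implementations used in [6] or [9]. The defaults `m=2`, `r_factor=0.2` (tolerance as a fraction of the standard deviation), `order=3` and `delay=1` are typical choices assumed here, not parameters taken from this paper.

```python
import math
from collections import Counter

import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(A / B), where B (A) counts pairs of length-m (length-(m+1))
    templates within Chebyshev distance r. Self-matches are excluded, which is
    the correction SampEn makes to ApEn."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_templates = len(x) - m  # same template count for both lengths

    def count_pairs(length):
        templates = np.array([x[i:i + length] for i in range(n_templates)])
        pairs = 0
        for i in range(n_templates - 1):
            # Compare template i only with later templates, so i == j never occurs.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            pairs += int(np.sum(dist <= r))
        return pairs

    b, a = count_pairs(m), count_pairs(m + 1)
    return math.inf if a == 0 or b == 0 else -math.log(a / b)


def permutation_entropy(x, order=3, delay=1):
    """Normalized PE: Shannon entropy of ordinal-pattern frequencies divided by ln(order!)."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (order - 1) * delay
    counts = Counter(
        tuple(np.argsort(x[i:i + order * delay:delay])) for i in range(n_vectors)
    )
    probs = np.array(list(counts.values()), dtype=float) / n_vectors
    return float(-np.sum(probs * np.log(probs)) / math.log(math.factorial(order)))


# Toy signals (hypothetical): a regular 10 Hz sine versus white noise.
rng = np.random.default_rng(0)
t = np.arange(2000) / 173.61
regular = np.sin(2 * np.pi * 10 * t)
noise = rng.standard_normal(2000)
print(sample_entropy(regular), sample_entropy(noise))
print(permutation_entropy(regular), permutation_entropy(noise))
```

Both measures should come out lower for the regular sine than for the noise, matching the statement that higher values indicate greater complexity.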

In contrast to time-domain methods, frequency-domain analysis uses mathematical transforms (such as the Fourier transform) to convert EEG signals from the time domain to the frequency domain and analyze their frequency content. Power Spectral Density (PSD) is a typical frequency-domain method: it represents how signal energy is distributed over frequency by computing the squared amplitude of each frequency component. In [17], PSD estimates were used to extract features of different epileptic states, and an SVM classifier fed with these features achieved a classification accuracy of 93.33%. Spectral entropy measures the entropy of the signal spectrum, reflecting the uniformity and complexity of the spectral distribution, which makes it an effective feature for signal analysis [18]. As shown in [19], other spectral descriptors (such as spectral flatness and spectral centroid) can also be used for EEG analysis. Frequency-domain representations clearly display a signal's frequency components, which is important for diagnosis because different frequencies may correspond to different types of brain activity. However, these methods assume signal stationarity, whereas brain activity is dynamic, and they provide no information about when specific events occur in the time series.
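A minimal sketch of PSD-based features: a Hann-windowed periodogram, per-band power, and a normalized spectral entropy. It uses a single-segment periodogram for brevity (Welch averaging would give a lower-variance estimate), and the band edges in `BANDS` are assumed conventional values, since the ranges in Table 1 are not reproduced in this text. The sampling rate is the 173.61 Hz of the dataset described in Section IV; the function names are hypothetical.

```python
import numpy as np

FS = 173.61  # sampling rate of the recordings described in Section IV

# Assumed band edges in Hz; conventions vary and Table 1's exact values are not reproduced here.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 60)}


def periodogram(x, fs=FS):
    """One-sided PSD estimate from a Hann-windowed FFT (squared magnitude per frequency bin)."""
    x = np.asarray(x, dtype=float) - np.mean(x)  # remove the DC offset
    window = np.hanning(len(x))
    psd = np.abs(np.fft.rfft(x * window)) ** 2 / (fs * np.sum(window ** 2))
    psd[1:] *= 2  # fold negative frequencies into the positive half
    if len(x) % 2 == 0:
        psd[-1] /= 2  # the Nyquist bin has no negative-frequency twin
    return np.fft.rfftfreq(len(x), d=1 / fs), psd


def spectral_features(x, fs=FS):
    """Band powers (PSD integrated over each band) and spectral entropy scaled to [0, 1]."""
    freqs, psd = periodogram(x, fs)
    df = freqs[1] - freqs[0]
    band_power = {
        name: float(np.sum(psd[(freqs >= lo) & (freqs < hi)]) * df)
        for name, (lo, hi) in BANDS.items()
    }
    p = psd / np.sum(psd)  # treat the normalized spectrum as a probability distribution
    p = p[p > 0]
    spectral_entropy = float(-np.sum(p * np.log(p)) / np.log(len(psd)))
    return band_power, spectral_entropy


t = np.arange(4097) / FS  # one segment of the length used in Section IV
bands, h = spectral_features(np.sin(2 * np.pi * 10 * t))  # a pure 10 Hz tone
print(max(bands, key=bands.get), round(h, 3))
```

For the pure 10 Hz tone, nearly all power falls in the assumed alpha band and the spectral entropy is low; broadband noise would spread power across bins and push the entropy toward 1.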

Time-frequency methods represent EEG signals jointly in the time and frequency domains, capturing the signal's characteristics in both simultaneously. They mainly include the short-time Fourier transform (STFT) and the wavelet transform. The STFT divides the signal into multiple short, mutually overlapping segments and applies the Fourier transform to each segment [20]. Compared with the STFT, the wavelet transform offers higher resolution and better performance [21]. Daubechies (db) wavelets, particularly the db4 wavelet, have been shown to represent epileptic states in EEG signals effectively [22]. The DWT decomposes EEG signals into a series of wavelet subbands, and statistics of the coefficients in each subband, such as the variance and mean, are used as EEG features [23]. These methods are particularly suitable for non-stationary signals because they capture time and frequency information simultaneously and can localize specific events in both domains, providing a more comprehensive analysis of EEG signals. However, they can be more complex to implement and interpret, and the added analytical complexity may require more computational resources.
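The STFT described above can be sketched with NumPy alone. This is an illustrative implementation, not the one used in [20]; the segment length `nperseg=128` and the 50% overlap `noverlap=64` are assumed values, and `stft_magnitude` is a hypothetical name. The toy input switches from 5 Hz to 30 Hz halfway through, so the dominant frequency of the first and last segments should differ, which is the time localization a single Fourier transform of the whole signal would lose.

```python
import numpy as np


def stft_magnitude(x, fs, nperseg=128, noverlap=64):
    """Magnitude STFT: Hann-windowed, overlapping segments, each transformed by an FFT."""
    x = np.asarray(x, dtype=float)
    hop = nperseg - noverlap
    starts = np.arange(0, len(x) - nperseg + 1, hop)
    window = np.hanning(nperseg)
    frames = np.stack([x[s:s + nperseg] * window for s in starts])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, nperseg // 2 + 1)
    freqs = np.fft.rfftfreq(nperseg, d=1 / fs)
    times = (starts + nperseg / 2) / fs  # centre of each segment, in seconds
    return freqs, times, spectrum


fs = 173.61
t = np.arange(1000) / fs
x = np.concatenate([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 30 * t)])  # 5 Hz, then 30 Hz
freqs, times, spec = stft_magnitude(x, fs)
print(freqs[spec[0].argmax()], freqs[spec[-1].argmax()])
```

The frequency resolution is `fs / nperseg` (about 1.36 Hz here), so longer segments sharpen frequency at the cost of time resolution; this fixed trade-off is what the multi-resolution wavelet transform relaxes.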

Although the EEG feature extraction methods above are often effective to some extent at extracting disease features, EEG signals are typically non-stationary, whereas many traditional time-domain and frequency-domain methods assume stationarity. ApEn and SampEn, commonly used to quantify time-series complexity, may not be robust for non-stationary signals. Moreover, they primarily measure overall signal complexity rather than local self-similarity, making them less sensitive to minor changes. To address this issue, this study proposes the WKLD feature extraction method. By analyzing non-overlapping windows, WKLD may be better suited than existing methods to non-stationary data such as EEG signals. Furthermore, WKLD evaluates the relative entropy of adjacent windows, which quantifies the signal's self-similarity. Therefore, this method can capture minor changes in EEG signals more sensitively and provide a more accurate assessment of system chaos.

C. The Classifier Backpropagation Multidimensional Taylor Network (BP-MTN)

Traditional Multi-dimensional Taylor Networks (MTN) can approximate any nonlinear function using polynomial networks; they therefore fit data well and are suitable for prediction and control [24], [25]. Widely used deep learning models, such as deep neural networks, often have many layers and high model complexity, so while they achieve high accuracy, they may not meet the real-time requirements of automated epilepsy detection. In contrast, an MTN has only one hidden layer, making it a lightweight model capable of rapidly approximating nonlinear functions with high accuracy. Given a time series $x(t)$, $t = 1, 2, \ldots, L$, the value $x(t+1)$ at time $t+1$ can be represented by a polynomial combination of its previous values:\begin{equation*} x(t+1)=f(x(t))=f(x_{1}(t),x_{2}(t),\ldots ,x_{n}(t)) \tag {1}\end{equation*} and \begin{equation*} x_{j}(t+1)=\sum_{q=1}^{\omega}\lambda_{q}\prod_{i=1}^{n} x_{i}^{\sigma_{q,i}}(t) \tag {2}\end{equation*} where $x_{j}(t+1)$ is the $j$th component of $x(t+1)$; $f$ represents the MTN; $\omega$ is the total number of product terms in the approximation expansion; $\lambda_{q}$ is the weight of the $q$th product term; and $\sigma_{q,i}$ is the power of variable $x_{i}$ in the $q$th product term. To enable classification, we extended the traditional MTN model into the BP-MTN model illustrated in Figure 2. In this model, fully connected layers are added to the MTN to resolve inconsistent input and output dimensions, a softmax layer is included for classification, and activation functions are incorporated to enhance the model's nonlinear fitting capability [26].
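Eqs. (1) and (2) amount to a fixed polynomial basis with learnable weights $\lambda_{q}$. The sketch below is a hypothetical illustration, not the authors' training procedure: it builds the product terms for a lag vector and estimates the weights by ordinary least squares. It is exercised on the logistic map $x(t+1) = 3.9\,x(t)(1-x(t))$, which is itself a second-order polynomial in $x(t)$, so a one-lag, second-order expansion can represent it exactly.

```python
from itertools import combinations_with_replacement

import numpy as np


def product_terms(x_vec, order):
    """All product terms prod_i x_i**sigma_{q,i} of total degree 0..order, as in Eq. (2)."""
    terms = [1.0]  # degree 0 (constant term)
    for degree in range(1, order + 1):
        # each multiset of variable indices defines one exponent pattern sigma_q
        for idx in combinations_with_replacement(range(len(x_vec)), degree):
            terms.append(float(np.prod([x_vec[i] for i in idx])))
    return np.array(terms)


def fit_mtn(series, n_lags, order):
    """Least-squares estimate of the weights lambda_q mapping x(t) to x(t+1)."""
    rows, targets = [], []
    for t in range(n_lags - 1, len(series) - 1):
        lag_vec = series[t - n_lags + 1:t + 1]  # x_1(t), ..., x_n(t): the n most recent values
        rows.append(product_terms(lag_vec, order))
        targets.append(series[t + 1])
    weights, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return weights


# Toy data (hypothetical): the chaotic logistic map, a degree-2 polynomial in x(t).
series = [0.2]
for _ in range(300):
    series.append(3.9 * series[-1] * (1 - series[-1]))
weights = fit_mtn(np.array(series), n_lags=1, order=2)
print(np.round(weights, 6))  # weights for the terms 1, x, x**2
```

With terms ordered as $1, x, x^{2}$, the fitted weights should come out very close to $[0, 3.9, -3.9]$.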

FIGURE 1.

Tree diagram of feature extraction.

FIGURE 2.

The structure of the BP-MTN.

In the BP-MTN, the vector output by the fully connected layer is fed into the softmax layer for classification and is denoted as\begin{equation*} P(t) = [p_{1}(t)\ p_{2}(t)\ \ldots\ p_{N}(t)]^{T} \tag {3}\end{equation*} where \begin{equation*} z_{n}=S\left(\sum_{i=1}^{I}\lambda_{in}\,S\left(\sum_{q=1}^{N(i,M)}w_{i,q}\prod_{j=1}^{I}x_{j}^{\sigma_{q,j}}\right)\right),\quad n=1,2,\ldots,N\end{equation*} Here $z_{n}$ is the softmax output for category $n$, $S$ denotes the softmax function, and the final category is the one with the highest probability among the $N$ categories.

SECTION III.

Research Methodology

This section introduces a novel feature extraction method based on DWT and windowed relative entropy. We employ the ResMTN as a classifier for automated detection and classification of epilepsy.

A. Discrete Wavelet Transform

In EEG signal processing, the wavelet transform is a highly valuable analytical method. Mapping an EEG signal $f$ to the time-frequency domain provides a multi-resolution perspective for studying the signal's dynamic changes. The continuous wavelet transform is expressed in inner-product form as follows:\begin{equation*} {WT}_{f}(a, b)=\langle f, \psi_{a,b}\rangle \tag {4}\end{equation*} where \begin{equation*} \psi_{a,b}(t)= |a|^{-\frac{1}{2}}\, \psi\left(\frac{t-b}{a}\right)\end{equation*} Here $WT_{f}(a, b)$ denotes the continuous wavelet transform of the EEG signal $f$; $a$ and $b$ are the scale and translation parameters of the wavelet function, respectively; and $\psi_{a,b}(t)$ is the wavelet function adjusted for scale and position, with $t$ as the time variable. Because the continuous wavelet transform is evaluated over continuously varying scales and translations, its coefficients are highly redundant. To address this problem, the DWT samples these parameters dyadically. Let $a=\frac{1}{2^{j}}$ and $b=\frac{k}{2^{j}}$, where $j,k\in \mathbb{Z}$; then $WT_{f}$ is expressed as\begin{equation*} {WT}_{f}\left(\frac{1}{2^{j}},\frac{k}{2^{j}}\right)=\langle f,\psi_{j,k}\rangle,\quad \psi_{j,k}(t)=2^{\frac{j}{2}}\,\psi\left(2^{j}t-k\right) \tag {5}\end{equation*} where $\frac{1}{2^{j}}$ sets the scale of the wavelet function and $\frac{k}{2^{j}}$ sets its translation in time. Using db4 (Daubechies 4) as the basis function, the signal can be effectively decomposed into multiple levels to explore the characteristics of EEG signals across different frequency ranges. The EEG signal is first passed through high-pass and low-pass filters, generating two types of coefficient subbands: detail coefficients and approximation coefficients.
When the EEG signal passes through the high-pass filter, the obtained detail coefficients subband (e.g., D1) describes the high-frequency portion of the signal, representing its subtle variations. Conversely, when the signal passes through the low-pass filter, the obtained approximation coefficients subband (e.g., A1) represents the low-frequency component of the signal, capturing its overall trend. This decomposition process can be further iterated. For example, the A1 subband can be decomposed again to obtain higher-level detail coefficients (e.g., D2) through a high-pass filter, and higher-level approximation coefficients (e.g., A2) through a low-pass filter, and so on. Through this process, the signal can be decomposed into multiple levels.
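The iterated filter bank described above can be sketched as follows. To keep the example dependency-free, it swaps in the two-tap Haar wavelet for db4: the structure (low-pass/high-pass split, downsampling by 2, recursion on the approximation) is the same, but the coefficient values will differ from a db4 decomposition. The helper names `dwt_level` and `wavedec` are hypothetical.

```python
import numpy as np

# Haar analysis filters: a dependency-free stand-in for db4, used only for illustration.
LOW = np.array([1.0, 1.0]) / np.sqrt(2)    # low-pass  -> approximation coefficients
HIGH = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass -> detail coefficients


def dwt_level(signal):
    """One level: filter with LOW/HIGH and keep every second output (downsample by 2)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) % 2:  # pad odd lengths by repeating the last sample
        signal = np.append(signal, signal[-1])
    # For 2-tap filters, filtering plus downsampling reduces to one dot product per
    # non-overlapping sample pair (up to the sign convention of the high-pass filter).
    pairs = signal.reshape(-1, 2)
    return pairs @ LOW, pairs @ HIGH


def wavedec(signal, levels):
    """Re-split the approximation at each level; returns [A_levels, D_levels, ..., D1]."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        details.insert(0, detail)
    return [approx] + details


rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
coeffs = wavedec(x, levels=6)
print([len(c) for c in coeffs])  # [64, 64, 128, 256, 512, 1024, 2048] for A6, D6, ..., D1
print([round(float(np.std(c)), 3) for c in coeffs])  # one standard deviation per subband
```

With PyWavelets installed, `pywt.wavedec(x, "db4", level=6)` performs the actual db4 decomposition and returns its coefficients in the same `[A6, D6, ..., D1]` order (subband lengths differ slightly because of boundary padding).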

B. Window Kullback-Leibler Divergence (WKLD)

To further extract features from EEG signals, we propose WKLD, a window-based KL divergence method. WKLD is a relative-entropy-based measure that reflects the degree of chaos in EEG signals. In seizure detection, this degree of chaos is often associated with the likelihood of seizure occurrence, so WKLD, as an entropy value, can be used to characterize EEG signals. First, WKLD divides the signal into fixed-size windows with no overlap between adjacent windows. Second, it calculates the relative entropy between adjacent windows and uses its magnitude to evaluate the self-similarity of neighboring windows: a higher value indicates greater chaos and lower self-similarity, while a lower value indicates higher self-similarity. Third, both windows are shifted forward by one window length at a time, and the relative entropy of each new adjacent pair is calculated, yielding a vector of relative entropies. Finally, the variance of this vector is computed to evaluate the overall fluctuation and regularity of the EEG signal. A higher WKLD value indicates poorer periodicity and lower regularity, and thus a greater association with abnormal activity or epilepsy, so it can serve as a feature of EEG signals. The specific workflow of this method is as follows:

First, define a time series $x(t)$ consisting of $N$ data points, where $t = 1, 2, \ldots, N$. The time series is segmented into windows as follows:\begin{equation*} X(i)=\{x(m(i-1)+1),\ x(m(i-1)+2),\ \ldots,\ x(mi)\},\quad i=1,2,\ldots,M \tag {6}\end{equation*} where $m$ is the number of data points in each window, $i$ is the window index, and $M$ is the number of windows. The relationship between adjacent windows in WKLD is illustrated in Figure 3. Specific EEG waveforms (such as Alpha, Beta, Theta, and Delta waves) span a certain time range. If windows overlapped, the same waveform could be captured separately in two windows, compromising the analysis of waveform integrity, and the overlap would also introduce redundant information. Therefore, only adjacent windows that share no EEG samples are compared when computing the relative entropy. As shown in Figure 3, two adjacent, non-overlapping signal windows are defined as $X(i)$ and $X(i+1)$. All elements of these two windows form a vector $\boldsymbol{u}_{i}$ satisfying the following condition:\begin{equation*} \boldsymbol{u}_{i}=\{x(k) \mid (x(k)\in X(i)~\mathrm{and}~x(k)\notin X(i+1))~\mathrm{or}~(x(k)\in X(i+1)~\mathrm{and}~x(k)\notin X(i))\} \tag {7}\end{equation*}

FIGURE 3.

The structure of two adjacent windows in WKLD.

As shown in Figure 3, the black trace represents the EEG signal, the blue window represents the previous window $X(i)$, and the red window represents the adjacent window $X(i+1)$. The windows slide along the signal with the window size as the step length: the green window in Figure 3 shows the blue window after it has moved by one step, and the red window shifts by the same step. The EEG signal is thus continuously divided by the sliding-window method. Moving window $X(i)$ once by one window width yields $X(i+1)$, and after $N$ moves yields the purple window $X(i+N)$. The occurrences of each element of $\boldsymbol{u}_{i}$ in the adjacent windows $X(i)$ and $X(i+1)$ are counted and stored in vectors $\boldsymbol{a}$ and $\boldsymbol{b}$, respectively. Dividing by the total number of elements $n$ in $\boldsymbol{u}_{i}$ gives the probability density vectors of the adjacent windows $X(i)$ and $X(i+1)$:\begin{equation*} \boldsymbol{p}(\boldsymbol{x})= \left(\frac{a_{1}}{n}, \frac{a_{2}}{n}, \ldots, \frac{a_{n}}{n}\right), \quad \boldsymbol{q}(\boldsymbol{x})=\left(\frac{b_{1}}{n}, \frac{b_{2}}{n}, \ldots, \frac{b_{n}}{n}\right) \tag {8}\end{equation*} where $a_{k}$ and $b_{k}$ are the $k$th elements of $\boldsymbol{a}$ and $\boldsymbol{b}$, respectively. The $k$th elements of the probability density functions $p(x)$ and $q(x)$ are denoted $p(x_{k})$ and $q(x_{k})$. The relative entropy is then calculated to measure the distance between the two adjacent, non-overlapping signal windows.
This distance (strictly a divergence, since it is not symmetric in $p$ and $q$) reflects the degree of self-similarity between adjacent windows: the smaller the distance, the higher the self-similarity, and the larger the distance, the lower the self-similarity. It is calculated as\begin{equation*} D(p\|q) =\sum_{k=1}^{n} p(x_{k})\log\frac{p(x_{k})}{q(x_{k})} \tag {9}\end{equation*} where \begin{equation*} p(x_{k})\ne 0~\mathrm{and}~q(x_{k})\ne 0,\quad k=1,2,\ldots,n\end{equation*}

After each shift of the adjacent windows, the relative entropy is calculated in turn until the end of the EEG signal, yielding a relative entropy vector. This feature vector is then min-max normalized to the range [0, 1], as shown in Eq. (10), which enhances the stability of the model and mitigates vanishing gradients.\begin{equation*} \boldsymbol{D}(p\|q)' = \frac{\boldsymbol{D}(p\|q) - \min(\boldsymbol{D}(p\|q))}{\max(\boldsymbol{D}(p\|q)) - \min(\boldsymbol{D}(p\|q))} \tag {10}\end{equation*} The variance of the normalized feature vector is then calculated to assess the dispersion of the time series $x(t)$, $t=1,2,\ldots,N$. A higher value indicates greater dispersion, suggesting a more chaotic and complex EEG signal.
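The WKLD workflow can be sketched under stated assumptions. Eqs. (7) and (8) leave some details open, so this sketch makes three interpretive choices that should not be read as the authors' exact implementation: (1) counts are taken over the distinct sample values that occur in either window, (2) each count vector is divided by the window length $m$ so that $\boldsymbol{p}$ and $\boldsymbol{q}$ are empirical distributions, and (3) samples are assumed to be discrete-valued, so continuous-valued signals would need quantizing first. Eq. (9) is evaluated with the natural logarithm, and trailing samples that do not fill a whole window are discarded. The names `window_kl` and `wkld` are hypothetical.

```python
import numpy as np


def window_kl(prev_win, next_win):
    """Relative entropy D(p||q) of Eq. (9) between the value histograms of two adjacent windows."""
    values = np.union1d(prev_win, next_win)  # assumption: distinct sample values seen in either window
    counts_prev = np.array([np.sum(prev_win == v) for v in values], dtype=float)
    counts_next = np.array([np.sum(next_win == v) for v in values], dtype=float)
    p = counts_prev / len(prev_win)  # assumed normalizer: window length m
    q = counts_next / len(next_win)
    keep = (p > 0) & (q > 0)  # Eq. (9) only sums terms where both are nonzero
    return float(np.sum(p[keep] * np.log(p[keep] / q[keep])))


def wkld(x, m):
    """WKLD feature: variance of the min-max normalized adjacent-window relative entropies."""
    x = np.asarray(x)
    n_windows = len(x) // m  # M in Eq. (6); leftover samples at the end are dropped
    if n_windows < 2:
        raise ValueError("need at least two windows")
    windows = [x[i * m:(i + 1) * m] for i in range(n_windows)]  # non-overlapping, step = m
    d = np.array([window_kl(windows[i], windows[i + 1]) for i in range(n_windows - 1)])
    span = d.max() - d.min()
    d_norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)  # Eq. (10)
    return float(np.var(d_norm))


# Toy inputs (hypothetical), both integer-valued.
rng = np.random.default_rng(0)
periodic = np.tile(np.arange(-20, 20), 103)     # period of 40 samples
irregular = rng.integers(-100, 100, size=4097)  # irregular values
print(wkld(periodic, m=80), wkld(irregular, m=64))
```

A signal whose period divides the window length produces identical adjacent windows and hence a WKLD of 0, while irregular input gives a positive value; because the normalized entropies lie in [0, 1], the variance cannot exceed 0.25. Because Eq. (9) drops values that appear in only one window, an individual $D$ can be negative, which the min-max step in Eq. (10) handles without change.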

C. ResMTN

To classify epileptic states, the features extracted from the combination of DWT and WKLD are fed into the ResMTN classifier. The ResMTN classifier is responsible for distinguishing between different epileptic states. For example, if the input feature dimension is 200 and the output dimension of ResMTN is 2, it can represent the two classes: interictal and ictal periods of epileptic seizures.

ResMTN is an enhanced network model based on BP-MTN that retains BP-MTN's input layer and polynomial layer design. The difference is a skip connection inspired by ResNet, shown in Figure 4, which follows the equation $H(x) = F(x) + x$ [27]. This enables the network to learn both the input features and the residual features, facilitating effective information propagation through the network. The design also mitigates exploding and vanishing gradients and helps prevent overfitting of the model.

FIGURE 4.

The structure of the ResMTN.

Assume that at time $t$ the input vector to the polynomial layer is $X(t)$, and let $U$ denote the set of monomials in the polynomial combination of the input vector. With $W$ as the corresponding weight matrix of the polynomial layer, the layer output is given as\begin{equation*} G(t) = W^{T}\cdot U(t) = [W_{1}\ W_{2}\ \ldots\ W_{L}]^{T} \cdot U(t) \tag {11}\end{equation*} where \begin{equation*} U(t)=[1,\ x_{1}(t),\ldots,x_{L}(t),\ x_{1}(t)x_{2}(t),\ldots,x_{L}^{2}(t),\ldots,x_{1}(t)\ldots x_{M}(t),\ldots,x_{L}^{M}(t)]^{T}\end{equation*} and \begin{equation*} W_{L} = [w_{L,1}\ w_{L,2}\ \ldots\ w_{L,N(L,M)}]^{T}\end{equation*} with $N(L,M)$ the number of monomials in $U(t)$.

The output of the polynomial layer is then mapped to the range [−1, 1] by the tanh activation function to enhance the model's accuracy. The zero-centered output of tanh reduces bias in the data, which speeds up the convergence of the algorithm. The activated output is denoted as\begin{equation*} H(t) =\tanh [G(t)] \tag {12}\end{equation*} Unlike the traditional BP-MTN, ResMTN incorporates a skip connection that lets the raw input bypass the polynomial layer by connecting the input directly to the polynomial layer's output. This alleviates the vanishing and exploding gradients caused by high-order polynomials during MTN training, improving the network's training effectiveness and performance as well as the robustness and generalization ability of ResMTN. Adding the original input $X(t)$ to the polynomial layer output $H(t)$ forms a residual structure whose output is\begin{equation*} R(t)=H(t) +X(t) \tag {13}\end{equation*} $R(t)$ is then fed into the fully connected layer to obtain $Z(t)$:\begin{equation*} Z(t) = V^{T}\cdot R(t) = [V_{1}\ V_{2}\ \ldots\ V_{C}]^{T} \cdot R(t) \tag {14}\end{equation*} with \begin{equation*} Z(t)=[z_{1}(t)\ z_{2}(t)\ \ldots\ z_{C}(t)]^{T}~\mathrm{and}~V_{c} = [v_{c,1}\ v_{c,2}\ \ldots\ v_{c,L}]^{T}\end{equation*} where $V$ is the parameter matrix of the fully connected layer and $z_{c}(t)$ is its $c$th output.
Finally, $Z(t)$ is fed into the softmax layer for classification:\begin{equation*} P(t) = [p_{1}(t)\ p_{2}(t)\ \ldots\ p_{C}(t)]^{T} \tag {15}\end{equation*} with \begin{equation*} p_{c} =\frac{\exp(z_{c})}{\sum_{i=1}^{C}\exp(z_{i})},\quad c=1,2,\ldots,C\end{equation*} The softmax layer outputs a probability distribution over the categories, and the category with the highest probability is taken as the prediction.
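Eqs. (11) to (15) compose into a single forward pass. The sketch below is a NumPy illustration with hypothetical toy sizes and random weights, not the authors' trained model; it orders the monomials of $U(t)$ by total degree, which is one valid ordering, and uses a polynomial order of 2. Setting `residual=False` drops the skip connection of Eq. (13) for comparison.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np


def monomials(x, order):
    """U(t): constant term, then all monomials of the inputs up to total degree `order`."""
    terms = [1.0]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)


def resmtn_forward(x, W, V, order=2, residual=True):
    """Forward pass following Eqs. (11)-(15).
    W: (N(L, M), L) polynomial-layer weights; V: (L, C) fully connected weights."""
    x = np.asarray(x, dtype=float)
    g = W.T @ monomials(x, order)  # Eq. (11): G(t) = W^T U(t)
    h = np.tanh(g)                 # Eq. (12)
    r = h + x if residual else h   # Eq. (13): skip connection R(t) = H(t) + X(t)
    z = V.T @ r                    # Eq. (14)
    e = np.exp(z - z.max())        # Eq. (15), shifted by max(z) for numerical stability
    return e / e.sum()


L, M, C = 4, 2, 2  # toy sizes (hypothetical)
n_terms = comb(L + M, M)  # number of monomials of degree <= M in L variables
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_terms, L))
V = rng.normal(size=(L, C))
probs = resmtn_forward(rng.normal(size=L), W, V, order=M)
print(n_terms, probs, probs.argmax())
```

The monomial count grows as $\binom{L+M}{M}$: with the 200-dimensional input used as an example earlier in this subsection and a polynomial order of 2, $U(t)$ would already contain 20,301 terms.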

During parameter optimization, the difference between the actual distribution and the predicted distribution is measured with the cross-entropy loss function:\begin{equation*} Loss = -\sum_{i=1}^{C} y_{i}\cdot \ln(p_{i}) \tag {16}\end{equation*} where $y_{i}$ is the one-hot encoded actual class distribution and $p_{i}$ is the model's predicted probability distribution. ResMTN is trained with the Adam optimizer, which dynamically adjusts the learning rate of each parameter using first- and second-moment estimates of the gradients; the gradients themselves are computed with the backpropagation algorithm.
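The loss in Eq. (16) and the Adam update can be sketched in NumPy. This is a generic illustration, not the authors' training code: the moment decay rates are the common Adam defaults, and the toy example optimizes a two-class logit vector directly, using the standard result that the gradient of softmax cross-entropy with respect to the logits is $\boldsymbol{p} - \boldsymbol{y}$. In a full model, backpropagation would carry that gradient on to the weights $W$ and $V$.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def cross_entropy(y_onehot, p, eps=1e-12):
    """Eq. (16): -sum_i y_i ln p_i (eps guards against ln 0)."""
    return float(-np.sum(y_onehot * np.log(p + eps)))


class Adam:
    """Adam: running first/second moment estimates of the gradient with bias correction."""

    def __init__(self, shape, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)
        self.v = np.zeros(shape)
        self.t = 0

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2 ** self.t)  # bias-corrected second moment
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Toy problem (hypothetical): push two logits toward the one-hot target [0, 1].
y = np.array([0.0, 1.0])
z = np.zeros(2)
opt = Adam(z.shape, lr=0.1)
for _ in range(200):
    grad = softmax(z) - y  # gradient of cross-entropy with respect to the logits
    z = opt.step(z, grad)
print(cross_entropy(y, softmax(z)))
```

Starting from equal logits the loss is $\ln 2$; after the updates it falls well below that as the probability of the target class approaches 1.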

To sum up, the whole process of detecting epilepsy is shown in Figure 5. After the feature extraction stage, the extracted features are fed into the ResMTN classifier. This process is divided into training and testing phases based on the dataset split. During the training phase, the classifier learns model parameters using the training dataset and minimizes the loss function on the training data. Once the model enters the testing phase, the trained classifier is used to classify the data in the testing set. By comparing the predictions with the actual labels, the model’s performance is evaluated. Experimental validation indicates that the highest classification accuracy is achieved when the polynomial order is 2.
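The evaluation step compares test-set predictions with the actual labels. A minimal sketch of the three metrics reported in the abstract, assuming the ictal class (subset E) is labelled 1 and treated as positive; the text does not state which class is taken as positive, and `binary_metrics` is a hypothetical name.

```python
import numpy as np


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on the positive class) and specificity (recall on the negative class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumed labelling: 1 = ictal (subset E), 0 = interictal (subset D).
print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Here one of the two ictal samples is missed, so sensitivity is 0.5 while specificity stays at 1.0; accuracy alone (0.75) hides that asymmetry, which is why all three metrics are reported.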

FIGURE 5.

Flowchart of epilepsy detection using the ResMTN model.

SECTION IV.

Experiment Design and Performance Evaluation

A. Dataset Description

The publicly available dataset used in this study was obtained from the Epilepsy Center at the University of Bonn, Germany. It comprises five categories, each containing 100 samples. Each sample has a duration of 23.6 seconds and is recorded using single-channel EEG with a sampling frequency of 173.61 Hz. Category A represents EEG signals from normal subjects during wakefulness with eyes open, while category B represents EEG signals from normal subjects during wakefulness with eyes closed. Category C consists of EEG signals from epilepsy patients during interictal periods in the hippocampal region. Category D contains EEG signals from epilepsy patients during interictal periods in the epileptic focus region. Category E includes EEG signals from epilepsy patients during ictal periods in the epileptic focus region. This study primarily focuses on the classification of 200 samples from the interictal (subset D) and ictal (subset E) periods of epileptic seizures.

We also used the MATLAB Wavelet Packet Visualization Tool to visualize signal segments, as shown in Figure 6. Subsets D and E differ markedly in amplitude, and the ictal segments appear more regular than the interictal ones.

FIGURE 6.

Image a: A 4097-sample original signal segment from dataset D; Image b: A 4097-sample original signal segment from dataset E.


1) Data Processing

In this study, a DWT was applied to the 200 samples in MATLAB. A comparison and correlation analysis against characteristic epileptic EEG waveforms showed that the db4 wavelet most closely resembled these waveforms, so db4 was chosen as the wavelet basis and a six-level DWT was performed with dyadic downsampling (factor 2). Passing the signal through the high-pass and low-pass filters yields the first-level detail coefficients (D1) and approximation coefficients (A1). Repeating this on each level's approximation coefficients gives the higher-level subbands D2, D3, D4, D5, D6, and A6. The standard deviation of the coefficients in each subband was then used as a feature of interictal and ictal EEG. Table 2 lists these feature values by decomposition level, where the number denotes the decomposition level, “D” denotes the detail coefficients, and “A” denotes the approximation coefficients of the EEG signal. The table shows clear differences in the subband features between interictal and ictal periods, indicating that these features help discriminate the two states.
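A minimal Python sketch of this step, assuming PyWavelets (`pywt`) in place of MATLAB. The default signal-extension mode of `pywt.wavedec` ('symmetric') may differ from the MATLAB settings used in the study, so the values need not match Table 2. The helper `subband_ranges` (hypothetical) gives the nominal frequency band of each subband for dyadic decomposition; at 173.61 Hz, D1 spans roughly 43.4 to 86.8 Hz and A6 roughly 0 to 1.36 Hz.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_std_features(signal, wavelet="db4", level=6):
    """Six-level DWT; returns the standard deviation of each subband, ordered A6, D6, ..., D1."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # [cA6, cD6, cD5, cD4, cD3, cD2, cD1]
    return np.array([np.std(c) for c in coeffs])

def subband_ranges(fs, level=6):
    """Nominal frequency range (Hz) of each DWT subband for dyadic decomposition."""
    ranges = {}
    for j in range(1, level + 1):
        ranges[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    ranges[f"A{level}"] = (0.0, fs / 2 ** (level + 1))
    return ranges

rng = np.random.default_rng(0)
segment = rng.standard_normal(4097)  # placeholder for one 4097-sample Bonn segment
features = dwt_std_features(segment)
print(features.shape)  # (7,)
```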

TABLE 2 Feature values of sub-bands

2) The Result of t-SNE

After extracting features of normal, interictal, and ictal EEG using WKLD and DWT, the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm was used to project the features into two dimensions, where the classes become approximately linearly separable. Figure 7(a) shows the embedding of the original signals before feature extraction: red dots denote normal EEG, blue dots interictal EEG, and orange-brown dots ictal EEG, and the classes are heavily intermixed. After WKLD-based feature extraction, as shown in Figure 7(b), the classes are markedly more distinguishable than in the original data, supporting the effectiveness of the proposed feature extraction method.
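The projection can be sketched with scikit-learn's `TSNE`. The three-class feature matrix below is synthetic, and the perplexity and initialization are assumptions rather than the study's settings. Note that t-SNE preserves local neighborhoods rather than global distances, so separation in the 2-D plot is a qualitative indication.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for the extracted features: three classes, 8 features each
centers = np.array([[0.0] * 8, [3.0] * 8, [-3.0] * 8])
X = np.vstack([c + rng.normal(size=(50, 8)) for c in centers])
y = np.repeat([0, 1, 2], 50)  # 0 = normal, 1 = interictal, 2 = ictal (illustrative)

emb = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0).fit_transform(X)
print(emb.shape)  # (150, 2)
```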

FIGURE 7.

Image a: The t-SNE dimensionality reduction of the original signals; Image b: The t-SNE dimensionality reduction after extracting EEG features using WKLD and DWT.


B. Validation of the Proposed Method

To evaluate the proposed algorithm, several evaluation metrics were used. True Positives (TP) are positive samples correctly classified as positive, True Negatives (TN) are negative samples correctly classified as negative, False Negatives (FN) are positive samples incorrectly classified as negative, and False Positives (FP) are negative samples incorrectly classified as positive. Accuracy is the proportion of correctly predicted samples among all samples. Sensitivity (the true positive rate) measures the ability to detect positive samples, and specificity (the true negative rate) measures the ability to identify negative samples. The Matthews Correlation Coefficient (MCC) ranges from −1 to +1 and measures the quality of a binary classifier’s predictions; it remains informative even for imbalanced datasets. The Kappa statistic measures agreement between predictions and true labels beyond what would be expected by chance, with higher values indicating greater agreement. The Classification Success Index (CSI) ranges from −1 to 1 and combines the two class-wise rates: as Eq. (21) shows, it equals sensitivity plus specificity minus one.
The formulas for these metrics are:\begin{align*} Accuracy & = \frac {TP + TN}{TP + TN + FP + FN} \tag {17}\\ Sensitivity & = \frac {TP}{TP + FN} \tag {18}\\ Specificity & = \frac {TN}{TN + FP} \tag {19}\\ MCC & = \frac {TP \times TN - FP \times FN}{\sqrt {(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag {20}\\ CSI & = \frac {TN}{TN + FP} + \frac {TP}{TP + FN} - 1 \tag {21}\\ Kappa & = \frac {P_{o} - P_{e}}{1 - P_{e}} \tag {22}\end{align*} where \begin{align*} P_{o} & = Accuracy \\ P_{e} & = \frac {(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + TN + FP + FN)^{2}}\end{align*}
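Eqs. (17) to (22) translate directly into code. The confusion matrix below (TP = TN = 98, FP = FN = 2) is hypothetical and chosen only so that every value can be checked by hand; zero denominators are not handled in this sketch.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    acc = (tp + tn) / n                                # Eq. (17)
    sens = tp / (tp + fn)                              # Eq. (18)
    spec = tn / (tn + fp)                              # Eq. (19)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Eq. (20)
    csi = spec + sens - 1                              # Eq. (21)
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e)                    # Eq. (22), with P_o = accuracy
    return dict(accuracy=acc, sensitivity=sens, specificity=spec,
                mcc=mcc, csi=csi, kappa=kappa)

m = binary_metrics(tp=98, tn=98, fp=2, fn=2)  # hypothetical counts
# accuracy = sensitivity = specificity = 0.98; MCC = CSI = Kappa = 0.96
```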

1) The Analysis of WKLD Feature Extraction

During epileptic seizures, EEG signals tend to become more regular and less complex. In Figure 8, the x-axis is the sample index and the y-axis is the feature value extracted by WKLD; the red and blue curves correspond to normal EEG and seizure EEG, respectively. The WKLD values of normal EEG are generally higher than those during seizures, consistent with the reduced signal complexity during seizures. This suggests that WKLD features can effectively differentiate normal EEG from seizure EEG.

FIGURE 8.

The WKLD values of some samples in normal individuals and during seizures.

Show All

Although both interictal and ictal periods involve abnormal neuronal discharges, Figure 9 shows that WKLD values during interictal periods are generally higher than those during ictal periods, indicating that interictal EEG is less regular and more complex than ictal EEG. Traditional entropy-based features can be insensitive to such subtle changes and may fail to distinguish interictal from ictal periods effectively; WKLD largely addresses this issue. By analyzing fixed-size signal windows and the entropy of their feature vectors, WKLD captures the degree of disorder and complexity of the signal and reflects the differences between interictal and ictal EEG.
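The exact WKLD definition is given earlier in the paper and is not reproduced here. As a rough, hypothetical stand-in that conveys the windowed idea, the sketch below computes the Kullback-Leibler divergence between amplitude histograms of adjacent fixed-size windows. The function name, window size, bin count, and histogram-based distributions are all assumptions, not the authors' method.

```python
import numpy as np

def window_kl_sketch(x, window=256, bins=16, eps=1e-10):
    """Hypothetical windowed KL feature: KL(P_k || P_{k+1}) between adjacent windows.

    Illustrative stand-in only; not the WKLD definition from the paper.
    """
    x = np.asarray(x, dtype=float)
    n_win = len(x) // window
    edges = np.linspace(x.min(), x.max(), bins + 1)  # shared bins make histograms comparable
    dists = []
    for k in range(n_win):
        h, _ = np.histogram(x[k * window:(k + 1) * window], bins=edges)
        p = h + eps  # smoothing avoids log(0) and division by zero
        dists.append(p / p.sum())
    # one divergence value per pair of adjacent windows
    return np.array([np.sum(p * np.log(p / q)) for p, q in zip(dists[:-1], dists[1:])])

rng = np.random.default_rng(0)
periodic = np.tile(np.arange(8.0), 64)  # perfectly regular signal: identical windows
kl_periodic = window_kl_sketch(periodic, window=64, bins=8)
kl_noise = window_kl_sketch(rng.standard_normal(4097))  # irregular signal
```

Under this sketch, a perfectly regular signal yields zero divergence between windows, while an irregular one yields positive values.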

FIGURE 9.

The WKLD values of some samples during epileptic seizures and interictal periods.

Show All

2) The Result of Classification

In this experiment, the epilepsy diagnosis model used two feature extraction methods: the Discrete Wavelet Transform, which extracts informative subband features from the dataset, and WKLD analysis, which captures the regularity of the time series. The two feature sets were combined and fed into the ResMTN classifier. Ten-fold cross-validation was used: the feature data were divided into ten equal parts, nine of which were used for training and the remaining one for testing, and after each training round the model was evaluated on the held-out fold. As shown in Figure 10, the model achieved an accuracy of 98% in classifying interictal and ictal EEG, with specificity and sensitivity both reaching 98.18%, demonstrating highly accurate detection of these epileptic states.
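The evaluation protocol can be sketched with scikit-learn. Several details are assumptions: the features are synthetic, fusion is done by simple concatenation, the folds are stratified and shuffled, the WKLD feature count is hypothetical, and `LogisticRegression` stands in for ResMTN, which has no scikit-learn implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], 100)           # 0 = interictal (D), 1 = ictal (E)
dwt_feats = rng.normal(size=(n, 7))  # synthetic placeholder for DWT subband features
dwt_feats[y == 1] += 1.5             # make the synthetic classes separable
wkld_feats = rng.normal(size=(n, 4))  # hypothetical number of WKLD features

X = np.hstack([dwt_feats, wkld_feats])  # feature-level fusion by concatenation

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000)  # stand-in for ResMTN
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))  # accuracy on the held-out fold
print(f"mean accuracy over 10 folds: {np.mean(scores):.3f}")
```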

FIGURE 10.

Classification results of the ResMTN model for interictal and ictal EEG.


C. Comparison Between the Proposed Method and Existing Methods

To further validate the effectiveness of the proposed method, this section compares it with existing methods.

1) DWT VERSUS DWT and WKLD

When only DWT features are used, the ResMTN classifier achieves an accuracy of 97.05%, with specificity and sensitivity both at 97.26%. When the DWT and WKLD features are jointly input into ResMTN, as shown in Figure 11, accuracy, specificity, and sensitivity each improve by about 1 percentage point, reaching 98%, 98.18%, and 98.18%, respectively.

FIGURE 11.

DWT VERSUS DWT and WKLD.


2) WKLD Compared With Existing Entropy Methods

The proposed WKLD method and three existing entropy-based methods (approximate entropy, sample entropy, and fuzzy entropy) were each combined with DWT to obtain features for epilepsy diagnosis. These features were input into the ResMTN classifier, and the diagnosis results are compared in Table 3. The WKLD-based features performed best with the ResMTN classifier, outperforming the common entropy features, which indicates that incorporating WKLD improves feature extraction.
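For reference, one of the baseline features, sample entropy, can be sketched as below using the standard definition SampEn(m, r) = −ln(A/B), where B and A count matching template pairs of length m and m + 1 under the Chebyshev distance, excluding self-matches. The parameters m = 2 and r = 0.2 × std are common choices and are assumptions, since the settings used for Table 3 are not given here.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r) = -ln(A / B) with Chebyshev distance and tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Same n - m starting points for lengths m and m + 1 so A and B are comparable
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))  # pairs (i, j > i), so self-matches are excluded
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")

rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 500)
se_sine = sample_entropy(np.sin(t))                  # regular signal: low SampEn
se_noise = sample_entropy(rng.standard_normal(500))  # white noise: high SampEn
print(se_sine < se_noise)  # True
```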

TABLE 3 Performance of entropy methods

3) The Proposed Method VERSUS Existing Methods

Table 4 presents the classification accuracy of different combinations of feature extraction methods and classifiers. Compared with the other algorithms, the proposed method achieves the highest accuracy (98%) in distinguishing interictal from ictal periods. Relative to DWT and fuzzy ApEn features with an SVM classifier, the proposed method improves accuracy by nearly 2 percentage points, and relative to Weighted Multiscale Renyi Permutation Entropy (WMRPE) features with an LS-SVM classifier, it improves accuracy by 0.5 percentage points. These results demonstrate the effectiveness of combining WKLD feature extraction with the ResMTN classifier for epilepsy state classification.

TABLE 4 Accuracy of different methods

In this study, the ResMTN classifier achieved high accuracy in diagnosing different seizure states. To further validate its performance, it was compared with other methods in terms of specificity and sensitivity. As shown in Figure 12, classifying interictal and ictal periods with discrete cosine transform (DCT) features and an SVM classifier yielded an accuracy, sensitivity, and specificity of 96.35%, 96.50%, and 96.20%, respectively [29]. With tunable-Q wavelet transform (TQWT) features and an SVM classifier, the accuracy reached 98.00%, with sensitivity and specificity both at 98.00% [30]. Compared with the TQWT and SVM method, the proposed method achieves the same accuracy of 98% and slightly higher sensitivity and specificity, by 0.18 percentage points each.

FIGURE 12.

Performance comparison chart of different methods using the same dataset.


To further analyze the performance of the WKLD feature extraction method and the ResMTN classifier in binary classification, MCC, Kappa, and CSI were also calculated, with values of 0.9861, 0.9860, and 0.9859, respectively. These values indicate strong agreement between the predicted and true labels.
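For reference, Eqs. (21) and (22) can also be evaluated directly from the summary rates reported in Section IV-B (accuracy 98%, sensitivity and specificity 98.18%). Assuming the balanced 100/100 split of subsets D and E, TP + FN = TN + FP = N/2, so P_e in Eq. (22) reduces to (N/2)(TP + FP + FN + TN)/N² = 1/2. This sketch applies the formulas to the summary rates only; it does not reproduce the per-fold computation used in the study.

```python
acc = 0.98             # reported accuracy
sens = spec = 0.9818   # reported sensitivity and specificity
csi = sens + spec - 1  # Eq. (21)

# With balanced true classes (TP + FN = TN + FP = N / 2), P_e in Eq. (22) equals 0.5
p_e = 0.5
kappa = (acc - p_e) / (1 - p_e)  # Eq. (22), with P_o = accuracy
print(round(csi, 4), round(kappa, 4))  # 0.9636 0.96
```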

SECTION V.

Conclusion

This paper proposes a feature extraction method combining DWT and WKLD, together with an epilepsy detection method based on the ResMTN classifier. The approach is effective in classifying interictal and ictal periods, achieving an accuracy of 98%, higher than the other commonly used machine learning methods compared. However, the study has some limitations. First, complex classification problems may require ResMTN to use high-degree polynomials, which increases model complexity and can lead the model to fit noise in addition to the underlying patterns; this reduces its generalization ability and increases the risk of overfitting. Second, the experiments are based on a publicly available dataset, and the method’s ability to adapt to individual differences across patients and epilepsy types requires further validation. In future work, we will therefore incorporate individual differences into the classification model. We will also collaborate with medical institutions to conduct clinical studies on epilepsy patients, with the aim of refining the method using first-hand data.

ACKNOWLEDGMENT

(Haoyang Cai and Ying Yan contributed equally to this work.)
