
An Ensemble Voting Approach With Innovative Multi-Domain Feature Fusion for Neonatal Sleep Stratification


The abstract illustrates neonatal sleep classification using a two-channel EEG system, optimized for IoMT. It involves data collection, expert annotation, segmentation, a...


Abstract:

Neonatal sleep classification with a limited number of electroencephalography (EEG) channels is valuable, particularly in the Internet of Medical Things (IoMT) field, where compact and lightweight devices are essential for effective health monitoring. Using fewer EEG channels yields a streamlined and cost-effective IoMT solution by reducing data transmission and on-device processing requirements. Using only two channels of an EEG device, this study presents binary and multi-stage classification of neonatal sleep. The binary classification (sleep vs awake) achieved an accuracy of 87.56% and a Cohen’s kappa of 74.13%. Quiet sleep ($Q_{S}$) detection achieved an accuracy of 95.63% and a Cohen’s kappa of 83.87%. For three-stage classification, the accuracy was 83.72% and Cohen’s kappa was 69.73%. These are the highest performance figures reported for a two-channel configuration. The study focuses on fusing features extracted with the flexible analytic wavelet transform (FAWT) and the discrete wavelet transform (DWT), on ensemble-based voting models, and on using fewer channels. Feature importance, feature selection, and validation mechanisms were used to feed the most informative features into the ensemble-based voting model. Several machine learning models were compared and optimized to design the voting classifier. The proposed methodology was most effective with SelectKBest feature selection. Using only two channels, this study demonstrates that classifying neonatal sleep stages is practical.
Published in: IEEE Access ( Volume: 12)
Page(s): 206 - 218
Date of Publication: 22 December 2023
Electronic ISSN: 2169-3536



CC BY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

The Internet of Medical Things (IoMT) offers the potential to improve neonatal health outcomes effectively and affordably through remote monitoring and diagnosis [1]. Using IoMT for health monitoring can improve accuracy, reliability, and convenience while reducing hospitalization and visitation costs [2], [3]. This approach also enhances data collection for research, leading to a better understanding of neonatal sleep patterns and disorders [4]. To fully realize these benefits, however, neonatal sleep monitoring with Electroencephalography (EEG) devices must be optimized across multiple domains. Given neonates’ small size and physiology, a high-quality EEG device with fewer channels can reduce both the physical burden on the infant and the financial burden of monitoring. Because IoMT devices are small and low-power, they are well suited for neonatal sleep monitoring in remote areas where costly multi-channel devices are not readily available. Using fewer EEG channels makes such devices more efficient and cheaper to develop, and reducing the amount of data processed and transmitted to the cloud lowers costs further [3], [5].

Proper assessment of neonatal sleep patterns is crucial for the early detection of health issues such as sleep apnea and neurological disorders. EEG devices provide detailed information on the different sleep stages, which play a vital role in healthy brain development and physical growth [6]. By classifying EEG signals, abnormal sleep patterns can be detected accurately, allowing timely interventions that improve neonatal health outcomes. Understanding a neonate’s sleep stages can also help parents and caregivers predict when an infant will be awake or sleepy. Neglecting this area can lead to serious sleep-related outcomes such as sudden infant death syndrome (SIDS) and accidental suffocation and strangulation in bed (ASSB). Overall, research on neonatal sleep staging can provide valuable insights into early brain and body development [7], [8], [9].

To address these needs, researchers are employing advanced signal processing techniques, machine learning (ML), and deep learning (DL) algorithms [10], [11], [12], [13]. These methods can help extract relevant features from EEG signals, remove artifacts, and learn patterns that distinguish between sleep stages. Standardized protocols for recording and analyzing neonatal EEG signals could further improve the accuracy and reliability of sleep stage classification.

However, neonatal EEG signals pose several challenges. First, they have different characteristics from adult EEG signals [14], [15]: they typically exhibit lower frequency content, higher amplitude, and slow-wave activity, which makes them difficult to classify accurately for sleep monitoring [16], [17]. Second, neonatal brain development is rapid and dynamic, so EEG patterns change over short periods of time, making it difficult to establish consistent and reliable features for classification. Third, various artifacts [18], such as muscle movements, eye blinks, and equipment noise, further complicate the analysis and interpretation of neonatal EEG signals; these artifacts can obscure the underlying EEG patterns and lead to misclassification. Fourth, sleep stages in neonates are less well defined than in adults. Neonatal sleep is often divided into active sleep ($A_{S}$), quiet sleep ($Q_{S}$), and awake, but these stages are not as clearly distinguishable as adult stages, and transitions between them may be subtler, making it harder to identify and classify sleep stages accurately. Fifth, the lack of standardized recording and analysis techniques for neonatal EEG complicates classification: there is no universally agreed-upon method for recording or processing neonatal EEG, which leads to variations in data quality and interpretability [19].

Given these obstacles in neonatal sleep-monitoring research, our study introduces a multi-domain approach to analyzing neonatal EEG data that surpasses existing methods. This improvement stems primarily from our comprehensive feature extraction strategy, which integrates discrete wavelet transform (DWT), flexible analytic wavelet transform (FAWT), frequency-band, and temporal features. The method is tailored to the unique properties of neonatal EEG data, which differ markedly from adult EEG signals. To determine the optimal combination for classifying neonatal sleep stages, we also assessed the effectiveness of several classifiers based on different ML techniques:

  1. Extra Trees Classifier with Grid Search CV (ETCG)

  2. Quadratic Discriminant Analysis (QDA)

  3. Ensemble Random Forest Classifier (ERFC)

  4. K Neighbors Classifier (KNN)

  5. Artificial Neural Network (ANN)

  6. Bagging Ensemble Model (BEM)

  7. Extra Trees Classifier (ETC)

  8. Gaussian Naive Bayes (GNB)

  9. Voting Classifier with Estimators:

    • Logistic Regression (LR)

    • Decision Tree Classifier

    • Support Vector Classifier

  10. Voting Classifier with Estimators:

    • LR

    • GNB

    • Random Forest Classifier (RFC)

  11. Stacking Classifier with Estimators:

    • KNN

    • Multi-layer Perceptron (MLP)

    • RFC

  12. Gradient Boosting Classifier (GBC)

  13. AdaBoost Classifier (ABC)

  14. Multi-layer Perceptron (MLP)

  15. Linear Discriminant Analysis (LDA)

  16. Support Vector Classifier (SVM)
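As a rough illustration, candidates 9 to 11 in the list above can be expressed with scikit-learn's `VotingClassifier` and `StackingClassifier`. This is a minimal sketch, not the study's configuration: the list gives no hyperparameters, so the library defaults used here, the choice of hard voting for items 9 and 10, and the helper name `listed_ensembles` are all assumptions.

```python
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def listed_ensembles(seed=0):
    """Candidates 9-11 from the list, with assumed (mostly default) settings."""
    return {
        # 9. Hard voting over LR, Decision Tree, and SVC
        "voting_lr_dt_svc": VotingClassifier(estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(random_state=seed)),
            ("svc", SVC()),
        ], voting="hard"),
        # 10. Hard voting over LR, Gaussian NB, and Random Forest
        "voting_lr_gnb_rfc": VotingClassifier(estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("gnb", GaussianNB()),
            ("rfc", RandomForestClassifier(random_state=seed)),
        ], voting="hard"),
        # 11. Stacking over KNN, MLP, and Random Forest; the meta-learner is
        # scikit-learn's default final estimator (logistic regression)
        "stacking_knn_mlp_rfc": StackingClassifier(estimators=[
            ("knn", KNeighborsClassifier()),
            ("mlp", MLPClassifier(max_iter=500, random_state=seed)),
            ("rfc", RandomForestClassifier(random_state=seed)),
        ]),
    }
```

Each returned model is an ordinary estimator, so it can be scored with `cross_val_score` alongside the single classifiers in the list.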

Main Contributions of Our Work Are as Follows:

Access to basic medical services is crucial for neonatal health, and the lack of such services can lead to preventable illnesses and complications. Addressing health facility disparities is important to improve neonatal outcomes, regardless of geographic location. The article’s main contributions are:

  1. This study uses a thorough multi-domain feature extraction methodology that includes DWT, FAWT, frequency band features, and temporal features. FAWT, together with DWT, is a key component of this methodology. FAWT decomposes signals through low-pass (LP) and high-pass (HP) channels whose parameters can be tuned to the signal being analyzed. Its capacity to probe signal characteristics in depth makes it a valuable tool for analyzing non-stationary signals. Fusing these diverse features improves the depth and breadth of the signal analysis and provides a solid foundation for the research.

  2. To stratify neonatal sleep states, we developed an ensemble voting classifier using five base models, which were selected based on their superior performance across ten EEG channels as evidenced in experimental results. The algorithm was validated using datasets from a variety of participants and showed good classification accuracy for neonatal sleep states.

  3. This study identifies the two most effective EEG channels for neonatal sleep classification, and combining these two channels further improves classification accuracy. Because not all intensive care units have multi-channel EEG devices, the proposed model supports neonatal sleep monitoring in single- and dual-channel settings, which could be particularly valuable in remote areas.

  4. The efficacy of the proposed stratification scheme is established for both binary and multiclass scenarios using the dataset described in [20]. To ensure robustness and reliability, k-fold cross-validation is used.

The paper is structured as follows: Section II reviews the relevant literature, while Section III elaborates on multi-domain feature extraction, feature selection, normalization, and employment of classifiers. Section IV provides the ablation study, illustrating how the proposed method is designed based on experimental results. The results, discussion, challenges, and comparisons to existing work are presented in Section V. Finally, the paper concludes in Section VI by summarizing the study’s observations and outlining future research directions.

SECTION II.

Related Work

ML-based neonatal sleep staging automates the classification of sleep stages from EEG recordings, providing a more objective and reliable method than manual scoring by experts. ML models can quickly analyze large volumes of EEG data and detect subtle changes in sleep patterns that may be missed by human experts. This leads to increased consistency in sleep stage classification and improved diagnosis and treatment of underlying neurological or developmental issues in neonates [21]. Using large datasets and correctly annotated EEG recordings from neonates in various sleep states, ML and DL algorithms can be trained to become more accurate and generalizable.

Awais et al. achieved high accuracy in sleep-wake classification of neonates using a diffusion convolutional neural network (DCNN) model for feature extraction and an SVM for classification, with an accuracy of 93.8 ± 2.2% and an F1-score of 0.93 ± 0.3 [10]. However, their use of video EEG data raises privacy concerns because it contains identifiable information, such as neonates’ faces and voices [22], [23]. In contrast, another study used an optimized Sinc-based DL model to classify $Q_{S}$ from EEG data alone [24]. Focusing on $Q_{S}$ is advantageous because neonates in $A_{S}$ have EEG signals with voltage amplitudes similar to those in the awake stage, which makes these two states difficult to distinguish. However, DL models are typically more complex and require more parameters than ensemble ML models, resulting in higher computational demands [25], which may limit their applicability in the IoMT field.

Ansari et al. developed an 18-layer convolutional neural network (CNN) for automatic neonatal sleep state classification that separates $Q_{S}$ from non-$Q_{S}$ [26]. They used multi-channel EEG recordings of 26 preterm neonates and later introduced a multi-scale deep convolutional neural network for the same purpose [24]. Using a novel Sinc block, they were able to extract temporal features across multiple timescales.

Yu et al. performed automatic neonatal sleep state classification using publicly available single-channel EEG datasets [27]. They classified neonates’ sleep patterns into W, N1, N2, N3, and REM using MRASleepNet, which comprises a feature extraction module, a multi-resolution analysis (MRA) module, and a gated multi-layer perceptron (gMLP) module. Abbas et al. used single-channel EEG data for neonatal sleep-wake classification with an SVM algorithm, achieving an accuracy of 77.5% and a mean kappa of 55% [28].

Fraiwan and Alkhodari used the long short-term memory (LSTM) learning technique to perform neonatal three-stage sleep classification by using EEG recordings of 16 full-term neonates [11]. Zhu et al. designed a novel multi-scale hierarchical neural network (MS-HNN) for the automatic classification of neonates’ sleep with one, two, and eight channels [29]. They incorporated multi-scale convolutional neural networks (MSCNN), squeeze-and-excitation (SE) blocks, and temporal information learning (TIL) to extract more features that involve temporal information in sleep signals. With single and eight EEG channels, they achieved around 76.5% accuracy for three-stage classification.

In 2020, Abbasi et al. designed a deep MLP for sleep vs awake classification of neonates using multi-channel EEG recordings from 19 neonates [30]. They extracted a total of 12 features, four in the frequency domain and eight in the time domain, from nine channels. Later, in 2021, they performed three-state neonatal sleep classification on the same EEG dataset [20] using CNN, SVM, and MLP classifiers together with two ensemble algorithms (bagging and stacking). With nine channels, the classification accuracy was 82.53% for sleep vs awake and around 81.99% for $Q_{S}$ vs $A_{S}$ vs awake. A single channel, however, reduced the test accuracy to 71.7% for binary and 64.72% for three-stage classification. The researchers also reported that increasing the number of channels to four improved classification accuracy to 73.15%. Note also that during post-processing of the EEG data, they applied a smoothing filter that introduced a delay of a few minutes. This means the system cannot be used for real-time applications, especially on IoMT devices.

In contrast, this study uses only two channels, which simplifies data acquisition for the EEG device, reduces data transfer to the cloud, and makes the solution more cost-effective, as discussed in our prior studies [3], [5].

SECTION III.

Materials and Methods

A. Study Flowchart

The flowchart depicted in Fig. 1 provides a simplified overview of the study methodology, illustrating the progression from the initial level to the final stage.

FIGURE 1. Flowchart and methodological overview. (The neonate picture is from freepik.com and we have a premium plan.)

B. Dataset

Nineteen neonates were included in a study conducted at the NICU of the Children’s Hospital of Fudan University, Shanghai, China. The study was approved by the Research Ethics Committee of the Children’s Hospital (approval No. (2017) 89). The neonates underwent Video EEG (VEEG) recordings, and an average of 120 minutes of data was collected for each neonate, during which at least one sleep cycle was observed. A full 10–20 system for electrode placement, which included 17 electrodes, was used to acquire the EEG recordings. However, in this study, we have utilized 10 channels: F3-C3, C3-P3, F4-C4, C4-P4, F3-T3, T3-P3, F4-T4, T4-P4, T3-C3, C4-T4. Out of the 10 channels we tested, we selected the two that performed the best and analyzed their results. Further details of the dataset and data annotation are provided in previous studies [10], [20], [30].

C. Preparation and Pre-Processing of Data

EEG recordings may be contaminated by noise and artifacts, which can compromise their accuracy and reliability. The recordings were originally sampled at 500 Hz, and unwanted signal components had to be removed before further processing. A three-step pre-processing method was used to remove noise and artifacts. First, the EEG signals were filtered with a finite impulse response (FIR) filter from EEGLAB (high-pass (HP) cutoff = 0.3 Hz, low-pass (LP) cutoff = 35 Hz), which removed the unwanted frequency range; EEGLAB offers more advanced EEG filtering capabilities than other packages [31]. Second, the multichannel EEG was segmented into 30-second epochs, which were labeled accordingly; segmentation divides the data into manageable units for analysis. Finally, artifact-contaminated segments, about 20% of the segmented data, were removed, so only noise-free recordings were used to train and test the model. Results were validated with 7-fold cross-validation.
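The first two pre-processing steps can be sketched in Python. The study used EEGLAB's FIR filter; the SciPy `firwin` plus `filtfilt` pair below is a stand-in, and its tap count (3301, which gives a Hamming-window transition band of roughly 0.5 Hz at 500 Hz) is an assumption rather than the study's setting. Artifact rejection is not shown because the text does not state the rejection criterion.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 500         # sampling rate stated in the text (Hz)
EPOCH_SEC = 30   # epoch length stated in the text (s)


def bandpass_fir(x, fs=FS, low=0.3, high=35.0, numtaps=3301):
    """Zero-phase FIR band-pass filter applied along the last axis.

    A SciPy stand-in for the EEGLAB FIR filter; numtaps is an assumed value.
    """
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)
    # filtfilt runs the filter forward and backward, cancelling the phase delay
    return filtfilt(taps, [1.0], x, axis=-1)


def segment_epochs(x, fs=FS, epoch_sec=EPOCH_SEC):
    """Split a (channels, samples) array into (epochs, channels, samples_per_epoch).

    Trailing samples that do not fill a complete epoch are discarded.
    """
    x = np.asarray(x)
    n = fs * epoch_sec
    n_epochs = x.shape[-1] // n
    trimmed = x[:, : n_epochs * n]
    return trimmed.reshape(x.shape[0], n_epochs, n).transpose(1, 0, 2)
```

With the default padding, `filtfilt` needs each recording to be longer than three times the filter length (about 20 s here), which the roughly 120-minute recordings easily satisfy.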

D. Multi-Domain Feature Extraction

Neonatal EEG signals differ markedly from adult signals because of differences in brain structure and function. Automatic sleep stage detection in neonates also faces challenges such as limited datasets [32], vague data descriptions, and low-precision detection methods. Improved feature extraction techniques are therefore crucial for accurate and precise sleep stage detection. A total of 173 features were extracted from each epoch: 12 from the spectral statistics of four bands ($\alpha$, $\beta$, $\theta$, $\delta$), 11 from temporal statistics, 55 from DWT, and 95 from FAWT.

1) Discrete Wavelet Transform

The DWT is a commonly used technique for the time-frequency analysis of signals [33], [34]. It decomposes a signal into a linear combination of translations and dilations of a basis function called the “Mother Wavelet,” as well as a scaling function. During the DWT process, the signal is filtered using an LP filter and an HP filter, producing approximation coefficients $A_{i}$ and detail coefficients $D_{i}$. Both outputs are downsampled by two, and the process is repeated on the approximation coefficients to obtain the next level of approximation and detail coefficients. This iterative process establishes time-scale regions, rather than time-frequency regions, for the signal.

In this study, we concentrated on EEG signals, which generally carry little useful information above 30 Hz. We therefore chose seven decomposition levels, which retain the signal components that correlate strongly with the frequencies needed for classification. The EEG signals were thus decomposed into seven detail coefficients ($D_{1}$-$D_{7}$) and one final approximation coefficient ($A_{7}$). We also chose the Daubechies wavelet of order 4 (db4) to detect changes in the EEG signals because it offers higher accuracy, at a higher computational cost.

Features Derived from DWT:

The resulting wavelet coefficients provide a concise representation of the energy distribution of the EEG signal in both the time and frequency domains. Table 1 displays the frequencies corresponding to the various decomposition levels of the db4 wavelet at a sampling frequency of 500 Hz.

TABLE 1 The Frequencies Related to the Various Levels of Decomposition for a db4 Wavelet With a 500 Hz Sampling Frequency
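Nominal band edges of this kind follow directly from the sampling rate: detail level $j$ covers $[f_s/2^{j+1}, f_s/2^{j}]$, and the final approximation covers $[0, f_s/2^{J+1}]$. The short sketch below (the helper name `dwt_nominal_bands` is hypothetical) computes these ideal-filter edges for $f_s = 500$ Hz and $J = 7$; the real db4 filters have overlapping transition bands, so the edges are approximate.

```python
def dwt_nominal_bands(fs=500.0, levels=7):
    """Nominal (ideal-filter) frequency range, in Hz, of each DWT sub-band."""
    # Detail level j spans [fs / 2**(j + 1), fs / 2**j]
    bands = {f"D{j}": (fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)}
    # The final approximation keeps everything below the last detail band
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands


for name, (lo, hi) in dwt_nominal_bands().items():
    print(f"{name}: {lo:.2f}-{hi:.2f} Hz")
```

At 500 Hz, $D_1$ (125-250 Hz) and $D_2$ (62.5-125 Hz) lie entirely above the 35 Hz low-pass cutoff, and $D_3$ (31.25-62.5 Hz) overlaps the passband only at its lower edge, so most EEG content falls in $D_4$-$D_7$ and $A_7$ (0-1.95 Hz).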

To represent the time-frequency distribution of EEG signals, the study used the following statistical features:

  1. The mean of the absolute values of coefficients in each sub-band;

  2. The median of coefficients in each sub-band;

  3. The root mean square values of each sub-band;

  4. The standard deviation of coefficients in each sub-band;

  5. The ratio of absolute mean values of adjacent sub-bands;

  6. The skewness of each sub-band;

  7. The kurtosis of each sub-band.
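The seven statistics above can be computed from a db4 decomposition as follows. This is a minimal sketch, assuming PyWavelets (`pywt.wavedec`) for the transform and SciPy's default (excess) kurtosis; the feature order and the helper names are hypothetical. With eight sub-bands ($A_7$ and $D_7$ to $D_1$), six per-band statistics plus seven adjacent-band ratios give $8 \times 6 + 7 = 55$ values, matching the DWT feature count in Section III-D.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def subband_stats(coeffs):
    """Seven statistics for a list of DWT sub-bands, flattened to one vector.

    For 8 sub-bands this gives 8 x 6 per-band values plus 7 adjacent ratios = 55.
    """
    per_band = []
    for c in coeffs:
        c = np.asarray(c, dtype=float)
        per_band.append([
            np.mean(np.abs(c)),        # 1. mean absolute value
            np.median(c),              # 2. median
            np.sqrt(np.mean(c ** 2)),  # 3. root mean square
            np.std(c),                 # 4. standard deviation
            skew(c),                   # 6. skewness
            kurtosis(c),               # 7. kurtosis (SciPy default: excess)
        ])
    per_band = np.array(per_band)
    mav = per_band[:, 0]
    ratios = mav[:-1] / mav[1:]        # 5. ratio of adjacent mean absolute values
    return np.concatenate([per_band.ravel(), ratios])


def dwt_features(epoch, wavelet="db4", level=7):
    """Decompose one single-channel epoch and return its 55 DWT features."""
    import pywt  # PyWavelets, needed only for the decomposition itself
    coeffs = pywt.wavedec(epoch, wavelet, level=level)  # [A7, D7, D6, ..., D1]
    return subband_stats(coeffs)
```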

2) Flexible Analytic Wavelet Transform

FAWT, described in [35], offers a more comprehensive view of the time-frequency attributes of signals. Because it uses Hilbert transform atom pairs, it is particularly suitable for analyzing oscillatory signals. FAWT is governed by three parameters: the quality factor $Q$, the number of decomposition levels $J$, and the redundancy factor $r$. $Q$ controls the oscillations of the mother wavelet and is defined in terms of frequency variables and a constant $\beta$ as follows:\begin{equation*} Q = \frac {\omega _{0}}{\Delta \omega }, \quad \beta = \frac {2}{Q + 1} \tag{1}\end{equation*} where $\omega _{0}$ and $\Delta \omega$ represent the center frequency and bandwidth of the wavelet, respectively. The redundancy factor $r$ governs the time localization of the wavelet. FAWT uses an iterative filter bank of HP and LP channels, which allows the dilation factor, $Q$, and $r$ to be tuned through the constants $\beta$ and $e, f, g, h$. The parameters $e$ and $f$ set the up- and down-sampling of the HP channel, and $g$ and $h$ set the up- and down-sampling of the LP channel. The process computes $J$ decomposition levels iteratively; at each level, the HP channel is further split into positive- and negative-frequency channels, which makes the transform analytic.
The frequency responses $H(\omega)$ and $G(\omega)$ of the HP and LP filters are as follows:\begin{align*} H(\omega)& = \begin{cases} \displaystyle \sqrt {ef} & \text {for } |\omega | < \omega _{p}, \\ \displaystyle \sqrt {ef}\theta \left ({\frac {\omega - \omega _{p}}{\omega _{s} - \omega _{p}}}\right) & \text {for } \omega _{p} \leq \omega \leq \omega _{s}, \\ \displaystyle \sqrt {ef}\theta \left ({\frac {\pi - (\omega + \omega _{p})}{\omega _{s} + \omega _{p}}}\right) & \text {for } -\omega _{s} \leq \omega \leq -\omega _{p}, \\ \displaystyle 0 & \text {for } |\omega | \geq \omega _{s}. \end{cases} \\ G(\omega) &= \begin{cases} \displaystyle \sqrt {gh}\theta \left ({\frac {\pi - \omega - \omega _{0}}{\omega _{1} - \omega _{0}}}\right) & \text {for } \omega _{0} \leq \omega < \omega _{1}, \\ \displaystyle \sqrt {gh} & \text {for } \omega _{1} < \omega < \omega _{2}, \\ \displaystyle \sqrt {gh}\theta \left ({\frac {\omega - \omega _{2}}{\omega _{3} - \omega _{2}}}\right) & \text {for } \omega _{2} \leq \omega \leq \omega _{3}, \\ \displaystyle 0 & \text {for } \omega \in [0,\omega _{0}) \cup (\omega _{3},2\pi]. \end{cases}\end{align*}

The parameters of these filter banks are:\begin{align*} \omega _{p} &= \frac {(1 - \beta)\pi + \epsilon }{e}, \tag{2}\\ \omega _{s} &= \frac {\pi }{f}, \tag{3}\\ \omega _{0} &= \frac {(1 - \beta)\pi + \epsilon }{g}, \tag{4}\\ \omega _{1} &= \frac {e \pi }{f g}, \tag{5}\\ \omega _{2} &= \frac {\pi - \epsilon }{g}, \tag{6}\\ \omega _{3} &= \frac {\pi + \epsilon }{g}, \tag{7}\\ \epsilon &\leq \frac {e - f + \beta f}{e + f}\,\pi , \tag{8}\end{align*} where $\epsilon$ must satisfy (8). The redundancy $r$ is the ratio of output coefficients to input samples; it should be greater than one to avoid information loss and is defined as:\begin{equation*} r = \left ({\frac {g}{h}}\right) \frac {1}{1 - \frac {e}{f}} \tag{9}\end{equation*}

For perfect reconstruction, $\beta$ must be less than one and satisfy:\begin{equation*} 1-\frac {e}{f} \leq \beta \leq \frac {g}{h} \tag{10}\end{equation*}

The shift-invariance, tunable oscillatory conditions, and flexible time-frequency localization of FAWT make it useful for a variety of practical applications [36].

Features Derived from FAWT:

The FAWT implementation uses seven decomposition levels. For dual-state EEG signals, the subband signals are then reconstructed in descending frequency order. The ratio $e/f$ is set to 3/4, and the values of $r$ and $Q$ are pre-specified as discussed in our previous study [35]. The ratio $g/h$ is set to 1/2 to keep $r$ within the constraint required for information preservation in (9). Because FAWT provides a high-dimensional feature space, compact statistics such as the mean, standard deviation, skewness, and kurtosis are computed for each subband; such statistical measures are promising for capturing inherent signal information [37]. The six features are listed below:

  1. Mean absolute value of coefficients within each subband;

  2. Average power within each subband;

  3. Standard deviation within each subband;

  4. Ratio of mean absolute values of adjacent subbands;

  5. Skewness within each subband;

  6. Kurtosis within each subband.
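As a quick check of the settings above, substituting $e/f = 3/4$ and $g/h = 1/2$ into (9), (10), and (1) gives the redundancy and the admissible ranges of $\beta$ and $Q$. These ranges are derived here from the stated formulas; they are not values reported by the study.

```latex
\begin{aligned}
r &= \frac{g/h}{1 - e/f} = \frac{1/2}{1 - 3/4} = 2,\\
\beta &\in \left[\,1 - \frac{e}{f},\ \frac{g}{h}\,\right] = \left[\tfrac{1}{4},\ \tfrac{1}{2}\right],\\
Q &= \frac{2}{\beta} - 1 \in [3,\ 7].
\end{aligned}
```

The redundancy of 2 satisfies the requirement $r > 1$, and the lower bound $\beta = 1/4$ corresponds to the most oscillatory admissible wavelet ($Q = 7$).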

3) Spectral Statistics of Four Bands

EEG frequency bands such as $\alpha$, $\beta$, $\theta$, and $\delta$ are associated with different sleep stages and with cognitive and emotional states. For each band, we calculated the mean, median, and standard deviation, giving twelve spectral features in total for analyzing the relationship between EEG frequency bands and sleep states. These statistics describe the amplitude and variability within each frequency range.
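The text does not specify how the band statistics are computed. One plausible reading, sketched below as an assumption, takes the Welch power spectral density of each 30 s epoch and summarizes the PSD values inside each band; the band edges in `BANDS`, the 4 s Welch window, and the helper name are hypothetical choices, not the study's settings.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical band edges in Hz; the text does not list the ones it used
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_spectral_stats(epoch, fs=500, bands=BANDS):
    """Mean, median, and standard deviation of the Welch PSD within each band."""
    freqs, psd = welch(epoch, fs=fs, nperseg=4 * fs)  # 4 s windows, 0.25 Hz bins
    feats = []
    for lo, hi in bands.values():
        in_band = psd[(freqs >= lo) & (freqs < hi)]
        feats += [in_band.mean(), np.median(in_band), in_band.std()]
    return np.array(feats)  # 4 bands x 3 statistics = 12 values
```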

4) Temporal Features of EEG Signals

Time-domain statistical features are a key tool for analyzing neonatal sleep patterns from EEG signals. Eleven such features were calculated: mean amplitude, standard deviation, peak-to-peak amplitude, variance, square root, minimum amplitude, maximum amplitude, root mean square, absolute mean change in amplitude, skewness, and kurtosis.

These features have been shown to provide critical information about the underlying neural processes efficiently. Together with the spectral statistics of the four bands and the FAWT- and DWT-derived features, they offer a promising basis for automated sleep staging schemes that could improve the diagnosis and treatment of neonatal sleep-related issues.
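The eleven time-domain statistics can be sketched as below. Two items in the list are ambiguous, so their definitions here are assumptions: “square root” is taken as the square root of the mean absolute amplitude, and “absolute mean change in amplitude” as the mean absolute first difference. Skewness and kurtosis use SciPy's defaults, and the helper name is hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def temporal_features(epoch):
    """Eleven time-domain statistics of one single-channel epoch."""
    x = np.asarray(epoch, dtype=float)
    return np.array([
        x.mean(),                       # mean amplitude
        x.std(),                        # standard deviation
        np.ptp(x),                      # peak-to-peak amplitude
        x.var(),                        # variance
        np.sqrt(np.mean(np.abs(x))),    # "square root" (assumed interpretation)
        x.min(),                        # minimum amplitude
        x.max(),                        # maximum amplitude
        np.sqrt(np.mean(x ** 2)),       # root mean square
        np.mean(np.abs(np.diff(x))),    # absolute mean change (assumed interpretation)
        skew(x),                        # skewness
        kurtosis(x),                    # kurtosis (SciPy default: excess)
    ])
```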

E. Feature Normalization

To reduce overfitting, we normalized the data in two steps. First, each row was scaled to unit L2 (Euclidean) norm, as given in (11):\begin{equation*} x_{\text {normalized}} = \frac {x}{\sqrt {\sum _{i} x_{i}^{2}}} \tag{11}\end{equation*} where $x$ is a vector and $x_{\text {normalized}}$ is the normalized vector.

Min-Max scaling is the next step. In this method, the features in the data are scaled to the range between 0 and 1:\begin{equation*} x_{\text {scaled}} = \frac {x - \min (x)}{\max (x) - \min (x)} \tag{12}\end{equation*} where $x$ is a vector, $\min (x)$ and $\max (x)$ are its minimum and maximum values, and $x_{\text {scaled}}$ is the scaled vector.

By improving data consistency and limiting the influence of extreme values, these steps can help reduce overfitting, improve model performance on unseen data, and promote better generalization.
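The two normalization steps map onto scikit-learn's `Normalizer` (row-wise L2 scaling, as in (11)) and `MinMaxScaler` (column-wise scaling to [0, 1], as in (12)). The sketch assumes that (12) is applied per feature, since the text does not say whether its vector $x$ is a row or a column; the pipeline name is hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Step 1, Eq. (11): divide each sample (row) by its L2 norm.
# Step 2, Eq. (12): rescale each feature (column) to [0, 1] using training min/max.
normalize_then_scale = make_pipeline(Normalizer(norm="l2"), MinMaxScaler())

X = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
print(normalize_then_scale.fit_transform(X))
# rows become [0.6, 0.8], [1, 0], [0, 1]; each column already spans [0, 1]
```

Because `MinMaxScaler` stores the training minimum and maximum, fitting the pipeline inside each cross-validation fold, on training data only, keeps test-fold statistics from leaking into the model.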

F. Feature Importance and Selection

Features were selected with the SelectKBest method. Feature selection identifies the most relevant features in a dataset and excludes the less relevant ones. SelectKBest (SelectKBest(score_func=f_classif, k=max_features)) uses a scoring function, here f_classif, to compute the relevance of each feature to the target variable. f_classif computes the ANOVA F-value, which compares the variance between the class means of a feature with the variance within the classes; a large F-value indicates that the feature’s mean differs substantially across two or more classes. The k parameter specifies how many top-scoring features are kept, and only those features are returned from the transformed input data [38]. Feature selection is particularly important for high-dimensional datasets because it reduces dimensionality and can improve the performance of ML models.
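A minimal usage sketch of the selector described above, on synthetic data: the 173-column matrix stands in for the fused feature set, and `k=20` is a hypothetical value because the text does not report `max_features`.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a 173-feature, three-class dataset
X, y = make_classification(n_samples=300, n_features=173, n_informative=15,
                           n_classes=3, random_state=0)

selector = SelectKBest(score_func=f_classif, k=20)  # k is a hypothetical value
X_sel = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # column indices of the retained features
print(X_sel.shape)  # (300, 20)
```

In a cross-validated setting, placing `SelectKBest` inside a `Pipeline` ensures the F-scores are computed on the training folds only.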

G. Classifiers

In this study, we evaluated and compared several classifiers based on various ML techniques to design an optimal voting classifier. A voting classifier combines the predictions of multiple base classifiers to produce a more robust and generalizable model; it is similar to seeking a “second opinion” from other models before making a decision. We used hard voting: each model in the ensemble “votes” for a class, and the class that receives the majority of votes becomes the prediction. In this study, the ensemble outperforms the individual models because it draws on each classifier’s strengths and mitigates its weaknesses.
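Hard voting reduces to a per-sample majority count. The sketch below (the helper name `hard_vote` is hypothetical) assumes the base models' predictions are non-negative integer class labels and, like scikit-learn's `VotingClassifier(voting="hard")`, resolves ties in favor of the lowest label.

```python
import numpy as np


def hard_vote(predictions):
    """Majority vote over an (n_models, n_samples) array of integer labels.

    Ties go to the smallest label because argmax returns the first maximum.
    """
    preds = np.asarray(predictions)
    n_labels = preds.max() + 1
    # Count the votes for each label, column by column (one column per sample)
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_labels)
    return counts.argmax(axis=0)
```

For example, three models predicting `[0, 1, 2, 1]`, `[0, 2, 2, 1]`, and `[1, 2, 0, 0]` yield the ensemble prediction `[0, 2, 2, 1]`.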

Overview of Base Classifiers:

The base classifiers were chosen according to each model’s performance metrics on each channel, together with the results of applying the models to the two most effective channels. The performance for individual and combined channels is detailed in Table 2 and Table 3. These tables summarize the effectiveness of each model, so the choice of base classifiers rests on empirical data and reflects each model’s strengths and weaknesses in both channel-specific and combined-channel performance.

TABLE 2 Model Performance for Binary Classification on Two Best Channels
TABLE 3 Model Performance for Three-Stage Classification on Two Best Channels
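The channel- and model-level comparison summarized in Tables 2 and 3 can be sketched as a grid of cross-validated scores. In this sketch, `features_by_channel` (a mapping from channel name to that channel's feature matrix), the helper name `rank_channels`, and the ranking rule (best model score per channel) are assumptions; the 7 folds follow the validation setting in Section III-C.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score


def rank_channels(features_by_channel, y, models, n_splits=7, seed=0):
    """Mean k-fold accuracy of every model on every channel's feature matrix.

    Returns (scores, ranking): scores[channel][model] is the mean accuracy, and
    ranking lists channels by their best model score, best first.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {
        # cross_val_score clones each model, so fits do not leak between cells
        ch: {name: cross_val_score(m, X, y, cv=cv).mean() for name, m in models.items()}
        for ch, X in features_by_channel.items()
    }
    ranking = sorted(scores, key=lambda ch: max(scores[ch].values()), reverse=True)
    return scores, ranking
```

The two best channels would then be `ranking[:2]`. Selecting channels and reporting accuracy from the same folds gives an optimistic estimate; a held-out split or nested cross-validation avoids that bias.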

1) ExtraTreesClassifier

The Extra Trees classifier is an ensemble of randomized decision trees. Rather than searching for the optimal split threshold, it draws thresholds at random for the candidate features and keeps the best of these random splits, which introduces additional randomness. This randomness can reduce variance at the cost of a slight increase in bias. Split quality can be measured, for example, with the entropy of the target:\begin{equation*} H(T) = - \sum _{i} p_{i} \log p_{i} \tag{13}\end{equation*} where $H(T)$ represents the entropy of the target variable $T$, and $p_{i}$ represents the proportion of category $i$ of the target variable in the dataset.

2) DecisionTreeClassifier

Decision trees split the data recursively on attribute values and score the resulting subsets with metrics such as Gini impurity, with the goal of obtaining pure or nearly pure subsets:\begin{equation*} Gini(T) = 1 - \sum _{i} p_{i}^{2} \tag{14}\end{equation*} where $p_{i}$ is the proportion of class $i$ in subset $T$.
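Equations (13) and (14) are easy to check numerically. The sketch below uses a base-2 logarithm for (13), an assumption since the text leaves the base unspecified, and treats $0 \log 0$ as 0; the function names are illustrative.

```python
import numpy as np


def entropy(p, base=2.0):
    """Eq. (13), H(T) = -sum_i p_i log p_i, written as sum_i p_i log(1 / p_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return float(np.sum(p * np.log(1.0 / p)) / np.log(base))


def gini(p):
    """Eq. (14), Gini(T) = 1 - sum_i p_i^2."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))
```

A balanced two-class node gives $H = 1$ bit and a Gini impurity of 0.5, while a pure node gives 0 for both, which is why a split that produces purer subsets lowers either criterion.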

3) RandomForestClassifier

A Random Forest classifier is an ensemble of decision trees. Each tree is trained on a different random subset of the data and considers a random subset of features when splitting. The outputs of the individual trees are averaged to improve generalization and robustness:
\begin{equation*} O_{\text {RF}} = \frac {1}{N} \sum _{i=1}^{N} O_{i} \tag{15}\end{equation*}
where $N$ is the number of trees and $O_{i}$ is the output of the $i$-th tree.
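To illustrate Eq. (15), the following sketch averages the class-probability outputs of the individual trees in a fitted scikit-learn `RandomForestClassifier` and checks the result against the forest's own `predict_proba`. The synthetic data, the 50 trees and the random seeds are assumptions for illustration only. scikit-learn averages probabilities rather than counting votes, which matches the averaging form of Eq. (15).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for EEG feature vectors (not the study's data)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Eq. (15): O_RF = (1/N) * sum_i O_i, with O_i the class-probability output of tree i
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
o_rf = per_tree.mean(axis=0)

print(np.allclose(o_rf, forest.predict_proba(X)))  # True
```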

4) MLPClassifier

A multilayer perceptron (MLP) classifier is a feedforward artificial neural network. It consists of an input layer, one or more hidden layers, and an output layer. MLPs are well suited to capturing complex, nonlinear relationships in data. This study used a single hidden layer with 1000 neurons.
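The following sketch configures a scikit-learn `MLPClassifier` with one hidden layer of 1000 neurons, as described above, and inspects its weight matrices. The synthetic data, the input standardization, `max_iter=300` and the seed are assumptions, not settings reported in the paper. For a binary task, scikit-learn uses a single logistic output unit.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for EEG feature vectors (20 features, two classes)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)  # assumption: standardized inputs help MLP training

# One hidden layer with 1000 neurons, as described in the text
mlp = MLPClassifier(hidden_layer_sizes=(1000,), max_iter=300, random_state=0).fit(X, y)

print(mlp.n_layers_)                  # 3: input, hidden, output
print([w.shape for w in mlp.coefs_])  # [(20, 1000), (1000, 1)]
```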

5) BaggingClassifier

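The Bagging (Bootstrap Aggregating) classifier is an ensemble model that trains multiple instances of a base classifier (in our case, a decision tree). Each instance is trained on a bootstrap sample of the training data, that is, a random sample drawn with replacement. The ensemble then aggregates the predictions of all instances.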
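The following sketch spells out bootstrap aggregation by hand: each decision tree is fit on n rows drawn with replacement, and the trees are combined by majority vote. The helper names, the 25 trees and the synthetic data are hypothetical. scikit-learn's `BaggingClassifier` follows the same scheme. However, it aggregates by averaging predicted probabilities when the base estimator provides them, so its outputs can differ slightly from this hard-vote version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_estimators=25, seed=0):
    """Train each tree on a bootstrap sample: len(X) rows drawn with replacement."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees


def predict_bagged(trees, X):
    """Aggregate the trees by majority vote (labels assumed to be non-negative ints)."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)  # (n_trees, n_samples)
    return np.array([np.bincount(column).argmax() for column in votes.T])


X, y = make_classification(n_samples=200, n_features=10, random_state=0)
trees = fit_bagged_trees(X, y)
train_accuracy = np.mean(predict_bagged(trees, X) == y)
print(train_accuracy)
```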

H. Hybrid Ensemble Voting Model

This study presents a hybrid ensemble voting model that improves the reliability and accuracy of neonatal sleep stage classification. The model combines the following base classifiers:
- a DecisionTreeClassifier;
- four ExtraTreesClassifier configurations;
- a RandomForestClassifier;
- an MLPClassifier;
- a BaggingClassifier with a decision tree base estimator.

The hybrid approach draws on the complementary strengths of these base models. Decision trees offer interpretability, Extra Trees and Random Forests offer randomization and generalization, and neural networks can capture complex relationships in neonatal EEG data. A voting mechanism aggregates the predictions of the base models into a final classification, combining the strengths of each model and mitigating their individual weaknesses. Our experimental evaluations indicate that the approach is robust and accurate. Algorithm 1 gives the pseudocode.

Algorithm 1 Ensemble Voting Classifier With Multiple Base Classifiers and K-Fold Cross-Validation

1: Define the base classifiers: four ExtraTreesClassifiers (e.g., 540, 900, 470, and 300 estimators), a DecisionTreeClassifier, a RandomForestClassifier (e.g., 400 estimators), an MLPClassifier (e.g., a single hidden layer with 1000 neurons), and a BaggingClassifier using a DecisionTree as the base estimator (e.g., 1000 estimators).
2: Combine these classifiers into a voting classifier with voting = hard.
3: Set the number of folds to 7 and create a k-fold cross-validator kf.
4: Initialize empty lists to store the evaluation metrics (accuracies, precisions, recalls, f1_scores, cohen_kappas).
5: for each fold in the k-fold cross-validation do
6:     Split the data into training and validation sets.
7:     Train the voting classifier on the training data.
8:     Predict the labels of the validation data.
9:     Calculate the evaluation metrics for the current fold and append them to the respective lists.
10:    Print the metrics for the current fold.
11: end for
12: Calculate the average of each evaluation metric across all folds.
13: Print the average evaluation metrics.
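Algorithm 1 can be sketched in scikit-learn as follows. The defaults of `build_voting_classifier` mirror the example sizes given in step 1. The remaining choices are assumptions not stated in the paper:
- `X` is a NumPy array of already-selected features;
- the folds are shuffled with a fixed seed;
- the MLP uses `max_iter=500`;
- precision, recall and F1 are macro-averaged so that the same code serves both the binary and the three-stage tasks.

The demo at the end uses synthetic data and much smaller ensembles so that it runs quickly. It is not a reproduction of the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_voting_classifier(et_sizes=(540, 900, 470, 300), rf_size=400,
                            bag_size=1000, mlp_hidden=1000, seed=0):
    """Steps 1-2: base classifiers (default sizes from Algorithm 1) in a hard-voting ensemble."""
    estimators = [(f"et{i}", ExtraTreesClassifier(n_estimators=n, random_state=seed))
                  for i, n in enumerate(et_sizes)]
    estimators += [
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=rf_size, random_state=seed)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(mlp_hidden,), max_iter=500,
                              random_state=seed)),
        # The first positional argument of BaggingClassifier is the base estimator
        ("bag", BaggingClassifier(DecisionTreeClassifier(), n_estimators=bag_size,
                                  random_state=seed)),
    ]
    return VotingClassifier(estimators=estimators, voting="hard")


def cross_validate_voting(X, y, n_folds=7, seed=0, **model_kwargs):
    """Steps 3-13: k-fold cross-validation with per-fold and averaged metrics."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = {"accuracy": [], "precision": [], "recall": [], "f1": [], "kappa": []}
    for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
        model = build_voting_classifier(seed=seed, **model_kwargs)
        model.fit(X[train_idx], y[train_idx])
        y_true, y_pred = y[val_idx], model.predict(X[val_idx])
        scores["accuracy"].append(accuracy_score(y_true, y_pred))
        # Assumption: macro averaging, so the same code covers binary and three-stage tasks
        scores["precision"].append(precision_score(y_true, y_pred, average="macro", zero_division=0))
        scores["recall"].append(recall_score(y_true, y_pred, average="macro", zero_division=0))
        scores["f1"].append(f1_score(y_true, y_pred, average="macro", zero_division=0))
        scores["kappa"].append(cohen_kappa_score(y_true, y_pred))
        print(f"fold {fold}: " + ", ".join(f"{k}={v[-1]:.4f}" for k, v in scores.items()))
    averages = {k: float(np.mean(v)) for k, v in scores.items()}
    print("average: " + ", ".join(f"{k}={v:.4f}" for k, v in averages.items()))
    return averages


# Small synthetic three-class demo with reduced ensemble sizes so it runs quickly
X, y = make_classification(n_samples=140, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
averages = cross_validate_voting(X, y, et_sizes=(10, 10, 10, 10), rf_size=10,
                                 bag_size=10, mlp_hidden=32)
```

Building a fresh ensemble inside each fold ensures that no fitted state carries over from one fold to the next.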

I. Evaluation Metrics for Classification Models

We used five key metrics to evaluate the classification models, each capturing a different aspect of performance:
- Accuracy measures the overall proportion of correct predictions.
- Precision measures the fraction of positive predictions that are correct.
- Recall measures the fraction of actual positives that are correctly identified.
- The F1 score, the harmonic mean of precision and recall, gives a balanced view of the two.
- Cohen's kappa measures the agreement between predicted and true labels, corrected for the agreement expected by chance.

Together, these metrics guide the assessment and improvement of the classification models.
\begin{equation*} \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}\end{equation*}
where $TP$ (true positive) and $TN$ (true negative) are the numbers of correct positive and negative predictions, respectively, and $FP$ (false positive) and $FN$ (false negative) are the numbers of incorrect positive and negative predictions, respectively.
\begin{align*} \text{Precision} &= \frac{TP}{TP + FP} \tag{17}\\ \text{Recall} &= \frac{TP}{TP + FN} \tag{18}\\ \text{F1-score} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{19}\\ \kappa &= \frac{P_{o} - P_{e}}{1 - P_{e}} \tag{20}\end{align*}
where $\kappa$ denotes Cohen's kappa, $P_{o}$ is the observed agreement between the predicted and reference labels (the proportion of samples on which they agree), and $P_{e}$ is the agreement expected by chance, given the distributions of the two sets of labels.
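As a worked check of Eqs. (16)–(20), the following sketch computes all five metrics from confusion-matrix counts for a hypothetical set of ten sleep/awake labels. It then compares the results with scikit-learn's implementations. The label values and the function name `binary_metrics` are illustrative only.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)


def binary_metrics(y_true, y_pred, positive=1):
    """Eqs. (16)-(20) computed from confusion-matrix counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (16)
    precision = tp / (tp + fp)                          # Eq. (17)
    recall = tp / (tp + fn)                             # Eq. (18)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (19)
    # Eq. (20): P_o is the observed agreement; P_e is the chance agreement,
    # i.e. the sum over classes of the product of the two label frequencies
    p_o = accuracy
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c)
              for c in np.union1d(y_true, y_pred))
    kappa = (p_o - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


# Hypothetical labels for ten epochs (1 = sleep, 0 = awake)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
ours = binary_metrics(y_true, y_pred)
reference = {"accuracy": accuracy_score(y_true, y_pred),
             "precision": precision_score(y_true, y_pred),
             "recall": recall_score(y_true, y_pred),
             "f1": f1_score(y_true, y_pred),
             "kappa": cohen_kappa_score(y_true, y_pred)}
print(ours)  # accuracy, precision, recall and F1 are 0.8; kappa is about 0.6
```

Here TP = 4, TN = 4, FP = 1 and FN = 1, so $P_{o} = 0.8$. Both label sets are half sleep and half awake, so $P_{e} = 0.5$ and $\kappa = (0.8 - 0.5)/(1 - 0.5) = 0.6$. Kappa is therefore noticeably lower than the raw accuracy.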

SECTION IV.

Ablation Study

This ablation study analyzes sixteen ML models (named in Section I), including ensemble methods, stacking, and voting classifiers, with the objective of accurately classifying neonatal sleep stages. We examined various feature selection techniques, focusing particularly on the SelectKBest method, in both binary and multiclass classification settings.

A. Single-Channel Analysis Using Multi-Domain Features

1) Binary Classification: Sleep vs Awake

The fusion of FAWT, DWT, spectral, and temporal features (FDSTF) outperformed the combination of DWT, spectral, and temporal features (DSTF) in the binary classification of neonatal EEG data (sleep vs awake), as shown in Fig. 2. On the F3-T3 channel, the ensemble voting model with FDSTF achieved an accuracy of 84.20%, compared with 80.69% for DSTF. Similarly, on the C4-T4 channel, the ensemble voting model with FDSTF achieved 85.80% accuracy, compared with 82.21% for DSTF. Nevertheless, the Kappa scores of both methods indicated a decline in classification consistency.
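The difference between DSTF and FDSTF is which feature blocks are concatenated. The following NumPy sketch illustrates that fusion step. FAWT is not available in NumPy or SciPy, so it appears only as a hypothetical callable (`fawt_features`), filled with a zero-vector placeholder in the demo. All other choices are assumptions, not the authors' feature set:
- a 4-level Haar DWT with relative sub-band energies;
- four conventional EEG bands (0.5–4, 4–8, 8–13 and 13–30 Hz);
- four simple time-domain statistics;
- a 256 Hz sampling rate and a 30 s epoch.

```python
import numpy as np


def haar_dwt_energies(x, levels=4):
    """Relative energy of each Haar DWT sub-band (D1..D4, then A4)."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if len(approx) % 2:  # pad odd lengths by repeating the last sample
            approx = np.append(approx, approx[-1])
        detail = (approx[0::2] - approx[1::2]) / np.sqrt(2)
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    energies = np.array(energies)
    return energies / energies.sum()


def band_powers(x, fs, bands=((0.5, 4), (4, 8), (8, 13), (13, 30))):
    """Relative power per EEG band from an FFT periodogram."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2
    powers = np.array([psd[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])
    return powers / powers.sum()


def temporal_features(x):
    """Simple time-domain statistics: mean, std, peak-to-peak, line length."""
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), x.std(), x.max() - x.min(), np.abs(np.diff(x)).sum()])


def fuse_features(x, fs, fawt_features=None):
    """DSTF = DWT + spectral + temporal blocks; FDSTF also appends FAWT features."""
    blocks = [haar_dwt_energies(x), band_powers(x, fs), temporal_features(x)]
    if fawt_features is not None:  # hypothetical callable returning a 1-D FAWT feature vector
        blocks.append(fawt_features(x))
    return np.concatenate(blocks)


# Hypothetical 30 s epoch at 256 Hz: a 2 Hz (delta-band) rhythm plus noise
fs = 256
t = np.arange(30 * fs) / fs
x = np.sin(2 * np.pi * 2 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
dstf = fuse_features(x, fs)
fdstf = fuse_features(x, fs, fawt_features=lambda s: np.zeros(6))  # placeholder FAWT block
print(dstf.shape, fdstf.shape)  # (13,) (19,)
```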

FIGURE 2. DSTF vs FDSTF: sleep vs awake.

2) Three-Stage Classification: $Q_{S}$ vs $A_{S}$ vs Awake

FDSTF outperformed DSTF in the more complex three-stage classification, as depicted in Fig. 3. F3-T3 and C4-T4 achieved accuracies of 80.71% and 80.57%, respectively, for FDSTF, whereas DSTF achieved 80.28% and 77.66%, respectively. In terms of Kappa scores, DSTF scored 62.22% and 56.96%, while FDSTF achieved 63.91% and 63.83% for F3-T3 and C4-T4, respectively.

FIGURE 3. DSTF vs FDSTF: $Q_{S}$ vs $A_{S}$ vs awake.

B. Comparative Analysis of Feature Selection Techniques

We compared three techniques for reducing the feature set: Principal Component Analysis (PCA), SelectKBest, and SelectPercentile. Each was evaluated on two classification tasks, sleep vs awake and $Q_{S}$ vs $A_{S}$ vs awake. Fig. 4 shows the performance scores of each method on these tasks, which guided the choice of the most effective technique.

FIGURE 4. Comparison of feature selection techniques across different scenarios.

In the sleep vs awake task, SelectKBest performed best, with a score of 0.8756, closely followed by SelectPercentile at 0.8741, while PCA scored lower, at 0.8085. For $Q_{S}$ vs $A_{S}$ vs awake classification, SelectKBest again led with 0.8384, followed by SelectPercentile at 0.8367, while PCA showed a wider gap at 0.7358. These results (Fig. 4) identify, on empirical grounds, the most suitable feature selection method for each task.
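A comparison of this kind can be set up in scikit-learn by placing each reducer in front of the same classifier inside a `Pipeline`. Cross-validation then refits the selector on every training fold, so no information from the validation fold leaks into feature selection. In the following sketch, several settings are assumptions rather than the paper's configuration:
- the `f_classif` scoring function;
- `k=20`, `percentile=50` and `n_components=10`;
- an `ExtraTreesClassifier` as a stand-in for the full voting ensemble;
- synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def compare_selectors(X, y, k=20, percentile=50, n_components=10, n_folds=7, seed=0):
    """Mean cross-validated accuracy of one classifier behind each reducer."""
    reducers = {
        "PCA": PCA(n_components=n_components, random_state=seed),
        "SelectKBest": SelectKBest(f_classif, k=k),
        "SelectPercentile": SelectPercentile(f_classif, percentile=percentile),
    }
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    results = {}
    for name, reducer in reducers.items():
        # Inside the pipeline the reducer is fit on each training fold only
        pipe = make_pipeline(reducer, ExtraTreesClassifier(n_estimators=100, random_state=seed))
        results[name] = float(np.mean(cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")))
    return results


# Synthetic stand-in for fused multi-domain features (40 features, 8 informative)
X, y = make_classification(n_samples=210, n_features=40, n_informative=8, random_state=0)
scores = compare_selectors(X, y)
print(scores)
```

PCA builds new components from linear combinations of all features, whereas SelectKBest and SelectPercentile keep a subset of the original features. This difference is worth keeping in mind when interpreting the gap between them.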

C. Selection of Best Channels

This analysis evaluated individual EEG channels in binary and multi-stage classification using multi-domain features, feature selection, and ensemble modeling, with accuracy and Kappa value as the evaluation criteria. In binary classification, the C4-T4 channel performed best. In three-stage classification, the F3-T3 channel stood out, suggesting that it is better suited to the finer-grained multi-stage task.
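The following sketch shows one way to carry out the per-channel evaluation: cross-validate the same model on each channel's features and rank the channels by accuracy, with Cohen's kappa breaking ties. The channel names, the synthetic "informative" and "noise" channels, and the use of an `ExtraTreesClassifier` in place of the full ensemble are hypothetical. Accuracy here is computed over the pooled out-of-fold predictions, which can differ slightly from the mean of per-fold accuracies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def rank_channels(features_by_channel, y, n_folds=7, seed=0):
    """Cross-validate the same model on each channel's features and rank the
    channels by accuracy, using Cohen's kappa to break ties."""
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    results = []
    for channel, X in features_by_channel.items():
        model = ExtraTreesClassifier(n_estimators=100, random_state=seed)
        y_pred = cross_val_predict(model, X, y, cv=cv)
        results.append((channel, accuracy_score(y, y_pred), cohen_kappa_score(y, y_pred)))
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


# Hypothetical channels: one carries class information, the other is pure noise
X_informative, y = make_classification(n_samples=210, n_features=10, n_informative=5,
                                       random_state=0)
X_noise = np.random.default_rng(0).normal(size=X_informative.shape)
ranking = rank_channels({"channel_A": X_informative, "channel_B": X_noise}, y)
for channel, acc, kappa in ranking:
    print(f"{channel}: accuracy={acc:.3f}, kappa={kappa:.3f}")
```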

D. Selection of Best ML Models

The performance of various ML models was evaluated in both binary (sleep vs awake) and three-stage ($Q_{S}$ vs $A_{S}$ vs awake) classification. Each model was tested on channels C4-T4 and F3-T3 individually and on the dual-channel setup. Tables 2 and 3 detail how effectively these models differentiate neonatal sleep stages.

SECTION V.

Results and Discussion

A. Cross-Validation Results for Sleep vs Awake: Dual-Channel (F3-T3, C4-T4) Analysis

Table 4 presents the 7-fold cross-validation results for sleep vs awake classification using both channels. The results were obtained with the ensemble voting classifier using FDSTF. For each fold, the table reports accuracy, precision, recall, F1 score, and the kappa statistic. Across all folds, the average accuracy was 87.56% and the mean kappa was 74.13%.

TABLE 4 Cross-Validation Results for Each Fold: Sleep vs Awake

B. Cross-Validation Results for $Q_{S}$ Detection: Dual-Channel (F3-T3, C4-T4) Analysis

Table 5 presents the 7-fold cross-validation results for $Q_{S}$ detection, obtained with the ensemble voting classifier using FDSTF. For each fold, accuracy, precision, recall, F1 score, and Cohen's kappa are reported. Overall, the proposed methodology achieved an average accuracy of 95.63% and a mean kappa of 83.87%, indicating strong agreement between predictions and actual labels.

TABLE 5 Cross-Validation Results for Each Fold: $Q_{S}$ Detection

C. Cross-Validation Results for $Q_{S}$ vs $A_{S}$ vs Awake State: Dual-Channel (F3-T3, C4-T4) Analysis

Table 6 presents the 7-fold cross-validation results for distinguishing among the $Q_{S}$, $A_{S}$, and awake states, obtained with the ensemble voting classifier using FDSTF. For each fold, metrics such as accuracy and Cohen's kappa are reported. The proposed methodology achieved an average accuracy of 83.72% and a mean kappa of 69.73%.

TABLE 6 Cross-Validation Results for Each Fold: ($Q_{S}$ vs $A_{S}$ vs Awake)

D. Comparison of Proposed Method with Existing Work

Table 7 shows that our sleep vs awake method outperforms existing approaches, with an accuracy of 87.56% and a Kappa value of 74.13%. The method combines data from two channels, FDSTF, and an ensemble voting classifier. By comparison, Abbasi et al. [20], [30] used nine channels and reached an accuracy of 82.53%.

TABLE 7 Performance Comparison of Proposed Method and Existing Work for Sleep vs Awake Detection

Table 8 shows that our $Q_{S}$ detection method achieves a higher accuracy (95.63%) and a high Kappa value (83.87%). This indicates greater reliability than the existing models of Abbasi et al. [20], Moghadam et al. [18], and the other studies listed there.

TABLE 8 Performance Comparison of Proposed Method and Existing Work for $Q_{S}$ Detection

For three-stage classification of neonatal sleep, our method achieves an accuracy of 83.72% and a Kappa value of 69.73%. Table 9 compares these results with those of methods that use more channels. Overall, our approach uses fewer channels and classical ML techniques while achieving superior or comparable performance.

TABLE 9 Performance Comparison of Proposed Method and Existing Work for Three-Stage Classification ($Q_{S}$ vs $A_{S}$ vs Awake)

E. Discussion

Our research advances neonatal sleep stratification, a field that has long struggled with the complexity and irregularity of neonatal data. These difficulties are often made worse by external interference such as feeding and movement. Our work introduces methods designed specifically for neonatal EEG data, which sets it apart from typical methods that focus primarily on adult EEG.

A central focus of our research is the peculiarities of neonatal EEG signals. These signals have higher amplitude and lower frequency than adult EEG and therefore pose particular analytical challenges. Because our approach is tailored to these characteristics, it classified neonatal sleep states accurately. This matters especially because neonatal brains develop rapidly and neonatal EEG patterns evolve accordingly.

Our most significant contribution is a novel, comprehensive multi-domain feature extraction methodology that combines DWT, FAWT, and spectral and temporal features. The FAWT framework is particularly important because it can finely decompose non-stationary signals, which are common in newborn EEG. This enables a thorough examination of signal complexity and improves analytical precision. Another noteworthy aspect of our research is the ensemble voting classifier and the optimized feature selection process. Extensive experimentation showed SelectKBest to be the most effective feature selection method for neonatal EEG data. We further refined the methodology by building the voting ensemble from the top five ML models. These models were selected on the basis of an extensive empirical study of their performance across ten EEG channels. This empirically grounded design supports the method's practicality and dependability in real-world neonatal care settings, where accuracy and precision are crucial.

Our current methodology handles binary and three-stage classification of neonatal sleep, but neonates exhibit five distinct sleep stages. Future work will therefore aim to extend the model to all of these stages. We also plan to explore novel feature extraction methods, integrate the approach with IoT devices for real-time monitoring, and employ advanced ML architectures such as recurrent neural networks and transformers.

SECTION VI.

Conclusion

In this research, we demonstrate that fusing features extracted with FAWT and DWT with temporal and spectral features provides a reliable basis for classifying neonatal sleep stages. Our ensemble voting model achieved the following accuracies:
- 87.56% for sleep vs awake classification;
- 95.63% for $Q_{S}$ detection;
- 83.72% for three-stage classification ($Q_{S}$, $A_{S}$, and awake).

Because the approach requires only two EEG channels, it is a cost-effective and efficient solution for the Internet of Medical Things (IoMT). By combining a voting model with thorough validation, we have developed an accurate and practical approach to neonatal sleep stage classification.
