Introduction
Emotion recognition is a key technology in brain-computer interfaces, enabling computers to interpret human emotions. Recent research in this field can be broadly categorized by the type of data used. The first category employs non-physiological signals [1]-[5] to recognize and classify emotions, but these signals can be intentionally concealed, limiting their ability to reflect genuine emotions. The second category focuses on physiological signals such as electroencephalography (EEG) [6], [7], electrocardiogram (ECG) [8], electromyogram (EMG) [9], and galvanic skin response (GSR) [10]. Among these, EEG is particularly advantageous for emotion recognition, as it captures emotion-related brain activity that cannot be masked, improving the reliability and accuracy of identification. Consequently, EEG-based emotion recognition has gained significant attention in both research and applications [7], [11]-[13].
With the significant success of deep learning, many researchers have applied these methods to emotion recognition. For example, Salama et al. [12] used CNNs as feature extractors to capture emotion-related features, while Li et al. [13] proposed a bi-hemispheric difference model (BiHDM) for emotion recognition. However, the aforementioned methods cannot effectively model the complex interactions between different electrodes at both local and long-range scales, resulting in suboptimal emotion recognition performance.
To leverage the correlations between electrodes, graph convolutional networks (GCNs) [14] have been widely applied in EEG-based research, because they can exploit the connectivity between electrodes [35]. As pioneering research, Song et al. [16] proposed a dynamical graph convolutional neural network (DGCNN) for emotion recognition. Chen et al. [17] introduced a static adjacency matrix in a GCN for extracting the inter-correlation features of multichannel EEG. Li et al. [18] proposed a residual graph convolutional broad network for emotion recognition. However, these methods do not consider correlation analysis between different EEG electrodes, even though research has shown that correlation analysis plays a crucial role in emotion recognition [21]. Additionally, they ignore the varying activation of different brain regions during emotional processes [19], [20]. In fact, neuroscience studies have demonstrated a strong association between emotions and different regions of the cerebral cortex [22]-[24]. Therefore, both aspects are critical for emotion recognition [25], [26].
Fig. 1. The overall structure of CAGLE-net. GARO is the Graph-Attention Readout, SERO is the Squeeze-Excitation Readout, and LEFL is the local enhanced fusion layer.
It is hypothesized that using correlation analysis to construct electrode adjacency and aggregating local brain features for each brain region can effectively decode emotions from EEG. Based on this, we propose an EEG correlation analysis-guided graph local enhanced feature learning network (CAGLE-net). In CAGLE-net, the Pearson correlation coefficient (PCC) and the Spearman correlation coefficient (SCC) are used to measure correlations between electrodes from linear and rank correlation perspectives, respectively. To capture differences in local activation, we design a local enhanced embedding layer (LEEL) to aggregate locally enhanced features. A cross-attention fusion mechanism, implemented in a local enhanced fusion layer (LEFL), is then employed to fully utilize these features, yielding more discriminative representations for emotion recognition. The main contributions of this work are summarized as follows:
The PCC and the SCC are employed to guide the learning of dynamic directed adjacency matrices from two complementary perspectives, better capturing emotion-related brain functional networks.
The LEEL is used to aggregate and fuse emotion-related locally enhanced features from different brain regions for more accurate emotion classification.
METHODOLOGY
The overall structure of CAGLE-net is shown in Fig. 1. The main modules are explained in detail below.
A. Correlation Analysis Guided Dynamic Directed Adjacency Matrix Learning
The brain's structure and functional connectivity are crucial for exploring correlations between electrodes and decoding emotion-related EEG signals [27]. In EEG correlation analysis, the PCC assesses the linear correlation between electrodes $X$ and $Y$, ranging from -1 (negative) to 1 (positive), with 0 indicating no correlation. The PCC is computed as follows:
\begin{equation*}\operatorname{PCC}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \cdot \sigma_Y}\tag{1}\end{equation*}
where $\operatorname{cov}(X,Y)$ is the covariance between $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are their standard deviations.
The SCC evaluates the correlation between the ranks of electrodes $X$ and $Y$, without assuming continuity or linearity in the signals. Its values, like those of the PCC, range from -1 (negative) to 1 (positive). The SCC is computed as follows:
\begin{equation*}\operatorname{SCC} = 1 - \frac{6\sum_{i=1}^{N} d_i^2}{N\left(N^2 - 1\right)}\tag{2}\end{equation*}
where $d_i$ is the difference between the ranks of the $i$-th pair of observations and $N$ is the number of observations.
These two complementary methods assess electrode correlations from distinct perspectives but provide only static measurements.
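For concreteness, below is a minimal sketch of how the two static correlation matrices could be computed with SciPy; the array shapes, function name, and variable names are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def correlation_matrices(eeg):
    """Compute static PCC and SCC matrices between electrodes.

    eeg: array of shape (C, T) -- C electrodes, T time samples.
    Returns two (C, C) matrices; the diagonal is left at zero,
    mirroring Fig. 2, where self-correlation is not computed.
    """
    C = eeg.shape[0]
    A_pcc = np.zeros((C, C))
    A_scc = np.zeros((C, C))
    for i in range(C):
        for j in range(C):
            if i == j:
                continue
            A_pcc[i, j], _ = pearsonr(eeg[i], eeg[j])   # Eq. (1)
            A_scc[i, j], _ = spearmanr(eeg[i], eeg[j])  # Eq. (2)
    return A_pcc, A_scc
```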
To model dynamic EEG connectivity, we use a dynamic directed adjacency matrix $A_D \in \mathbb{R}^{C \times C}$, where $C$ is the number of electrodes, which learns inherent electrode connectivity patterns during training. Leveraging the static correlation matrices $A_{PCC}$ and $A_{SCC}$ as prior knowledge to guide the learning of $A_D$ (as shown in Equation 5) enables the capture of more discriminative, emotion-related features.
\begin{align*} & A_{PCC_{ij}}^{\prime} = \begin{cases} 1, & \text{if } \left| A_{PCC_{ij}} \right| > 0 \\ 0, & \text{otherwise} \end{cases}\tag{3} \\ & A_{SCC_{ij}}^{\prime} = \begin{cases} 1, & \text{if } \left| A_{SCC_{ij}} \right| > 0 \\ 0, & \text{otherwise} \end{cases}\tag{4} \\ & A_{PCC}^{D} = A_{D} \times A_{PCC}^{\prime}, \quad A_{SCC}^{D} = A_{D} \times A_{SCC}^{\prime}\tag{5}\end{align*}
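A brief sketch of this guided masking follows, reading the product in Equation 5 as element-wise masking of the learnable matrix (an assumption on our part; all names are illustrative):

```python
import numpy as np

def guided_adjacency(A_D, A_pcc, A_scc):
    """Mask the learnable dynamic matrix A_D with binary masks
    derived from the static correlation matrices (Eqs. 3-5)."""
    mask_pcc = (np.abs(A_pcc) > 0).astype(np.float32)  # Eq. (3)
    mask_scc = (np.abs(A_scc) > 0).astype(np.float32)  # Eq. (4)
    # Element-wise masking keeps only edges supported by the prior.
    A_D_pcc = A_D * mask_pcc   # Eq. (5)
    A_D_scc = A_D * mask_scc
    return A_D_pcc, A_D_scc
```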
Fig. 2. Visualization of the PCC and SCC matrices. Deeper red indicates stronger correlation, while deeper blue indicates weaker correlation. Blank areas represent the absence of correlation; the diagonal is left blank because self-correlation was not computed.
B. Graph Convolutional Network
In this paper, the GCN is used to capture the spatial structural features between electrodes. Specifically, we use the adjacency matrices $A_{PCC}^{D}$ and $A_{SCC}^{D}$ obtained in Equation 5 to propagate features across electrodes. Each graph convolution layer is computed as follows:
\begin{equation*}H^{(l)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}H^{(l-1)}W^{(l)}\right)\tag{6}\end{equation*}
where $\tilde{A}$ is the adjacency matrix with added self-loops, $\tilde{D}$ is its degree matrix, $W^{(l)}$ is a learnable weight matrix, $\sigma(\cdot)$ is a nonlinear activation, and $H^{(0)} = X$ is the input DE feature matrix. The outputs of two stacked layers are concatenated with the input features in each branch:
\begin{align*} & {H_{{\text{PCC}}}} = {\text{concatenate}}\left({X,H_{{\text{PCC}}}^{(1)},H_{{\text{PCC}}}^{(2)}}\right)\tag{7} \\ & {H_{{\text{SCC}}}} = {\text{concatenate}}\left({X,H_{{\text{SCC}}}^{(1)},H_{{\text{SCC}}}^{(2)}}\right)\tag{8}\end{align*}
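The following PyTorch sketch illustrates one plausible reading of Equations 6-8: a renormalized graph convolution stacked twice, with the input and both layer outputs concatenated. It assumes symmetric normalization even for the directed adjacency, which is our simplification; all names are illustrative.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution layer in the form of Eq. (6)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, H, A):
        A_tilde = A + torch.eye(A.size(0), device=A.device)   # add self-loops
        deg = A_tilde.sum(dim=1).clamp(min=1e-6)
        D_inv_sqrt = torch.diag(deg.pow(-0.5))
        A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt             # normalized adjacency
        return torch.relu(self.W(A_hat @ H))

def branch_features(X, A, layer1, layer2):
    """Concatenate the input with two stacked layer outputs (Eqs. 7-8)."""
    H1 = layer1(X, A)
    H2 = layer2(H1, A)
    return torch.cat([X, H1, H2], dim=-1)
```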
C. Local Enhanced Embedding Layer (LEEL)
The visualizations of the PCC and SCC matrices in Fig. 2 highlight distinct local connectivity patterns. Consequently, we divide the electrodes into five regions corresponding to the brain's physiological divisions: the frontal, temporal, central, parietal, and occipital lobes, as detailed in Table I. To aggregate each region's features into a single representation, Graph-Attention Readout (GARO) and Squeeze-Excitation Readout (SERO) are employed [29], both of which perform exceptionally well on graph structures. GARO leverages the Transformer attention mechanism using key-query embeddings [30], while SERO employs MLP-based attention from Squeeze-and-Excitation Networks [31]. The computations for GARO (Equations 9-11) and SERO (Equation 12) are as follows:
\begin{align*} & K = W_{key}H_{partial}\tag{9} \\ & Q = W_{query}\,\phi_{mean}\left(H_{partial}\right)\tag{10} \\ & Z = \operatorname{sigmoid}\left(\frac{Q^{T}K}{\sqrt{C}}\right)\tag{11}\end{align*}
\begin{equation*}Z = \operatorname{sigmoid}\left(W_{2}\,\sigma\left(W_{1}\,\phi_{mean}\left(H_{partial}\right)\right)\right)\tag{12}\end{equation*}
where $H_{partial}$ denotes the features of the electrodes within a single brain region, $\phi_{mean}$ is mean pooling over those electrodes, and $Z$ yields the attention weights used to aggregate the region's features into a locally enhanced representation.
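Below is a sketch of the two readouts for a single brain region, following our reading of Equations 9-12; taking $\phi_{mean}$ as mean pooling and using the weights $Z$ for a weighted sum are assumptions, as is every name in the code.

```python
import torch
import torch.nn as nn

class GARO(nn.Module):
    """Graph-Attention Readout (Eqs. 9-11): key-query attention over nodes."""
    def __init__(self, dim):
        super().__init__()
        self.key = nn.Linear(dim, dim, bias=False)
        self.query = nn.Linear(dim, dim, bias=False)

    def forward(self, H):                             # H: (num_nodes, dim)
        K = self.key(H)                               # Eq. (9)
        q = self.query(H.mean(dim=0))                 # Eq. (10): pool, then project
        z = torch.sigmoid(K @ q / K.size(-1) ** 0.5)  # Eq. (11): per-node weights
        return (z.unsqueeze(-1) * H).sum(dim=0)       # weighted readout vector

class SERO(nn.Module):
    """Squeeze-Excitation Readout (Eq. 12): MLP attention over nodes."""
    def __init__(self, dim, num_nodes):
        super().__init__()
        self.W1 = nn.Linear(dim, dim)
        self.W2 = nn.Linear(dim, num_nodes)

    def forward(self, H):                             # H: (num_nodes, dim)
        z = torch.sigmoid(self.W2(torch.relu(self.W1(H.mean(dim=0)))))
        return (z.unsqueeze(-1) * H).sum(dim=0)
```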
D. Local Enhanced Fusion Layer (LEFL)
To fully utilize the locally enhanced features learned in LEEL, a fusion approach based on cross-attention mechanisms with multi-head attention (MHA) is employed [30]. The structure of LEFL is illustrated in Fig. 3. The three inputs are processed through linear layers to obtain $X_Q$, $X_K$, and $X_V$. The computation for cross-attention fusion is as follows:
\begin{align*} & \operatorname{attn}\left(X_{Q},X_{K}\right) = \operatorname{softmax}\left(\frac{X_{Q}X_{K}^{T}}{\sqrt{d}}\right)\tag{13} \\ & out_{attn} = \operatorname{attn}\left(X_{Q},X_{K}\right)X_{V}\tag{14}\end{align*}
where $d$ is the feature dimension.
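A minimal PyTorch sketch of this fusion is shown below; how the three locally enhanced feature streams map onto query, key, and value is our assumption, and PyTorch's built-in multi-head attention stands in for the MHA of [30].

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Cross-attention fusion over locally enhanced features (Eqs. 13-14)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.proj_q = nn.Linear(dim, dim)
        self.proj_k = nn.Linear(dim, dim)
        self.proj_v = nn.Linear(dim, dim)
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, xq, xk, xv):        # each: (batch, num_regions, dim)
        XQ, XK, XV = self.proj_q(xq), self.proj_k(xk), self.proj_v(xv)
        out, _ = self.mha(XQ, XK, XV)     # softmax(XQ XK^T / sqrt(d)) XV
        return out
```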
In addition to the methods described above, an $\ell_1$-norm constraint is applied to $A_D$ to mitigate the over-smoothing issue in GCNs, resulting in the sparse loss term denoted as $\mathcal{L}_{sparse}$ [32]. The cross-entropy loss $\mathcal{L}_{cross}$ is used to measure the discrepancy between the predicted labels $y_p$ and the ground-truth labels $y_i$. The final loss function is defined as follows:
\begin{equation*}Loss = \alpha \mathcal{L}_{sparse} + \beta \mathcal{L}_{cross}\tag{15}\end{equation*}
where $\alpha$ and $\beta$ are trade-off hyperparameters balancing the two terms.
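Put together, the training objective of Equation 15 might look like the following sketch; the weight values are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, A_D, alpha=0.01, beta=1.0):
    """Eq. (15): sparsity penalty on A_D plus cross-entropy.

    alpha and beta are trade-off hyperparameters; the defaults here
    are illustrative, not the paper's values.
    """
    l_sparse = A_D.abs().sum()                 # l1-norm of learned adjacency
    l_cross = F.cross_entropy(logits, labels)  # classification loss
    return alpha * l_sparse + beta * l_cross
```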
EXPERIMENTS
A. Dataset
The proposed model is evaluated on the SEED dataset [6], which includes EEG data from 15 subjects (7 males and 8 females). Each subject viewed 15 emotion-eliciting video clips (5 positive, 5 neutral, and 5 negative), each approximately 4 minutes long. The subjects participated in three experimental sessions, with consecutive sessions spaced at least one week apart.
B. Experiment Setup
To ensure a fair comparison of our results with other methods, we follow the experimental setup outlined in [6], [16], [33]. Specifically, we use the first 9 trials as the training set and the remaining 6 trials as the test set, with all trials coming from the same session. We computed the mean accuracy and standard deviation across all subjects over two sessions. Differential entropy (DE) features have been extensively utilized in previous studies [13], [16], [32], [34]; in this work, we directly use the DE features provided by the dataset.
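As a sketch, the subject-dependent split might look like the following; the 'de_LDS{k}' key pattern matches how SEED's precomputed feature files are commonly organized, but it should be verified against your copy of the dataset, and all names here are ours.

```python
from scipy.io import loadmat

def load_session_split(mat_path):
    """First 9 trials for training, remaining 6 for testing (same session).

    Assumes SEED's ExtractedFeatures .mat layout with per-trial DE
    features under keys 'de_LDS1' ... 'de_LDS15'; verify before use.
    """
    data = loadmat(mat_path)
    trials = [data[f"de_LDS{k}"] for k in range(1, 16)]  # 15 trials of DE features
    return trials[:9], trials[9:]                        # train, test
```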
C. Experiment Result
Table II presents the emotion classification accuracy of our model on the SEED dataset, compared to several baseline methods. Our model demonstrates superior performance, achieving higher accuracy and lower standard deviation, which supports the validity of our hypothesis. Notably, it surpasses the deep learning models BiHDM [13] and BiDANN [33] by 2.65% and 1.91%, respectively. This improvement highlights our method's effectiveness in capturing dynamic directed connections between electrodes and identifying emotion-related spatial topological features. Compared to graph-based methods, CAGLE-net achieves the highest accuracy and lowest standard deviation, underscoring its proficiency in capturing spatial brain features and aggregating locally enhanced features from different brain regions. The model not only extracts locally enhanced features effectively but also integrates them through cross-attention fusion, leading to more discriminative features. Fig. 4 illustrates the classification accuracy of CAGLE-net across the 15 subjects. Except for Subjects 2 and 10, the accuracy for all subjects exceeds 90%. This performance highlights the model's capability to generalize well across individuals, further validating its effectiveness in practical applications.
D. Ablation Studies
To validate the effectiveness of LEEL and LEFL, we perform ablation experiments based on the original model structure, with results presented in Table III. The table shows a progressive improvement in accuracy as LEEL and LEFL are incrementally incorporated. When both are used in CAGLE-net, the accuracy reaches its highest level and the standard deviation its lowest. This indicates that CAGLE-net not only effectively captures EEG spatial correlation features but also efficiently aggregates locally enhanced features, and, by integrating cross-attention fusion, extracts more discriminative features for emotion recognition. Moreover, when the PCC or the SCC is used individually, accuracy decreases; when both are used together, accuracy reaches its peak. This observation indicates that the two methods guide the learning of the adjacency matrix in a complementary manner, covering both linear and rank correlation perspectives.
CONCLUSION
This paper proposes CAGLE-net, an EEG correlation analysis-guided graph local enhanced feature learning network. In CAGLE-net, the PCC and SCC guide the learning of the dynamic directed adjacency matrix, capturing the topological structure between electrodes; LEEL effectively aggregates emotion-related locally enhanced features from different brain regions; and LEFL fully leverages the features learned by LEEL, fusing them through the cross-attention mechanism to improve emotion recognition accuracy.
ACKNOWLEDGMENT
The authors sincerely thank all anonymous reviewers for their evaluations and insightful comments, which have contributed to improving the quality of this paper.