Introduction
Mental health is an important component of overall health and well-being. Mental health issues can lead to serious consequences, such as self-mutilation and suicide, particularly among university students who are not yet physically and mentally mature [1]–[5]. Currently, youth mental health is deteriorating. According to The State Of Mental Health In America 2021, 9.7 percent of youth in the United States have severe major depression, up from 9.2 percent in the previous year’s data [6]. The situation is even worse in developing countries [2]. Research also demonstrated that young people are more likely than any other age group to experience moderate to severe anxiety and depression during the COVID-19 pandemic [7]–[9]. However, not all students who have mental health issues are aware of their situation and actively seek help. According to research, roughly three-quarters of college students are hesitant to seek help when they have a mental health problem [10]. In this context, a proactive detection system for students with mental health problems is the key to addressing this issue.
However, detecting these students proactively is a tremendous challenge because mental health is influenced by a variety of complex factors. Previous studies have demonstrated that social life [11], [12], academic performance [13], physical appearance [11], and demographic features [4] can all have an impact on students’ mental health, and these features are recorded in unstructured multi-modal data generated by various systems. Social life records, for example, are graph or network data, whereas physical appearance is image data. It is difficult to accurately represent and fuse these features. The most commonly used methods in current studies are manual scoring and calculating summary statistics (e.g., the mean or variance). For example, researchers use Grade-Point Average (GPA) as a proxy for students’ academic performance, or they manually assign a score to people’s physical appearance [11], [14]. These approaches do reduce data complexity and heterogeneity. However, they are subject to human bias, and they risk losing information when data distributions vary greatly. For example, two students with the same GPA may have taken completely different courses. An effective way to represent and fuse these features is therefore crucial for detecting students with mental health problems.
The rapid development of technologies such as network science and representation learning [5], [15]–[17] enables us to profile campus life accurately and effectively, which brings an unprecedented opportunity for the detection of students’ mental health [18]–[21]. Opportunities and challenges, however, coexist. Several significant issues continue to impede researchers in related fields. First, previous studies have utilised the friendship network to represent students’ social life [22], [23]. However, in addition to friendship, social life includes a variety of scenarios such as information dissemination and seeking assistance [12]. Representing students’ social life comprehensively remains a significant challenge. Second, GPA does not accurately represent a student’s academic performance, as mentioned in the previous paragraph. Accurate representation of academic performance remains an open question. Third, because the number of students with mental health issues is much smaller than the number of mentally healthy students, the related dataset is highly imbalanced, posing another critical challenge for mental health detection.
In this paper, we aim to detect students with mental health problems by overcoming the challenges mentioned above. We collect data through a combination of questionnaires and the learning management system (LMS), and select students’ social life, physical appearance, academic performance, and demographic features to detect their mental health status (shown in Figure 1). As mentioned above, these four features have been demonstrated to be related to the mental state of students [4], [11]–[13]. To profile students’ social life comprehensively, we collect social relationships for multiple scenarios, including friendship, life advice, academic advice, support, cooperation, intelligence, and good/bad news sharing [11], [12]. The experimental process of this research is shown in Figure 2. We propose a framework named CASTLE (eduCational dAta fuSion for menTaL hEalth detection) for accurate and effective detection. It consists of three parts. First, representation learning is used for the effective fusion of students’ multi-modal information, including multi-view social network embedding, physical appearance representation, and academic performance representation. In this part, we propose MOON (Multi-view SOcial NetwOrk EmbeddiNg), a multi-view social network embedding algorithm, to embed students’ heterogeneous social relations effectively. Moreover, a convolutional neural network (CNN)-based auto-encoder is used to embed students’ ID photos to obtain an accurate representation of their physical appearance. In addition, we adopt a method that combines a variant of one-hot encoding with an autoencoder to overcome the heterogeneity of students’ academic performance [24]. Second, the synthetic minority oversampling technique (SMOTE) is used to mitigate the effects of label imbalance. Finally, a deep neural network (DNN) model is used for the final detection.
Our contributions can be summarized as follows:
We propose a detection framework CASTLE for students’ mental health detection through fusing multi-modal information generated from campus life.
We design a multi-view social network embedding algorithm, MOON, which eliminates information redundancy across different views by simplifying the strategy of federated embedding.
We conduct comprehensive experiments on a real-world educational dataset, and the extensive results demonstrate the promising performance of the proposed methods in comparison with an extensive range of state-of-the-art baselines.
This paper is organized as follows. In Section II, related work is reviewed. The problem formulation is presented in Section III. In Section IV, the CASTLE detection framework is introduced in detail. In Section V, all the data used in this research and its collection process are introduced. In Section VI, we analyze the results of our experiment. We present the discussion and conclusion of our work in Section VII.
Related Work
The mental health of students attracts tremendous attention [25]. Scholars explore the laws behind mental health from different dimensions through statistical analysis theory. Rossin-Slater et al. [26] carried out experiments to explore the link between depression and school shootings through youth antidepressant use. They found that exposure to fatal school shootings increases youth antidepressant use. Duckworth and Seligman [27] investigated the self-regulation of eighth-grade students through longitudinal research, thus completing the detection of student achievement. The results indicated that students with poor self-regulation also have poor intelligence. Usher and Curran [28] predicted the influences on the mental health status of Australian university students using a cross-sectional study including an online survey. The results showed that there is a significant positive correlation between mental health and gender, age, health status, physical activity levels, sporting club participation, and social-emotional well-being. Morelli et al. [12] carried out an experiment to explore the association between mental traits and social centrality, and found that different mental traits can have different effects on a student’s social centrality.
Recently, social behavior has received increasing interest. Gong et al. [1] analyzed social anxiety disorders in university students using smartphone sensors. They explored the relationship between social interaction and location, and how social anxiety disorder changes with location. The results showed that students with different levels of social anxiety differ greatly in personal behavior. Wongkoblap et al. [29] also concentrated on social behavior. They proposed a detection model based on social networks that can identify participants with poor mental health. The experimental results showed that social network data can effectively detect mental health problems. Meanwhile, Vanlalawmpuia and Lalhmingliana [30] analyzed data from social networking sites. They utilized data mining techniques to identify depression in Facebook users. Moreover, the study identified a number of depression indicator words, which are significant for related work.
Thanks to the development of big data technology, machine learning algorithms are applied to detect mental health issues [31]–[33]. Unlike previous studies that concentrate on the relationship between a single feature and mental health, machine learning-based research attempts to combine multiple features for prediction or classification. Brathwaite et al. [34] validated an existing model for the prediction of mental health in Nigeria. They selected 11 predictors, such as biological sex, childhood maltreatment, school failure, social isolation, fights, running away from home, and drug use, and collected the data through questionnaires. Tate et al. [35] designed a model to predict mental health problems in mid-adolescence and extracted features from parental reports and register data such as the National Patient Register and the Multi-Generation Register. Walsh et al. [36] studied the nonfatal suicide attempts of adolescents and applied the random forest algorithm to make predictions. The features they used include diagnostic, demographic, medication, and socioeconomic factors. Ge et al. [37] carried out experiments to predict student mental status during the COVID-19 epidemic. They used early psychometric test results to predict future test results through the XGBoost algorithm and achieved good performance. Rubaiyat et al. [38] analyzed the predictability of major disorders, such as internet addiction, depression, and low self-esteem, based on a series of psychological tests. They used machine learning methods to create models for detection. The experimental results showed that different disorders are interconnected.
Generally, these studies are mainly based on structured data such as demographic information. However, some important pieces of information, such as social patterns and appearance, are stored in unstructured data that is difficult to quantify and represent. With the development of representation learning and deep learning, some scholars are trying to introduce the information hidden in such unstructured data into mental health prediction. Oyebode et al. [32] introduced processing techniques for unstructured data to capture more complex features. They made predictions for mental health by performing sentiment analysis on 88125 user reviews. Mathur et al. [39] designed an experiment for suicidal tweet detection using natural language processing. Gaur et al. [40] incorporated domain-specific knowledge and proposed a detection framework to determine the severity of suicide risk based on data from Reddit. Cai et al. [41] proposed a multi-modal fusion-based depression recognition system based on EEG (Electroencephalogram) data, and the results showed that their system provides a highly flexible technique for depression identification. The data used in all relevant studies are summarized in Table 1. It can be seen that some scholars are trying to mine information from unstructured data to improve prediction performance. In short, mental health prediction based on unstructured data is becoming a hot topic in this field.
Problem Formulation
In this section, we introduce notations and formally define the research problem of this work. In our research, we use a multi-view network to profile students’ heterogeneous social relations. A multi-view network consists of a set of nodes
Mental Status Detection: For student
Design of CASTLE Framework
The CASTLE framework, including educational data fusion, data augmentation, and the detection model, is introduced in this section. First of all, we introduce educational data fusion, consisting of three subparts (shown in Figure 3): multi-view network embedding, physical appearance representation, and academic performance representation. First, the MOON algorithm proposed in this paper is introduced for multi-view social network embedding. Second, we represent students’ physical appearance and academic performance accurately and effectively through a CNN-based auto-encoder and an auto-encoder based algorithm [24]. Moreover, in the data augmentation part, we use the SMOTE algorithm to generate the data of students with mental health problems to balance the dataset. Finally, a DNN with the dropout mechanism is utilized for the final detection.
A. Educational Data Fusion
1) Social Life Representation
As mentioned before, students’ heterogeneous social relations are represented by a multi-view network in this paper. In other words, eight networks are used to represent the social relationships of the participants in eight different scenarios: friendship, life advice, academic advice, support, cooperation, intelligence, good news sharing, and bad news sharing (details are given in Section V-D). Inspired by [42], we propose a multi-view social network embedding algorithm, named MOON, to embed the social information of students in multiple social scenarios into low-dimensional dense vectors. The details are as follows.
Ata et al. [42] proposed the concepts of first-order and second-order collaboration, and their experiments demonstrate that the resulting embedding performance is better than that of other state-of-the-art algorithms. We therefore introduce these concepts into our experiments and divide the node pairs into three categories:
Intra-view Pairs: Two nodes are linked in the single-view network. The representation can be generated through random walks in each single-view network to retain the diversity of different views.
Cross-view, intra-node Pairs: The same node (i.e., intra-node) across two views (i.e., cross-view) forms pairs. Through the alignment process of node representations in such a pair, we aim to capture the first-order social relations.
Cross-view, cross-node Pairs: A node in one view forms pairs with various nodes (i.e., cross-node) in another different view (i.e., cross-view) based on their associations in each view, to obtain the second-order social relationship.
Details of these pairs are shown as follows.
First, each node has a view-specific representation to retain the diversity of each view. The embedding process follows the Deepwalk model [43], i.e., it generates topologically associated node pairs from random walks [44]. For a certain view \begin{equation*} L_{\mathrm {Div}}(\Theta)=-\sum _{v \in V} \sum _{\left ({i^{(v)}, j^{(v)}}\right) \in \Omega ^{(v)}} \log P\left ({j^{(v)} \mid i^{(v)}; \Theta }\right) \tag{1}\end{equation*}
\begin{equation*} P\left ({j^{(v)} \mid i^{(v)}; \Theta }\right)=\frac {\exp \left ({\tilde {\mathbf {f}}_{j}^{(v)} \cdot \mathbf {f}_{i}^{(v)}}\right)}{\sum _{u \in U} \exp \left ({\tilde {\mathbf {f}}_{u}^{(v)} \cdot \mathbf {f}_{i}^{(v)}}\right)} \tag{2}\end{equation*}
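To make Eqs. (1) and (2) concrete, the following is a minimal sketch of the softmax probability and the resulting negative log-likelihood for a single view. It is an illustration under stated assumptions, not the authors' implementation: the names `pair_probability` and `diversity_loss`, the toy vectors, and the pair list are all hypothetical, and the random-walk sampling that would produce the pairs is omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_probability(i, j, center, context):
    """Eq. (2): P(j | i) as a softmax over all nodes u of context[u] . center[i]."""
    scores = {u: dot(context[u], center[i]) for u in context}
    shift = max(scores.values())  # subtract the max for numerical stability
    denom = sum(math.exp(s - shift) for s in scores.values())
    return math.exp(scores[j] - shift) / denom

def diversity_loss(pairs, center, context):
    """Eq. (1) restricted to one view: negative log-likelihood of the pairs."""
    return -sum(math.log(pair_probability(i, j, center, context))
                for i, j in pairs)

# Toy, made-up embeddings for three nodes in a single view (assumption).
center = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
context = {"a": [0.5, 0.1], "b": [0.1, 0.5], "c": [0.3, 0.3]}
pairs = [("a", "b"), ("b", "c")]  # stand-in for pairs sampled by random walks
loss = diversity_loss(pairs, center, context)
```

Summing this quantity over all views gives the outer sum of Eq. (1). In practice the full softmax is often approximated (for example by negative sampling), which this sketch does not attempt.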
However, for cross-view joint embedding, the current research strategy is to unite all networks indiscriminately without capturing the essential characteristics of different networks [42]. This strategy may lead to high computational overhead, especially when many networks are involved (for example, our dataset contains eight networks representing different social scenarios). To achieve more effective embedding, we design the following embedding strategy for multi-view social networks. We assume that most social relations are built on friendship. Taking life advice as an example, students typically seek help from their friends when they are in trouble. To capture this regularity, we divide all views of the social network into two categories:
Source View: Based on the assumption mentioned above, the source view is friendship.
Target View: All social relations other than friendship are target views.
a: First-Order Social Relation
Cross-view, intra-node Pairs. The intuition here is that the same node across various views represents the same entity, so its view-specific representation should collaborate with one another. In other words, for cross-view intra-node pairs, the two view-specific embeddings of the same node should become similar. For the research question in this paper, each target view is a derivative of the source-view. In this case, the source-view representation should help the target-view representation, i.e., the friendship should impact other views by optimizing the following:\begin{align*} L_{\mathrm {S} 1}(\Theta)=&-\sum _{v^{\prime } \in V^{\prime }} \sum _{\left ({i^{(v_{0})}, \cdot }\right) \in \Omega ^{(v_{0})}} \log P\left ({i^{\left ({v^{\prime }}\right)} \mid i^{(v_{0})}; \Theta }\right) \\=&-\sum _{v^{\prime } \in V^{\prime }} \sum _{\left ({i^{(v_{0})}, \cdot }\right) \in \Omega ^{(v_{0})}} \log \frac {\exp \left ({\mathbf {f}_{i}^{\left ({v^{\prime }}\right)} \cdot \mathbf {f}_{i}^{(v_{0})}}\right)}{\sum _{u \in U} \exp \left ({\mathbf {f}_{u}^{\left ({v^{\prime }}\right)} \cdot \mathbf {f}_{i}^{(v_{0})}}\right)} \\\tag{3}\end{align*}
b: Second-Order Social Relation
Cross-view, cross-node Pairs. For the second-order social relation, the two nodes without a link in the target views may entail a latent relation when they are linked in the source view. It is not appropriate to directly assume a link between them. Instead, we introduce an implicit mechanism to enable such across-view relations: for a node in each target view, its context’s context in the source view may contain information that contributes to its embedding. For example, one of your friend’s friends is yourself.
For the multi-view social network, the context of nodes in the friendship network should be used to guide their embedding in other views. Formally, \begin{align*} L_{\mathrm {S} 2}(\Theta)=&-\sum _{v^{\prime } \in V^{\prime }} \sum _{\left ({i^{(v_{0})},j^{(v_{0})}}\right) \in \Omega ^{(v_{0})}} \log P\left ({i^{\left ({v^{\prime }}\right)} \mid j^{(v_{0})}; \Theta }\right) \\=&-\sum _{v^{\prime } \in V^{\prime }} \sum _{\left ({i^{(v_{0})},j^{(v_{0})}}\right) \in \Omega ^{(v_{0})}} \log \frac {\exp \left ({\tilde {\mathbf {f}}_{j}^{(v_{0})} \cdot \mathbf {f}_{i}^{\left ({v^{\prime }}\right)}}\right)}{\sum _{u \in U} \exp \left ({\tilde {\mathbf {f}}_{u}^{(v_{0})} \cdot \mathbf {f}_{i}^{\left ({v^{\prime }}\right)}}\right)} \\\tag{4}\end{align*}
The complete loss function of the MOON algorithm is:\begin{equation*} L=L_{\mathrm {Div}}+\alpha \cdot L_{\mathrm {S} 1}+\beta \cdot L_{\mathrm {S} 2} \tag{5}\end{equation*}
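The index structure of Eqs. (3) to (5) can be sketched as follows. This is a minimal illustration, assuming that the source-view pair set is available as a list of (i, j) tuples; the function names, view names, and numeric values are hypothetical and are not taken from the paper.

```python
def first_order_pairs(source_pairs, target_views):
    """Eq. (3): predict node i in target view v' from the same node i
    in the source view. Tuples are (target view, predicted, conditioning)."""
    return [(view, i, i) for view in target_views for i, _ in source_pairs]

def second_order_pairs(source_pairs, target_views):
    """Eq. (4): predict node i in target view v' from its source-view
    context node j."""
    return [(view, i, j) for view in target_views for i, j in source_pairs]

def moon_loss(l_div, l_s1, l_s2, alpha, beta):
    """Eq. (5): weighted sum of the three loss terms."""
    return l_div + alpha * l_s1 + beta * l_s2

# Made-up friendship (source-view) pairs and two target views (assumption).
source_pairs = [("a", "b"), ("b", "c")]
target_views = ["life_advice", "support"]
s1 = first_order_pairs(source_pairs, target_views)
s2 = second_order_pairs(source_pairs, target_views)
```

In this sketch only source-to-target combinations are enumerated, so the number of cross-view pairs grows with the number of target views rather than with the number of all view pairs.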
2) Physical Appearance Representation
The photos of students are processed by a convolutional auto-encoder in this research to capture the spatial structure characteristics. The classical auto-encoder can be defined as: in each hidden layer, we adopted the following nonlinear transformation function:\begin{align*} \boldsymbol {h_{(2) }}=&f(\boldsymbol {W}_{(2) }\boldsymbol {h}_{(1) } + \boldsymbol {b}_{(2) }) \\ \boldsymbol {h_{(3) }}=&f(\boldsymbol {W}_{(3) }\boldsymbol {h}_{(2) } + \boldsymbol {b}_{(3) }) \\&\ldots \\ \boldsymbol {h_{(i)}}=&f(\boldsymbol {W}_{(i)}\boldsymbol {h}_{(i-1)} + \boldsymbol {b}_{(i)}),\quad i=4,5,\ldots k \tag{6}\end{align*}
\begin{equation*} \boldsymbol {h_{(m)}} = f(\boldsymbol {\mathcal {W}}_{(m)} * \boldsymbol {h}_{(m-1)} + \boldsymbol {b}_{(m)})\tag{7}\end{equation*}
\begin{equation*} \mathcal {L}\left ({\mathbf {x}, \quad \hat {\mathbf {x}}}\right)=\left \|{\mathbf {x}-\hat {\mathbf {x}}}\right \|^{2} \tag{8}\end{equation*}
3) Academic Performance Representation
The heterogeneity of academic performance, caused by the diversity of student curricula, is a persistent challenge in the education field. For example, one student chooses courses A, B, and C and another student chooses courses C and D. Both the content and the number of dimensions vary when their exam grades are used as features. The currently popular method in this field is to calculate summary statistics (e.g., the mean or GPA) as a proxy for academic performance. This effectively overcomes the heterogeneity of the curriculum, but information loss may be significant when data distributions vary widely. To preserve the complete information of academic performance while overcoming the heterogeneity of grade data, we use a method that combines a variant of one-hot encoding with an autoencoder for homogenization [24]. First, we embed each student’s courses through one-hot encoding, replacing the 1 with the corresponding exam grade. In this way, we create the matrix \begin{align*} \left \{{ \begin{matrix} c_{11} &\quad c_{12} &\quad \cdots &\quad c_{1n}\\ c_{21} &\quad c_{22} &\quad \cdots &\quad c_{2n}\\ \vdots &\quad \vdots &\quad \ddots &\quad \vdots \\ c_{m1} &\quad c_{m2} &\quad \cdots &\quad c_{mn}\\ \end{matrix} }\right \}\end{align*}
However, the number of courses taken by each student is much less than the total number of courses offered by the university. This issue leads to the severe sparsity of
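The grade-valued one-hot encoding described in this subsection can be sketched as below, reusing the example of one student taking courses A, B, and C and another taking C and D. The grades, the function name `grade_matrix`, and the choice of 0 for courses not taken are illustrative assumptions; the autoencoder that subsequently compresses each row is not shown.

```python
def grade_matrix(records, courses):
    """Build one row per student: the exam grade where the course was taken,
    0.0 elsewhere (a one-hot layout with the 1 replaced by the grade)."""
    return {student: [float(grades.get(c, 0.0)) for c in courses]
            for student, grades in records.items()}

# Made-up grades for the two students in the example (assumption).
records = {
    "s1": {"A": 85, "B": 90, "C": 78},
    "s2": {"C": 66, "D": 92},
}
courses = sorted({c for grades in records.values() for c in grades})
matrix = grade_matrix(records, courses)

# Fraction of empty cells, which grows as the course catalogue gets larger.
cells = [v for row in matrix.values() for v in row]
sparsity = sum(v == 0.0 for v in cells) / len(cells)
```

Every student now has a vector of the same length regardless of which courses were taken, at the cost of many empty entries when the catalogue is large.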
B. Data Augmentation for Label Imbalance
The number of students with mental health problems is generally much smaller than the number of healthy students (shown in Section V-B), so the label imbalance problem exists in our experiments. Thus, we utilize the SMOTE algorithm [45] to augment the data in order to improve generalization performance. Naive oversampling simply duplicates samples of the minority class, so the patterns a model learns from the modified data tend to be too specific (i.e., the model overfits). By contrast, SMOTE analyzes the minority class and generates new samples by the following equation:\begin{equation*} x_{new} = x + rand (0,1) \times (\tilde {x} - x)\tag{9}\end{equation*}
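Eq. (9) can be illustrated with a short sketch. It assumes, as in the standard SMOTE formulation, that \tilde{x} is one of the k nearest minority-class neighbours of x; the function names, the value of k, and the toy points are hypothetical and do not reflect the settings used in the experiments.

```python
import random

def nearest_neighbors(x, minority, k):
    """Return the k minority samples closest to x (squared Euclidean distance)."""
    others = [m for m in minority if m is not x]
    others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
    return others[:k]

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolation (Eq. (9))."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        x_tilde = rng.choice(nearest_neighbors(x, minority, k))
        gap = rng.random()  # rand(0, 1) in Eq. (9)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, x_tilde)])
    return synthetic

# Made-up two-dimensional minority samples (assumption).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5)
```

Each synthetic point lies on the line segment between a minority sample and one of its neighbours, which is why the generated data are not exact copies of existing samples.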
C. Detection Model
In this study, the students’ mental health detection task is treated as a binary classification problem. A three-layer DNN model, consisting of input, hidden, and output layers, is used for the final binary detection (shown in Figure 4). The input and output layers buffer the input and output, respectively, and the hidden layer fits the relationship between them. Before any data has been run through the network, the weights of the three-layer model are random. Through the back-propagation algorithm, all weights are updated according to the patterns hidden in the data. A dropout mechanism is applied to mitigate the overfitting caused by a relatively small dataset: units (along with their connections) are randomly dropped from the network during training. We also introduce batch normalization to optimize the training process. (Note that our experiments are carried out in the second semester of the students’ university life, so only first-semester grades are used as a feature. A model for time-series data, such as a temporal convolutional network (TCN) [46] or a long short-term memory network (LSTM) [47], would be an alternative if grades from more semesters were involved.)
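The dropout mechanism mentioned above can be sketched in a few lines. This illustration uses the common "inverted dropout" variant, in which kept activations are rescaled during training so that nothing needs to change at inference time; that choice, the drop rate, and the activation values are assumptions rather than details reported in the paper.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Randomly zero each unit with probability `rate` during training and
    rescale the kept units by 1 / (1 - rate); do nothing at inference."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [0.5, 1.0, 1.5, 2.0]  # made-up hidden-layer activations
train_out = dropout(hidden, rate=0.5, rng=rng)
eval_out = dropout(hidden, rate=0.5, rng=rng, training=False)
```

Because a different random subset of units is dropped at each training step, the network cannot rely on any single unit, which is what limits overfitting on a small dataset.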
Dataset
In this section, we detail the data and its collection process. The dataset used in this research covers 509 university students from the same school of a Chinese university, all freshmen who have just finished their first-semester exams. Participants are required to be at least 18 years old (aged 18-20, mean = 19.03, SD = 0.21) and live in several specific adjacent residential buildings in the same area. After removing erroneous records, 485 students are involved in this experiment. We first introduce the ethical considerations. We then present the data collected through the LMS (academic performance and demographic features) and the questionnaire (physical appearance and social networks).
A. Ethical Considerations
This research has been given ethical approval through the university’s ethical approval process. Participants consent to release all related data for the study. Participants are given the option to freely withdraw at any time during the study or omit any particular answers without providing a reason. Participants’ pictures and information are kept coded and confidential, and the questionnaire data is kept with separate IDs (e.g., letter code for faces, number code for other data). The key relating codes are kept separately in a password-protected file. The study is not anticipated to cause any distress, but if, for any reason, participants are distressed, they are encouraged to contact the student support service.
B. Mental Test
In this research, we use the Symptom Checklist 90 (SCL-90), a widely used self-report psychometric instrument, to assess mental distress and symptoms of psychopathology [48]. The primary symptom dimensions of SCL-90 consist of total scores of psychological health, somatization, obsessive-compulsive, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, psychoticism, and a category of “additional items” which help clinicians assess other aspects of the clients’ symptoms. According to the university student norm, all students are divided into two categories: students with mental health problems and students with healthy mental status. The ratio of students with healthy mental status to students with mental health problems is 7 to 1.
C. Academic Performance and Demographic Feature
The LMS is the infrastructure of the university, which records information about students’ learning and daily life. Students’ academic performance can generally be recorded as the exam grade of each course stored in the LMS. The academic performance data used in this research includes 13234 records, and 1455 records of the demographic data are involved in our experiment, including gender, age, and nationality.
D. Physical Appearance and Social Networks
To collect physical appearance accurately, all participants are assigned to a lab and instructed to sit down and look at the center of the camera lens with neutral expressions, hair pulled back, and no adornments. Students are photographed under consistent lighting conditions with a fixed camera distance. We used a Fujifilm FinePix S5 Pro digital SLR camera (60 mm fixed length lens) and a photo booth painted white with calibrated D65 white lighting. These facial photographs are aligned according to interpupillary distance. We resize and crop the photographs to ensure the display of equal proportions of neck and hair. This process results in a set of 485 images of 485 identities.
After the photos are taken, participants are asked to nominate members of their social networks along the dimensions of friendship, life advice, academic advice, support, cooperation, intelligence, and good/bad news sharing. They are instructed to write down 5–8 names of freshmen living in their dorm area (the names of the 485 students are provided as a reference list) in response to questions such as: please choose 5–8 freshman names from the list who are your friends; who are intelligent; whom you would go to for academic advice/life advice, sharing good/bad news, support, or cooperation. For readability, we select six of these networks for the visualization shown in Figure 5.
Experiments and Results
In this section, we present the experimental results in detail to not only demonstrate the performance of the proposed approaches, including the CASTLE framework and MOON algorithm, but also to explore the detectability of students with mental health problems. All experiments are implemented in Python 3.6. Packages including Pandas and Scikit-learn are utilized for data analysis and detection. Origin 2018 and Graph are utilized for the visualization of data and experimental results. We first introduce the representation results of academic performance and physical appearance. Then, we introduce the experimental settings of mental status detection and its results.
A. Representation of Academic Performance and Physical Appearance
To deal with the heterogeneity of academic performance data, we apply the representation approach that combines a variant of one-hot encoding with an autoencoder, as mentioned before. Since this experiment was conducted in the second semester, only grades from the first semester are used as a feature for detection. We test different representation dimensions, and the performance is shown in Figure 6. The value of the loss function fluctuates only slightly across dimensions, indicating that even low-dimensional vectors can accurately represent the academic performance of each student. Thus, we choose 6 as the dimension of representation for computational efficiency. Moreover, we use a CNN-based auto-encoder to process students’ photos, and the representations are shown in Figure 6. We choose 6 as the dimension of representation as well.
Figure 6. The results of feature representation for academic performance and physical appearance.
B. Detection Results
1) Results Analysis
We design a series of experiments to explore the detectability of students with mental health problems and to validate the methods proposed in this paper. There are a total of 485 samples in our dataset, and as we mentioned before, the ratio of students with healthy mental status to students with mental health problems is 7 to 1. Four features are used for detection, including social life, appearance, academic performance, and demographic information. The embedding dimension of social life is 8 (details are shown in Section VI-B2). The embedding dimension of both appearance and academic performance is 6. Moreover, three pieces of demographic information were included in this experiment, including gender, age, and nationality. In this case, the final feature used for prediction is a 23-dimensional vector.
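The dimensionality stated above (8 + 6 + 6 + 3 = 23) can be checked with a small sketch. It assumes that the four feature groups are fused by simple concatenation and that each demographic attribute is encoded as a single number; both are assumptions made for illustration, and the zero vectors are placeholders.

```python
def fuse_features(social, appearance, academic, demographic):
    """Concatenate the four feature groups into one input vector."""
    return list(social) + list(appearance) + list(academic) + list(demographic)

social = [0.0] * 8        # MOON embedding of social life
appearance = [0.0] * 6    # CNN-based auto-encoder representation
academic = [0.0] * 6      # autoencoder representation of grades
demographic = [0.0] * 3   # gender, age, nationality
fused = fuse_features(social, appearance, academic, demographic)
```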
Due to privacy concerns, there is a lack of publicly available datasets in the field of student mental health. It is difficult for all scholars in the field to test the performance of the algorithm in the same data environment. In this case, we design the following comparative experiments based on algorithms commonly used in related fields. Firstly, we replace the specified parts of the proposed framework with some popular algorithms, which include two parts. In the first part, we replace our network embedding algorithm with the following algorithms:
Deepwalk [43]: Deepwalk is a classic embedding algorithm that is widely used in the field of network representation.
MANE [42]: MANE is a multi-view network algorithm, which inspired us to propose the MOON algorithm.
In the second part, we replace our final detection model with currently popular algorithms, as follows:
Support Vector Machine (SVM) [49]: SVM is a classic algorithm and is widely used in the field of data mining.
Random Forest (RF) [50]: RF is a classic ensemble algorithm that achieves good performance in various applications.
XGBoost [51]: XGBoost is a boosting-tree-based method and is widely used in various data mining scenarios with good performance.
We divide the training and test sets into ratios of 9:1, 8:2, 7:3, 6:4, and 5:5, respectively. For each train and test repartition, we use SMOTE to alleviate the class imbalance problem, and the data generation process is shown as follows:
First, raw data is divided into two categories: the training set and the testing set by stratified sampling. Second, we use SMOTE on the training set to generate samples of the minority class. Then in the new training set, the number of students in the two classes is equal.
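The data generation process above can be sketched as follows. This is a simplified, standard-library-only illustration of SMOTE-style interpolation and not the implementation used in the experiments. The helper names (`stratified_split`, `smote_like`), the toy two-dimensional data, the 8:2 split, and the choice of k = 3 neighbours are all assumptions made for the example.

```python
import random


def stratified_split(labels, train_ratio, rng):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train_idx += indices[:cut]
        test_idx += indices[cut:]
    return train_idx, test_idx


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


rng = random.Random(0)
# Toy data with a 7:1 class ratio: 70 majority (label 0), 10 minority (label 1).
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(70)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 70 + [1] * 10

# Step 1: stratified split (8:2 here); the test set is never oversampled.
train_idx, test_idx = stratified_split(y, 0.8, rng)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Step 2: oversample the minority class of the training set only.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_majority = y_train.count(0)
new_points = smote_like(minority, n_majority - len(minority), k=3, rng=rng)

# Step 3: the new training set has equal class counts.
X_balanced = X_train + new_points
y_balanced = y_train + [1] * len(new_points)
print(y_balanced.count(0), y_balanced.count(1))
```

Oversampling only after the split keeps synthetic points out of the testing set, so the test data still reflects the original class ratio.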
We test the performance of these algorithms from two aspects for a comprehensive evaluation. On one hand, we fit algorithms based on the raw training set
Note that, judging from the performance of MOON+DNN and MANE+DNN in Figure 8, the proposed MOON algorithm is only slightly better than the MANE algorithm, but it has a clear advantage in computational overhead. In the original MANE model, the complexities for cross-view-intra-node and cross-view-cross-node consistencies are both
Finally, to better understand the performance of our framework, we compare the performance of current popular algorithms trained on the raw training set
2) Input Analysis
First, the detection performance with different embedding dimensions of the MOON algorithm is analyzed based on the CASTLE framework, and the results are shown in Table 4. For computational efficiency, the embedding dimension of the MOON algorithm is set to 8. Second, the contribution of each type of feature is analyzed, and the results are shown in Table 5. All features contribute to the detection, which is consistent with our assumption. Note that the contribution of physical appearance is relatively small, possibly because social life and physical appearance contain redundant information [11].
Moreover, the multi-view social network of students is collected through a questionnaire, which is time-consuming, costly, and hard to apply at large scale. We therefore explore the performance achievable without manual data collection methods such as questionnaires. We replace the multi-view network collected through questionnaires with a friendship network generated from students’ canteen co-occurrence frequency [52]. The structure of the friendship network is embedded through Deepwalk and the performance is shown in Table 5 as
In addition, we design experiments to analyze the parameters
Finally, as mentioned above, a dropout mechanism is used to improve generalization, and we test the sensitivity of the proposed framework to the dropout rate (Figure 10). Changing the dropout rate affects the performance only slightly, and a rate of 0.3 performs best.
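For reference, the dropout mechanism can be sketched generically as below. This is the common inverted-dropout formulation, assumed here for illustration only. The function `dropout` and the toy activations are hypothetical, and the sketch does not describe how the DNN in CASTLE is actually implemented.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    the surviving units so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10000  # toy hidden-layer activations
dropped = dropout(hidden, rate=0.3, rng=rng)

# Roughly 70% of units survive, and the mean activation stays close to 1.
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
```

A sensitivity analysis such as the one in Figure 10 would repeat training with several values of `rate` and compare the resulting detection performance.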
Conclusion and Discussion
In this paper, we investigate the important problem of detecting students with mental health problems. We propose CASTLE, an educational data fusion framework that achieves effective and accurate detection by fusing multi-modal data generated from campus life. We tackle the challenges posed by multi-modal data by using representation learning theories. Specifically, for the representation of students’ social life, we divide social networks into the source view and the target view according to the inherent nature of social behaviors and propose the MOON algorithm for multi-view social network embedding. In essence, the idea is a new embedding strategy for multi-view networks: different networks should be treated differently depending on the characteristics of the specific application scenario. This strategy could be extended to other fields. Moreover, we use SMOTE to overcome the label imbalance problem. We conduct comprehensive experiments on a real-world educational dataset, and the results demonstrate the effectiveness of the proposed methods.
Although we demonstrate the performance of the proposed algorithm through extensive experiments, there is still room for improvement in this study. First, we treat campus social networks as static, unweighted multi-view networks. However, real social networks are dynamic and each link carries a different weight. How to accurately characterize the social network of students is still an open question. In addition to campus social relationships, other social networks also have a significant impact on mental health, such as intimate relationships, family relationships, and teacher-student relationships. Subsequent research needs to consider social networks more comprehensively. Moreover, as mentioned above, the multi-view social network of students is collected through a questionnaire, which is time-consuming, costly, and hard to apply at large scale. Although we try to automatically capture friendship relationships among students based on cafeteria co-occurrence, this method is crude and its accuracy cannot be guaranteed. How to capture students’ social networks in diverse scenarios with a data-driven approach is still an open question. Finally, a deep learning model is adopted in this paper, which trades the interpretability of the results for better prediction performance. This can easily cause educators to mistrust the predictions.
Given the limitations mentioned above, there are several directions for future work:
First, we will develop subsequent versions of the CASTLE framework to fuse more features, like students’ Internet access patterns and life orderliness, to achieve better detection performance.
Second, we will attempt to develop data-driven methods for capturing students’ social patterns, based on group work records or discussion records on the LMS, to replace the questionnaire-based data collection.
Third, we will try to introduce causal learning techniques to analyze the experimental results.
Last but not least, we also intend to integrate the CASTLE framework into the modern educational management system to assist with educational decision making.
ACKNOWLEDGMENT
The authors thank Dongyu Zhang, Chuanhui Yuan, and Qing Qing for their help with the experiments as well as all students who participated in the experiments.