Journals & Magazines >IEEE Open Journal of Control ... >Volume: 3

Classification of Human Learning Stages via Kernel Distribution Embeddings

Abstract:

Adaptive automation, automation which is responsive to the human's performance via the alteration of control laws or level of assistance, is an important tool for trainin...Show More

Topic: Modeling, Control, and Learning Approaches for Human-Robot Interaction Systems

Metadata

Abstract:

Adaptive automation, automation which is responsive to the human's performance via the alteration of control laws or level of assistance, is an important tool for training humans to attain new skills when operating dynamical systems. When coupled with cognitive feedback, adaptive automation has the potential to further facilitate human training, but requires precise assessments of human progression through various learning stages. This is challenging because of the underlying dynamics, as well as the stochasticity inherent to human action. We propose a data-driven approach to assess learning stages in a complex quadrotor landing task that is responsive to stochastic, human-in-the-loop quadrotor dynamics. We represent each learning stage as a distribution of canonical trajectories for that learning stage, then employ kernel distribution embeddings in combination with a rule-based heuristic, to determine which canonical distribution a sample landing trajectory is closest to. We demonstrate our approach on experimental human subject data, and use our approach to evaluate the efficacy of cognitively-based adaptive automation designed to calibrate self-confidence. Our approach is more accurate than standard classification methods, such as nearest centroid assignment, which rely on metrics that are not inherently suited to analysis of trajectories of stochastic dynamical systems.

Topic: Modeling, Control, and Learning Approaches for Human-Robot Interaction Systems

Published in: IEEE Open Journal of Control Systems ( Volume: 3)

Page(s): 102 - 117

Date of Publication: 01 January 2024

Electronic ISSN: 2694-085X

DOI: 10.1109/OJCSYS.2023.3348704

Funding Agency:

Contents

SECTION I.

Introduction

Adaptive automation has widely been used to facilitate human task performance and efficiency [1], [2], [3], by using human performance feedback to predict decision-making behavior and adapt the automation to the human [4], [5]. Cognitive factors, including self-confidence, play an important role in the design of effective human-automation interaction [6], [7], [8], [9] and how humans learn [10], [11], [12], [13]. Overconfidence and underconfidence both undermine learning, thereby motivating the need for calibration of users' self-confidence to their skill level [9], [14], [15]. Indeed, in intelligent tutoring systems [4], [10], such a feature has already been incorporated, so that teaching strategies are chosen based not only on performance metrics, but also the student's self-confidence [11], [12].

While learning theory for psychomotor tasks is relatively well-established [16], [17], [18], [19], [20], [21], developing quantitative methods to detect a user's transitions from one learning stage to another from their dynamic behavioral data is not trivial. In recent work [22], we showed that automation designed to calibrate participants' self-confidence by selecting when to provide automated assistance in a psychomotor task led to statistically significant improvements in the participant's ability to manually execute a task after a fixed number of trials. However, progress towards making this approach run-time capable requires the development of methods that can assess learning stages directly, without human intervention. Assessing learning stages in dynamics-driven tasks requires a high fidelity assessment of trajectories that can distinguish between meaningful (but potentially small magnitude) signals that differentiate between learning stages, while not being overwhelmed by noise inherent to human action. That is, the assessment needs to be responsive to the dynamic and stochastic nature of the human-in-the-loop data. However, a dearth of tools exists to do this high fidelity assessment on a low volume of trajectories of stochastic, dynamical systems. Standard statistical tools (such as nearest neighbor classification [23]) typically struggle with the scale of vector-valued data that must be analyzed, resulting in a lack of responsivity to the dynamic nature of the task, and further, are based in a Euclidean distance metric that may not accurately capture meaningful distinctions. The lack of extensive data is difficult for approaches based in system identification [24], resulting in important signals lumped together with noise processes that capture model inaccuracy.

We focus here on the problem of characterizing learning dynamics in a quadrotor landing task, motivated by the problem of designing adaptive automation with cognitive feedback to facilitate human learning of the landing task. Our methods are based in kernel distribution embeddings [25], in which data is projected into a reproducing kernel Hilbert space that captures the distribution underlying a finite set of data. We have previously used kernel embeddings to analyze human-in-the-loop data [26] from braking trajectories in a semi-autonomous vehicle and found the approach to be responsive to the subtleties of human driving input.

The main contribution of this paper is the creation of a classifier that uses a rule-based heuristic in combination with the maximum mean discrepancy (MMD) [27], a distance metric computed within the reproducing kernel Hilbert space, to assess learning stages. We designed and implemented a human subject experiment with 23 participants, in which participants must learn, through repeated trials, to successfully land a quadrotor in a simulated environment. From this data, we first identify representative distributions of trajectories for each learning stage, then evaluate which learning stage, amongst a set of feasible learning stages, is closest to the observed trajectory. We validate our approach, and compare it with the nearest centroid algorithm [23]. Lastly, we use our proposed classifier to evaluate a heuristic control policy for calibration of self-confidence, by characterizing progression through learning stages over repeated trials.

The paper is organized as follows. In Section II, we describe the human subjects experiment that is the foundation of this work. Section III describes the main results from learning theory and the characteristics of learning stages in the quadrotor landing task. In Section IV, we propose a rule-based classifier, based in reproducing kernel Hilbert spaces, to characterize learning stages during quadrotor landing. Section V presents our results and a comparison with a standard statistical technique. In Section VI, we analyze how both end of learning outcomes and the learning process differ in participants who receive policy aimed toward self-confidence calibration as compared to those who do not. We conclude with a discussion of the implications of these results for adaptive automation aimed at teaching or training humans.

SECTION II.

Human Subjects Experiment

A. Overview

The objective of the experiment is to compare the effects of two different mode selection policies that determine when automated interventions are provided to the learner on both learning outcomes and the learning process itself. More specifically, we consider whether mode selection should be driven by a participant's performance only or instead be based on a participant's performance and their self-reported self-confidence, a human cognitive factor that affects learning. We designed and conducted a between-subjects human study that involved participants practicing landing a quadrotor in a computer-based training simulator. In the study, each participant's goal is to learn how to successfully land a quadrotor manually using a throttle and joystick controller.

Our choice of the quadrotor landing task as an experimental platform was based on multiple factors and constraints. First, we sought an experimental platform based on a psychomotor task, because these types of tasks require planning of movement and have been well studied in human learning literature. Additionally, we sought a platform that not all subjects would have a high familiarity with (unlike a driving scenario, for example), but which exhibited real complexity in the dynamics (unlike a toy-like video game) so that we could readily observe participant progress in learning. It was also important that our experimental platform be amenable to virtual participation as well as in-person participation so that we could use the virtual participation data to hone our in-person experiment and make the most of in-person experimentation. We chose to adapt the platform presented in [22] because, in addition to meeting the criteria outlined above, it had also been previously deployed in an experiment designed to evaluate different levels of assistance [28].

B. Experiment and Controller Design

The experiment consists of 20 trials, in which the participant must navigate the quadrotor from an initial point and land the quadrotor on a landing pad with appropriate bounds on the quadrotor speed and roll angle to ensure a safe landing. The landings are interspersed with prompts to provide information to the participant about their landing performance, in the form of a performance score, as well as to solicit a rating of their self-confidence. The quadrotor starts from the same initial position in every trial.

After each trial, a mode selection policy determines whether subsequent trials should be conducted in manual control mode ( $M_{1}$ ) or the shared control mode ( $M_{2}$ ). In manual control mode, the participant has complete control of the quadrotor. In shared control mode, the participant is assisted by a linear quadratic regulator (LQR)-based control law that has been designed to track a desired landing trajectory that is based on expert demonstrations [28]. The LQR-based control is blended with the participant's input, via a convex combination, to obtain the quadrotor input. Additional details are provided in Appendix A.

1) Self-Confidence Based Mode Selection Policy

The self-confidence-based mode selection policy (Fig. 1, top) is designed to calibrate participants' self–confidence to their performance [8], [15], [22] or in other words, to mitigate their over– and under–confidence. A participant's self–confidence is considered to be calibrated if their self-confidence is low when their performance decreases or is consistently low, and if their self–confidence is high when their performance improves or is consistently high. Aiming to maximize self–confidence may not always lead to better task performance [14], and may instead lead to mis–calibration of self–confidence and consequently the misuse of the automation [8]. The performance-based mode selection policy (Fig. 1, bottom) considers only the participant's performance. The policies can assign mode sequences ranging in length from one to three trials; during this time, new mode selection after each trajectory will be halted.

Figure 1.

Mode selection policies. Top: Self-confidence-based policy that is dependent on participants' self-confidence and performance. Bottom: Performance-based policy that is only dependent on performance.

Show All

Self-confidence-based mode selection is a function of the performance score, denoted $S_{k}$ at trial $k$ , the difference in performance score, $\Delta S_{k} = S_{k} - S_{k-1}$ , and the self-confidence level $SC_{k}$ . We abstract each of these three elements $(\Delta S, S, SC)$ into two discrete values: low and high, based on thresholds $\gamma =186, \phi = 810, \sigma = 56,$ respectively. The thresholds are based on the empirically observed 50% quantile value from pilot experiments. We then create a heuristic based policy for each cell, depicted in Fig. 1.

1) and 2) $(|\Delta S| \geq \gamma, S, SC)$ : Sufficiently large changes in performance (either positive or negative), irrespective of the specific values of $S$ and $SC$ , is an indication that the participant may be making substantial changes to their strategy for landing the quadrotor. We therefore choose a mode selection policy that is aimed at helping the participant achieve consistency in their performance, a necessary step prior to assessing calibration of self-confidence. When $\Delta S< -\gamma$ , i.e., performance is rapidly declining, we choose $M_{2}, M_{2}$ , to improve the participant's skills. When $\Delta S > \gamma$ , i.e., performance is rapidly improving, we choose $M_{1}, M_{1}$ because we want to give the participant a chance to apply the strategy that led to improvement to a trial without assistance.

When performance is not changing rapidly, we employ strategies based on each of the four discretized values of performance $S$ and self-confidence $SC$ .

3) $(|\Delta _{S}| \leq \gamma, S\leq \phi, SC \leq \sigma)$ : For the case in which self-confidence is low and performance is low, we choose a policy of two shared control modes, $M_{2}, M_{2}$ , because the participant's consistently low performance indicates a need for assistance, and that the participant is unsure of their performance. We consider this to be calibrated low self-confidence, because of the consistency between low values of self-confidence and low values of performance.

4) $(|\Delta _{S}| \leq \gamma, S > \phi, SC > \sigma)$ : For the cases in which self-confidence is high and performance is high, we choose a policy of a single manual control mode, $M_{1}$ , because the participant's high performance indicates they are already performing well without assistance. This case is calibrated high self-confidence.

5) $(|\Delta _{S}| \geq \gamma, S \leq \phi, SC > \sigma)$ : In the case of mis-calibration when the participant is over-confident, ( $S\leq \phi$ , $SC>\sigma$ ), one trial of manual mode is assigned, followed by two trials of shared control mode. This sequence is designed to first reinforce to the participant that they are performing poorly in manual mode (with the goal of decreasing the participant's self-confidence) and then help them to improve with the aid of the automation assistance.

6) $(|\Delta _{S}| \leq \gamma, S> \phi, SC \leq \sigma)$ : In the case of self-confidence miscalibration when the participant is em under-confident ( $S>\phi$ , $SC\leq \sigma$ ), the self-confidence-based policy assigns two trials in manual mode ( $M_{1}, M_{1}$ ), to reinforce to the participant that they perform well without assistance, with the goal of increasing the participant's self-confidence.

2) Performance Score

We choose a performance metric specific to the context of the quadrotor landing, consistent with known gamification techniques [29]. We choose to prioritize 1) landing the quadrotor on the landing pad, while also 2) satisfying constraints on speed ( $\sqrt{\dot{x}^{2} + \dot{y}^{2}} \leq 5$ m/s) and roll angle ( $|\theta | \leq 10^\circ$ ). We categorize landing types as either unsuccessful, if they do not meet any of these criteria, safe, if they meet just the landing pad criteria, and successful, the landing constraints are also met (Table 1).

TABLE 1 Quadrotor landing types.

The functions we use to emphasize these criteria are shown in Fig. 2, and are functions that weight final position, velocity, and roll angle, via normalized linear or sigmoidal functions. Specifically, the position scoring function prioritizes the distance at landing from the center of the landing pad. The maneuver time scoring function penalizes excessively long times to land, when the landing is successful. For unsuccessful landings, the time scoring function penalizes participants for crashing, but rewards them for navigation that minimizes the root mean square error from the reference trajectory. The velocity and roll angle scoring functions penalize high speeds and roll angles in a nonlinear fashion, and specifically account for the possibility that the quadrotor may land on its side or upside down.

Figure 2.

The scoring function $S$ is the scaled sum of four scoring functions, each of which is associated with a desirable landing characteristic.

Show All

C. Participants

A total of 27 participants completed the human subject experiment. Of these, 4 participants' data were removed because they never experienced any trials with shared control mode. Therefore, our dataset includes 23 participants (9 male and 14 female). Participants were randomly assigned one of two treatments: The Self-confidence group experienced the self-confidence-based mode selection policy, and the Performance group experienced the performance-based mode selection policy. Of the 23 participants, 11 participants were assigned to the Self-confidence group and 12 participants were assigned to the Performance group. Participant ages range from 18–37 years (mean = 25.09 years). The Institutional Review Board at Purdue University approved the study.

D. Apparatus

The quadrotor landing training simulator was developed on Python 3.6.8 using Pygame 2.0.1. The simulator, originally developed in [28], was adapted for this study. A Thrustmaster T. Hotas 4 joystick and throttle was used to control the quadrotor. The Tobii X-60 eye tracker was used to collect eye gaze data.

E. Experimental Procedure and Data Collection

1) Procedure

Before completing the 20 trials, participants were provided with instructions and a description of the experimental setup. Participants then completed two 60-second tutorials to familiarize themselves with the simulator environment. In the first tutorial, participants practiced using the throttle by moving the quadrotor up and down. In the second tutorial, participants practiced flying the quadrotor using both the throttle and joystick controls. The participants then complete 20 trials of the quadrotor game. After every trial, participants are provided with their numerical score, the amount of time they expended in landing the quadrotor, and whether they unsuccessfully, unsafely, or safely landed the quadrotor for all previous trials. Additionally, each participant is asked to rate their self–confidence, defined for the participant as “The confidence in oneself and one's powers and abilities,” in their ability to land the quadrotor on a numerical scale of 0-100.

The first two and last five trials are completed in manual mode so that each participant's change in manual performance can be quantified. In trials 3–15, the participant's control mode is determined by the mode selection policy. Prior to starting each trial, the participant is able to see whether the automation assistance is on or off.

2) Data Collection

Quadrotor states and inputs are sampled at 30 Hz. The eye gaze data is analyzed using iMotions eye tracking software [30]. The location of participants' eye gaze (in pixels) is sampled at 60 Hz. Areas of interest (AOI) are drawn around stimuli that participants are expected to look at throughout the trials, and include the landing pad, the timer, the live speed and attitude indicator, and the automation status indicator (Fig. 3). Identifying areas of interest enables our ability to track changes in visual focus, which is important for characterizing learning stages.

Figure 3.

Quadrotor simulator module areas of interest (AOI): 1) Timer, 2) automation indicator, 3) live speed and attitude indicator, 4) landing pad, and 5) quadrotor.

Show All

Finally, after every trial, the performance score and landing type is assessed, and the participant's self-assessed self-confidence is recorded.

SECTION III.

Characterizing Learning Stages for the Quadrotor Landing Task

Motor skill learning describes learning paradigms that involve movement, such as learning to control reflex, improving reaction time, or learning a sequence of movements [31]. Fitts' seminal models for psychomotor learning [16], [32] are the basis for the canonical ACT-R model [33], [34], widely used in human factors and psychology for simulating and understanding human cognition.

More recently, Dreyfus proposed a model that characterizes acquisition of expertise in five stages: novice, advanced beginner, competent, proficient, and expert [35]. The novice stage is when the learner recognizes the objective facts and features relevant to the skill and acquires rules for determining actions based on those. For example, a student vehicle driver learns to recognize features such as inferring speed from the speedometer and rules for shifting gears. In the advanced beginner stage, the learner gains experience completing the task, taking note of meaningful cues and situational features. Continuing the same example, the advanced beginner driver uses situational features such as engine sounds and speed cues to decide when to shift gears. In the competent stage, the learner will choose a plan or goal and divide the task into smaller sets of features relevant to the goal. For example, when exiting a freeway, a competent driver will take into account speed, road surface condition, and other factors to decide how to accelerate or brake. In the proficient stage, the learner no longer consciously plans how to overcome challenging situations and instead acts more intuitively. Finally, in the expert stage, the learner can now perform without planning or deciding between alternative plans. This model has been applied to the skill acquisition process of airplane pilots, chess players, automobile drivers, and adults learning a second language [17], [35].

We apply the first four stages of the five-stage learning theory model proposed by Dreyfus [17] to the quadrotor landing task. Based on Dreyfus' definition of the expert (fifth) learning stage, it is likely that a participant will progress through the proficient (fourth) learning stage before advancing to expert. Therefore, we assume that the fourth learning stage captures both proficient and expert learners. We define features of the quadrotor dynamics, participants' eye gaze fixations and trajectory, and the landing type achieved that characterize each learning stage, as summarized in Table 2.

TABLE 2 Descriptions of each learning stage in terms of quadrotor dynamics, gaze data, and feasible landing types.

In Learning Stage 1, the quadrotor dynamics are erratic and result in unsuccessful or unsafe landings. The learner's gaze travels across the computer screen as they familiarize themselves with the task and then focuses on the quadrotor, as shown in Fig. 4(a). In Learning Stage 2, the learner shows more control while navigating the quadrotor. In turn, the learner begins to focus on the live speed and attitude, and to attempt landings that are safe. This is demonstrated in Fig. 4(b), as the participant's gaze follows the quadrotor trajectory until the quadrotor is close to the landing pad, at which point the participants focuses on the live speed and attitude AOI to achieve a safe landing. The quadrotor trajectory shows that while the path may not be direct, the learner is successful in maintaining control of the quadrotor.

Figure 4.

Representative quadrotor trajectories (left) and gaze trajectories with respect to areas of interest (right) for each of the four learning stages. Color gradient (blue to yellow) indicates progression of time within a single trial.

Show All

In Learning Stage 3, demonstrated in Fig. 4(c), the learner maintains control over the quadrotor and its trajectories become more similar to an expert trajectory. The learner may still reference live speed and attitude, but not as often. Finally, in Learning Stage 4, the learner lands the quadrotor efficiently and consistently. The learner no longer needs to look at the live speed and attitude, but rather can infer it, and instead focuses on the quadrotor, as shown in Fig. 4(d).

SECTION IV.

Learning Stage Classification via Kernel Embeddings

A. Justification for the Proposed Approach

Our approach is guided by the need to methodologically address several difficult challenges. 1) Multi-modal data: Learning stages in the quadrotor task are based in knowledge of both quadrotor trajectories and gaze trajectories. Further, causal patterns in these trajectories are also important. 2) Heterogeneity and variability: Unlike scenarios in which participants are trained for a task, and all learning has been completed, we focus specifically on the process of learning. This means that the heterogeneity across participants, as well as the variability within a given participant's trajectories, are both of interest. 3) Nonlinear processes with non-Gaussian uncertainty: The non-uniformity and nonlinearity inherent to human data is important to capture in its raw state. That is, we seek to preserve the uncertainty inherent to participant trajectories, as it is a key element of their response. Further, we wish to avoid, as much as possible, the subjective judgment calls inherent to many data pre-processing techniques, which can essentially remove higher moments of the data. 4) Lack of highly accurate, predictive models: Models of human motor control in reference tracking tasks have been long established [36], [37], [38]. However, these intentionally seek to describe the mean behavior. In contrast, we are interested in the entire distribution, and not just the first moment. Imposing artificial structure on the data could obfuscate stochastic and nonlinear phenomena of interest.

A variety of approaches have been employed for data-driven characterization of human behavior in dynamical systems. Many of these approaches are based in Gaussian mixture models [39] and mixture models more generally [40], with application to car following [41], [42], [43], [44], driver influence via economic models [45], [46], and robotic manipulation [47], [48]. We choose reproducing kernel Hilbert spaces (RKHS) based tools because a) they are amenable to non-parametric modeling, meaning that unlike many machine learning approaches such as mixture models, there are very few parameters that are required, and hence less vulnerability to the need for excessive tuning; b) they established tools for statistical inference and estimation [27], [49], [50], and are gaining traction as tools for data-driven verification [51], [52] and control [53], [54] of dynamical systems; and c) numerical methods scale with the number of samples of observed data, as opposed to dimension of the state, which makes them a promising approach for extension to run-time adaptive automation.

B. Distribution Embeddings

Consider a trajectory that is described as a stochastic process [55], $X = [ {\mathbf {x}}_{1}^{T}, {\mathbf {x}}_{2}^{T},\ldots, {\mathbf {x}}_{N}^{T} ]^{T} \in {\mathbb{R}}^{mN}$ , that results from evolution of a discrete-time, stochastic dynamical system over a horizon of length $N$ time steps from a known initial condition $x_{0}$ . For the combined quadrotor and eye gaze trajectories, we presume that the state of the system is ${\mathbf {x}}_{t} = [ x_{t}, y_{t}, \theta _{t}, \dot{x}_{t}, \dot{y}_{t}, \dot{\theta }_{t}, \tilde{x}_{t}, \tilde{y}_{t} ] \in {\mathbb{R}}^{m}$ at time step $t$ for $m=8$ , where the last two states correspond to the position of the centroid of eye gaze at time $t$ . The vector $X \in {\mathbb{R}}^{mN}$ captures the entire trajectory, and is a random variable defined on the measurable space $({\mathcal {X}}, \mathscr {B}({\mathcal {X}}))$ , where the Borel $\sigma$ -algebra ${\mathcal {B}}({\mathcal {X}})$ is generated by the set of all open subsets of $\mathcal {X}$ .

We presume that the distribution $\mathbb{P}$ of $X$ is unknown, but that we have access to samples ${\mathcal {S}} = \lbrace \xi ^{i}\rbrace _{i= 1}^{M}$ , $\xi ^{i} \in \Xi \subseteq {\mathbb{R}}^{mN}$ that are drawn independently and identically from $\mathbb{P}$ . We seek to first empirically represent a distribution $\mathbb{P}$ via its samples ${\mathcal {S}}$ , and then to compute the distance between a given sample and the distribution $\mathbb{P}$ . Computation of both of these operations within a reproducing kernel Hilbert space enables implicit modeling of complex distributions and stochastic dynamics.

Definition 1 (RKHS, [56]):

A Hilbert space $\mathscr {H}$ of functions from $\Xi$ to $\mathbb{R}$ is a reproducing kernel Hilbert space (RKHS) if there exists a kernel function $k : \Xi \times \Xi \to \mathbb{R}$ , that satisfies the following properties:

For every $\xi \in \Xi$ , $k(\xi, \cdot) \in \mathscr {H}$ , and
(Reproducing property): For every $\xi \in \Xi$ and $f \in \mathscr {H}$ , $f(\xi) = \langle f, k(\xi, \cdot) \rangle _{\mathscr {H}}$ .

Here, we employ the radial basis kernel, $k(\xi, \xi ^{\prime }) = \exp (- \Vert \xi - \xi ^{\prime } \Vert ^{2}/2\sigma ^{2})$ for some positive value $\sigma \in \mathbb{R}_+$ , because it is characteristic [57] (meaning that the embedding encapsulates all statistical features of the underlying distribution and is unique [58, Theorem 1]), universal (meaning that it is capable of approximating a wide variety of functions), and because we have previously found it to be effective for human-in-the-loop data [26]. The kernel distribution embedding of $\mathbb{P}$ [58], $\begin{equation*} m_{\mathbb{P}} = \int _{\Xi } k(\xi, \cdot) \mathbb{P}(\mathrm{d} \xi), \tag{1} \end{equation*}$ View Sourcecaptures the distribution $\mathbb{P}$ of $X$ within the RKHS. However, direct computation of (1) is not possible since $\mathbb{P}$ is unknown. Instead, we compute an empirical estimate $\widehat{m}_{\mathbb{P}}$ via the approach in [26], [58], with $\begin{equation*} \widehat{m}_{\mathbb{P}} = \frac{1}{M} \sum _{i=1}^{M} k\left(\xi ^{i}, \cdot\right). \tag{2} \end{equation*}$ View SourceThis provides an empirical estimate of the distribution within the Hilbert space.

To calculate the distance between a sample and a distribution, we employ the maximum mean discrepancy [27], represented as a norm in the Hilbert space, $\begin{equation*} \text{MMD}(\mathbb{P}, \mathbb {Q}) = \Vert m_{\mathbb{P}} - m_{\mathbb {Q}} \Vert _{\mathscr {H}}, \tag{3} \end{equation*}$ View Sourcefor two embeddings, $m_{\mathbb{P}}$ and $m_{\mathbb {Q}}$ . However, as these distributions are not known, we again turn to an empirical approximation. Using the empirical estimate (2), we can compute an estimate of (3) as $\begin{equation*} \widehat{\text{MMD}}(\mathbb{P},\mathbb {Q}) =\left\Vert \frac{1}{M}\sum ^{M}_{i=1}k\left(\xi ^{i}, \cdot\right)-\frac{1}{N}\sum ^{N}_{i=1}k\left(\xi ^{i}, \cdot\right)\right\Vert _{\mathscr {H}}. \tag{4} \end{equation*}$ View SourceThus, we can readily employ the MMD to assess the similarity between empirical estimates of distributions. We note that this metric is valid irrespective of the sample size for embeddings of either distribution, meaning that we can compute the MMD between a distribution and a single sample without compromising accuracy.

Although there are a variety of metrics which capture distance between distributions, such as the Kullback-Leibler divergence or total variation divergence, we select the MMD primarily because it does not require density estimation, which can be computationally expensive and numerically unstable. Additionally, because MMD can be computed within the RKHS, it is sensitive to the underlying distribution characteristics and computationally efficient.

C. MMD-Based Classification Algorithm

We use both the empirical kernel embedding (2) and the empirical MMD (4) to classify learning stages from observed trajectories. We first characterize four distributions that capture the canonical behavior associated with each of the four learning stages. Using the criteria described in Section III and summarized in Table 2, we manually select 8 trajectories for each of Learning Stages 1 and 2, and 6 trajectories for each of Learning Stages 3 and 4. The trajectories are chosen based on their ability to represent the critical elements of each learning stage. The quadrotor positions associated with these trajectories are shown in Fig. 5. For each of these distributions, we compute the kernel embedding (2) using the same kernel function, with $\sigma = 25$ .

$Figure 5. - To classify a given trajectory, we compare the distance between that trajectory and the subset of canonical distributions associated with each feasible learning stage, within the reproducing kernel Hilbert space. Canonical distributions for all four learning stages are shown: Learning stage 1 (Novice, red), Learning stage 2 (advanced beginner, orange), learning stage 3 (competent, yellow), and learning stage 4 (proficient, green). For participant 1 in the performance group, trial 2 (blue), only learning stage 1 is feasible, hence only the MMD with respect to the kernel embedding $\widehat{m}_{\mathbb{P}_{1}}$ is computed.$

Figure 5.

To classify a given trajectory, we compare the distance between that trajectory and the subset of canonical distributions associated with each feasible learning stage, within the reproducing kernel Hilbert space. Canonical distributions for all four learning stages are shown: Learning stage 1 (Novice, red), Learning stage 2 (advanced beginner, orange), learning stage 3 (competent, yellow), and learning stage 4 (proficient, green). For participant 1 in the performance group, trial 2 (blue), only learning stage 1 is feasible, hence only the MMD with respect to the kernel embedding $\widehat{m}_{\mathbb{P}_{1}}$ is computed.

Show All

To classify the remaining trajectories, we evaluate each trajectory in two steps: 1) an assessment of feasible learning stages, and 2) classification amongst the feasible set of learning stages. These elements are formally described in Algorithms 1 and 2, respectively. In the first step, we exploit the prescribed learning stage definitions (Table 2) to reduce the number of distributions that each sample is compared to, and to ensure that only feasible distributions are considered. For example, because we know that an unsuccessful landing is only possible in the first two learning stages, we can limit our MMD calculations to only those two distributions and safely ignore distributions for the latter two learning stages. In the second step, we compute (4) for each feasible distribution, and then select the distribution associated with the smallest distance to the sample in question. This process is shown graphically in Fig. 5 for a scenario drawn from observed data, in which only one learning stage is feasible.

Algorithm 1: Feasibility Algorithm.

Input: Landing type (Unsafe, Unsuccessful, Safe)

Output: Feasible learning stages $F$

Initialize $F := \emptyset$

if $\text{Landing type} = \text{Unsuccessful}$ then

$F \leftarrow F \cup \lbrace \mathcal {L}_{1}, {\mathcal {L}}_{2}\rbrace$

else if $\text{Landing type} = \text{Unsafe}$ then

$F \leftarrow F \cup \lbrace {\mathcal {L}}_{1},{\mathcal {L}}_{2},{\mathcal {L}}_{3}\rbrace$

else if $\text{Landing type} = \text{Safe}$ then

$F \leftarrow F \cup \lbrace {\mathcal {L}}_{2},{\mathcal {L}}_{3},{\mathcal {L}}_{4}\rbrace$

end if

return $F$

Algorithm 2: Learning Stage Classification.

Input: Canonical distributions $\mathbb{P}_{n}$ for $n \in \lbrace 1, 2, 3, 4\rbrace$ , Sample trajectory $\xi$ , Set of feasible learning stages $F$

Output: Minimum distance learning stage $s$

for $n \in F$ do

$\widehat{\text {MMD}}_{n} = \Vert k(\xi, \cdot) - \widehat{m}_{\mathbb{P}_{n}} \Vert _{\mathscr {H}}$ as in (2), (4)

end for

$s = \text{arg min}_{n \in F} \widehat{\text {MMD}}_{n}$

return $s$

The main benefit of this approach is that it blends knowledge of physical and human context with numerical analysis, to provide sensitivity and robustness with limited manipulation of the data. We note that for the kernel embedding methods to be effective, the kernel that defines the reproducing kernel Hilbert space must be chosen to reflect underlying features of the data. In our case, the parameterization of the radial basis kernel is important to ensure appropriate fidelity in the projection of the data into the Hilbert space.

D. Data Pre-Processing

Before applying the methods in Section IV-C, we preprocess the data to ensure validity and numerical conditioning. We first interpolate the trajectories so that they are all of the same length, $N=1000$ time steps. This is necessary to ensure feasibility of comparisons across kernel embeddings when evaluating the MMD in (4). In essence, we normalize all trajectories with respect to their length, ignoring the time needed to complete the maneuver. Because time to completion was not included in our definition of learning stages, we do not believe this compromises our results. Through this interpolation, we preserve all information about how a maneuver was completed, as opposed to when it was completed. Lastly, we employ scaling so that all state variables (position, velocity, roll angle, and roll rate) are of comparable size to prevent ill conditioning. The scaling is completed prior to the projection into the Hilbert space.

SECTION V.

Results and Discussion

A. Classification Accuracy

We evaluate the accuracy of our classification algorithm through a comparison with the manually determined assessment of learning stage described in Section III, as shown in Fig. 6(a). We focus on trials completed in manual mode for this analysis, because they are indicative of human performance without automation assistance, and consistent with the selection of canonical distributions solely from manual mode trajectories. Because we have high variability in the number of trials completed in each learning stage (Table 3), we present our results primarily in terms of percentages, to facilitate comparison across learning stages.

TABLE 3 Agreement between MMD-based and NCA-based classification algorithms.

Figure 6.

Classification accuracy of MMD-based and NCA-based algorithms.

Show All

A total of 80.69% of the trials completed in manual mode are classified correctly by our algorithms. Of the trials that are incorrectly classified, 84.09% are misclassified in a neighboring learning stage. We find that 84.05% of all manual trials are either correctly classified or off by one learning stage.

Many of the misclassifications (across all learning stages) occurred when the observed trajectory was essentially a “corner case” for that Learning Stage. That is, the trajectory just barely met the criteria for that learning stage. These scenarios highlight a ubiquitous challenge in classification of stochastic, dynamic trajectories—it is difficult to quantitatively capture all criteria used to assign trajectories to a given learning stage. In these scenarios, manual identification of the learning stage relies in part upon qualitative judgements that are not possible in a rule-based algorithm. For example, Fig. 7 shows a subject transitioning between Learning Stage 1 and Learning Stage 2. The eye gaze trajectory is indicative of Learning Stage 1, with an overt transition during the trajectory from focusing on the immediate stabilization and descent of the quadrotor, to focusing on the landing pad, when the quadrotor is still a substantial distance away from the landing pad. However, the trajectory is classified as Learning Stage 2 because the landing occurred on the landing pad (Unsafe), as opposed to off the landing pad (Unsuccessful).

Figure 7.

Example of a trial manually labeled as learning stage 2 that the MMD-based classifier categorizes as learning stage 1.

Show All

We observe that the MMD-based classification algorithm has high accuracy in lower learning stages (92% in Learning Stage 1, and 85% in Learning Stage 2). However, misclassifications occurred at relatively higher rates in higher learning stages, with an accuracy of 61% in Learning Stage 3 and 39% in Learning Stage 4. Upon further examination, we found that misclassifications in Learning Stage 4 primarily occur when the quadrotor trajectory is relatively direct in its descent, and the gaze trajectory shows the user looking at AOIs (such as when glancing at the live data). Although we do include a trajectory similar to this in our canonical trajectories for Learning Stage 4, the low accuracy rate means that it is possible that additional trajectories of this type should be included. That is, the inaccurate classification could be due to a lack of data representative of the trajectory that is being misclassified. Although this limitation is unfortunately fundamental to our approach, in that as with all learning related techniques, effectively sampled data is key for effective classification, we note that it is possible to tune our classifier by e.g., providing additional data in Learning Stages 3 and 4, or including additional features (such as time to completion) that may be important for latter stages.

B. Comparison With Nearest Centroid Assignment

We seek to compare the MMD-based classification algorithm with a more standard statistical approach. We select the nearest centroid assignment algorithm (NCA), a special case of the nearest neighbor classification algorithm [23], for comparison, because of its overall similarity to MMD. The NCA algorithm (Algorithm 3) computes the similarity between a single observation (i.e., an observed trajectory) and predefined clusters (i.e., the representative distributions for each learning stage), by computing the point-wise Euclidean distance between the mean of the clusters and the observed data. To ensure a fair comparison, we embed the NCA within the same Feasibility Algorithm (Algorithm 1) as we use for the MMD.

Algorithm 3: NCA Learning Stage Classification [23].

Input: Canonical distributions $\mathbb{P}_{n}$ for $n \in \mathbb {N}_{[1,4]}$ , Sample trajectory $\xi$ , Set of feasible learning stages $F \subseteq \mathbb {N}_{[1,4]}$

Output: Minimum distance learning stage $s$

for $n \in F$ do

$\text{dist}(n; \mu _{\mathbb{P}_{n}}, \xi) = \sqrt{\sum _{i=1}^{T} (\mu _{\mathbb{P}_{n,i}} - \xi _{i})^{2}}$ , for $\mu _{\mathbb{P}_{n}}$ , the mean of distribution $\mathbb{P}_{n}$

end for

$s = \text{arg min}_{n \in F}\; \text{dist}(n)$

return $s$

Fig. 6(b) shows the accuracy of the NCA based classifier via a confusion matrix, and Fig. 6(c) shows the difference in accuracy between the MMD-based classifer and the NCA-based classifier. The MMD-based classifier performs with almost uniformly higher accuracy than the NCA-based classifier for lower learning stages (1 and 2), and with comparable accuracy for higher learning stages (3 and 4). We further investigate the types of errors MMD and NCA make in Table 3. There are an order of magnitude more trials in which the MMD-based classifier is correct and NCA-based classifier incorrect, than there are in which the MMD-based clasifier is incorrect and the NCA-based classifier correct.

The main difference between the NCA and MMD classifiers is in the distance metrics they each employ. Although the Euclidean distance metric, which measures the straight-line distance between points in a vector space, offers interpretability and simplicity through a clear geometric interpretation, it also assumes that the trajectories exist in a metric space where the Euclidean norm is a suitable measure of similarity. That is, the Euclidean distance metric of NCA is well suited for datasets in which each dimension corresponds to a meaningful attribute that aligns well across samples. In contrast, MMD employs a kernel-based approach to compute the distance between distributions in a reproducing kernel Hilbert space, capturing the overall structure and patterns within the trajectories. This allows MMD to discern distributional characteristics that are not evident through pointwise comparison across time-steps, making it powerful for analyzing complex, structured data, even in cases where the underlying temporal dynamics are not aligned.

Practically, this translates to NCA performing well when trajectories are temporally aligned and when the task at hand requires assessing similarity based on specific time-indexed attributes. In contrast, MMD's kernel-based approach does not make such assumptions about the data and is therefore more robust to variability and to outliers. It is designed to compare the inherent properties of the data distributions, providing a more flexible framework for classification tasks where the overall shape, rather than the specific points along the trajectories, is of primary interest.

We see that the MMD classifier demonstrates a distinct advantage over the NCA, particularly in the initial stages characterized by high variability. In Learning Stages 1 and 2, learners exhibit a diverse range of patterns and behaviors. The Euclidean distance employed by NCA can be misled by the high variability, often failing to capture the underlying learning progression. In contrast, MMD can discern the subtler developmental trends indicative of early learning stages. This advantage becomes less pronounced in Learning Stages 3 and 4, in which learner behavior converges and the variability within the data sets decreases. In such scenarios, both NCA and MMD tend to perform similarly as the trajectories become more defined and the differences between stages are more distinct and easily captured by both pointwise and global measures. Consequently, the superiority of MMD in stages with higher variability underscores its robustness in handling heterogeneous and less structured data, a common characteristic of early skill acquisition.

SECTION VI.

Evaluation of Learning Dynamics via Learning Stage Classifier

In this section, we demonstrate that our learning stage classifier (as described in Section IV-C) can be used to evaluate the efficacy of learning for different mode selection policies. We apply the MMD-based classifier to all trials for all subjects, in both Self-Confidence and Performance-based groups, including those in shared control mode. Results are shown in the Appendix in Fig. 13. We first compare the effects of the two mode selection policies (self-confidence-based and performance-based) on the mean performance score over the last five trials (which are all manual mode).

Figure 8.

Comparison of performance scores of last five trials between groups. The violin plot depicts box plot results (minimum, first quartile, median, third quartile, and maximum values) as well as the distribution of data using a density curve.

Show All

Figure 9.

Comparison of variance of performance scores of last five trials between groups. Only participants who landed safely or unsafely in at least three of the last five trials of the experiment are included.

Show All

Figure 10.

Mean and standard error of number of trials categorized into each learning stage for Self-Confidence group and performance group.

Show All

Figure 11.

Percentage of participants classified into each of the four learning stages in the last five trials of the experiment.

Show All

Figure 12.

Mean learning stage across participants for all trials.

Show All

Figure 13.

Learning stage classification results for both self-confidence and performance group participants.

Show All

Fig. 8 shows that during the last five trials of the experiment, the Self-Confidence group was more likely to achieve higher mean performance scores than the Performance group. Likewise, Fig. 9 shows that during these same trials, the Self-Confidence group was more likely to have a lower variance in their performance scores (with the exception of one outlier participant in the self-confidence group) in the last five trials. Having established that the self-confidence-based policy is more likely to lead to higher and more consistent performance than the performance-based policy in the last five trials, we expect this to also be reflected by differences in the learning stage classification results for each group.

Fig. 10 shows the mean number of trials categorized into each learning stage for participants in the Self-Confidence group and the Performance group. The Self-Confidence group spends fewer trials in Learning Stage 1 and more trials in Learning Stages 2-4, compared to the Performance group.

Fig. 11 shows the percentage of participants classified into each of the four learning stages in the last five trials of the experiment for each group. It is evident that a larger percentage of participants in the Self-Confidence group achieve Learning Stages 3 and 4 in trials 16-20 in comparison to those in the Performance group. This is also demonstrated in Fig. 12, in which it can be observed that for trials 16 through 20, the mean learning stage among participants is higher for the Self-Confidence group. These observations are consistent with the performance metrics provided in Figs. 8 and 9 but provide valuable insight into the participants' learning dynamics, which are beneficial for the design of adaptive automation aimed at improving or accelerating learning. A rigorous comparison of the efficacy of the self-confidence-based policy and the performance-based policy is outside the scope of this paper, in part due to a small sample size of participants, but will be the subject of future work. We plan to investigate an optimal control policy in lieu of our heuristic based policy in future work, to facilitate progress amongst learning stages, with an experiment design that specifically accommodates comparisons between groups.

SECTION VII.

Conclusion

Motivated by challenges in the design of adaptive automation, we present a rule-based MMD classifier that assigns a sample trajectory to one of four learning stages. Given a representative distribution of trajectories for each learning stage, and a sample trajectory to be classified, the classifier uses the maximum mean discrepancy, a distance metric within the Hilbert space, to identify the distribution that is closest (amongst a set of feasible distributions) to the sample trajectory. Feasibility is determined by a rule-based algorithm that checks the sample trajectory for readily quantifiable landing characteristics (e.g., whether a landing occurred on the landing pad or not, whether the landing met constraints on speed and other quadrotor variables). We find our algorithm compares favorably to a standard statistical approach, the nearest centroid algorithm, largely because the Hilbert space based distance metric is more appropriate than a Euclidean distance for trajectories of stochastic dynamical systems. We use this classification to assess learning stage in an adaptive automation scheme that we designed to calibrate self-confidence, and show the effectiveness of our self-confidence heuristic in hastening progress to higher learning stages. We believe that tools such as the one we presented, which are designed to accommodate high variability within stochastic, human-in-the-loop dynamical systems, are an important tool for the future design of adaptive automation.

Future work will focus on further exploration of the impact of the self-confidence-based mode selection policy on learning progressions, as well as the construction of dynamic policies that employ learning stages for feedback. In addition, to establish initial trust in the automation assistance, participants may be provided with greater transparency with regard to how the automation assistance aids in user training.

ACKNOWLEDGMENT

The authors thank Sooyung Byeon for the initial development of the quadrotor landing training simulator that was adapted for the human subject experiment and the design of the controller implemented during shared control mode. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Appendix

SECTION A.

Quadrotor Dynamics and Control

For linearized planar quadrotor dynamics, $\begin{align*} x(k+1) = Ax(k)+Bu(k), x(0)= x_{0}, \tag{5} \end{align*}$ View Sourcewith $\begin{align*} A &= \begin{bmatrix}1 & 0 & 0 & 0.05 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0.05 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0.05 \\ 0 & 0 & 0.49 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.995 & 0 \\ 0 & 0 & & -0.05 & 0 & -0.5 \end{bmatrix} \tag{6}\\ B &= \begin{bmatrix}0 & 0 & 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 100 \end{bmatrix}^{T}, \tag{7} \end{align*}$ View Sourceand with state $x = [x,y,\theta, \dot{x}, \dot{y}, \dot{\theta }]^{T} \in \mathbb{R}^{6}$ that captures vertical and horizontal position and velocities, and roll and roll rate, and input $u = [u_{1},u_{2}]^{T} \in \mathbb{R}^{2}$ that captures thrust and roll acceleration, the quadratic cost $\begin{align*} J=\sum ^{T}_{k=0} \left((x(k)-x_{r}(k))^{T}Q(x(k)-x_{r}(k))+u(k)^{T}Ru(k)\right) \tag{8} \end{align*}$ View Sourcehas weightings $Q = \text{diag}([ 3, 0.5, 500, 2, 0.5, 300])$ and $\begin{align*} R = \begin{bmatrix}300 & -30 \\ -30 & 300 \end{bmatrix} \tag{9} \end{align*}$ View Sourcefor reference trajectory $x_{r} \in \mathbb{R}^{6}$ . We blend the LQR based control, $u_{a}(k)$ , with the participant's input, $u_{h}(k)$ , to obtain the quadrotor input, $\begin{equation*} u(k) = 0.9u_{h}(k)+0.1u_{a}(k). \tag{10} \end{equation*}$ View SourceThe weightings of the convex combination are chosen empirically based on pilot experiments.

SECTION B.

Scoring Functions

The scoring functions for each landing characteristic described in Section II-B are included here. The position score, time score, velocity score, and roll angle score for each trial $k$ are given by $S_{k,p}$ , $S_{k,t}$ , $S_{k,v}$ , $S_{k,\theta }$ , respectively. Position, time, velocity, and roll angle are denoted by $p$ , $t$ , $v$ , and $\theta$ , respectively. $\begin{align*} S_{k,p} &= \left\lbrace \begin{array}{ll}100,& \text{unsafe or safe landing} \\ 100\left(1-\frac{\sqrt{x_{k}^{2}+y_{k}^{2}}}{45.2}\right),& \text{unsuccessful} \end{array}\right. \tag{11}\\ S_{k,t} &= \left\lbrace \begin{array}{ll}104.6\left(1-\frac{1}{1+e^\frac{-(t_{k}-50.0)}{15.0}}\right),&\text{unsafe or safe} \\ 50\left(1-\frac{RMS-1.25}{5.0}\right),&\text{unsuccessful} \end{array}\right. \tag{12}\\ S_{k,v} &= 100\left(1-\frac{1}{1+e^\frac{-(v_{k}-8.5)}{2.0}}\right) \tag{13}\\ S_{k,\theta } &= 126.4\left(1-\frac{1}{1+e^\frac{-|\theta _{k}|-20.0)}{15.0}}\right) {\kern5.0pt}. \tag{14} \end{align*}$ View Source

SECTION C.

Classification Results

Fig. 13(a) and (b) show the classification results for each of the participants. For each trial, the classified learning stage and the manual mode are represented by the black line and gray shading, respectively. Trials with no shading were completed in shared control mode. The “jumps,” i.e., transitions of two or three learning stages between sequential trials, may occur for a variety of reasons. Some jumps may reflect dynamic changes in learning progression, which do not necessarily follow a simple, linear trend [18], [19], [59]. However, some jumps may be artificial, due primarily to the fact that the classifier is not trained on trajectories in shared control mode. That is, under shared control, the assistance available to the participant can skew the participant's demonstration of their manual skill, leading to misclassifications. Indeed, all of the jumps from Learning Stage 1 to Learning Stage 4 occur under shared control mode (and none occur in manual mode).

References is not available for this document.

Classification of Human Learning Stages via Kernel Distribution Embeddings

Alerts

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Human Subjects Experiment

A. Overview

B. Experiment and Controller Design

1) Self-Confidence Based Mode Selection Policy

2) Performance Score

C. Participants

D. Apparatus

E. Experimental Procedure and Data Collection

1) Procedure

2) Data Collection

Characterizing Learning Stages for the Quadrotor Landing Task

Learning Stage Classification via Kernel Embeddings

A. Justification for the Proposed Approach

B. Distribution Embeddings

Definition 1 (RKHS, [56]):

C. MMD-Based Classification Algorithm

Algorithm 1: Feasibility Algorithm.

Algorithm 2: Learning Stage Classification.

D. Data Pre-Processing

Results and Discussion

A. Classification Accuracy

B. Comparison With Nearest Centroid Assignment

Algorithm 3: NCA Learning Stage Classification [23].

Evaluation of Learning Dynamics via Learning Stage Classifier

Conclusion

ACKNOWLEDGMENT

Appendix

Quadrotor Dynamics and Control

Scoring Functions

Classification Results

Authors

Figures

References

Citations

Keywords

Metrics

References

IEEE Account

Purchase Details

Profile Information

Need Help?