
Human Modeling and Passivity Analysis for Semi-Autonomous Multi-Robot Navigation in Three Dimensions


Abstract:

In this article, we study a one-human-multiple-robot interaction for human-enabled multi-robot navigation in three dimensions. We employ two fully distributed control architectures designed based on human passivity and human passivity shortage. The first half of this article focuses on human modeling and analysis for the passivity-based control architecture through human operation data on a 3-D human-in-the-loop simulator. Specifically, we compare virtual reality (VR) interfaces with a traditional interface, and examine the impacts that VR technology has on human properties in terms of model accuracy, performance, passivity and workload, demonstrating that VR interfaces have a positive effect on all aspects. In contrast to 1-D operation, we confirm that operators hardly attain passivity regardless of the network structure, even with the VR interfaces. We thus take the passivity-shortage-based control architecture and analyze the degree of passivity shortage. We then observe through user studies that operators tend to meet the degree of shortage needed to prove closed-loop stability.
Topic: Modeling, Control, and Learning Approaches for Human-Robot Interaction Systems
Published in: IEEE Open Journal of Control Systems ( Volume: 3)
Page(s): 45 - 57
Date of Publication: 15 December 2023
Electronic ISSN: 2694-085X

Description

The supplementary video is a supporting document to the article.

SECTION I.

Introduction

Most real-world robotic tasks, more or less, need human intervention due to the high capability that humans possess in reasoning and decision-making especially in unstructured/uncertain environments. A robotic system in which humans are involved in part of the overall process is called a semi-autonomous system. Needless to say, human modeling is indispensable when systematically designing a semi-autonomous robotic system. The most appropriate human model strongly depends on the role that the human plays in the system. Musić and Hirche [1] classified human roles in human–robot interactions as either active or supervisory, depending on the required level of autonomy [2]. In the active role, human intervention may involve motion control of the robot, and accordingly, the human is involved in the control loop. Meanwhile, in the supervisory role, the human focuses only on high-level decisions based on abstracted task information, which is sometimes described as “human-on-the-loop” [3]. In this article, we focus on modeling of a human with an active role in the human-in-the-loop architecture.

A promising approach to human-in-the-loop robot control with the human having an active role has been studied in the paradigm of bilateral teleoperation [4], [5], [6]. In this paradigm, the operator is modeled as a passive system. Beyond traditional one-human-one-robot teleoperation, teaming between a human and multiple robots has been investigated in the literature, including the scenarios of cooperative payload manipulation [7], [8], [9], [10], multi-robot navigation [11], [12], [13], and an exploration task [14], [15]. Meanwhile, relatively few papers have questioned the validity of the passivity assumption imposed on the human operator. Dyck et al. [16] examined the passivity of the human arm and revealed that it had a task-dependency. Bilateral teleoperators under milder assumptions were presented in [17], [18] to address the possibility of non-passive human operators. Non-passive components in bilateral teleoperators have also been studied based on integral quadratic constraint [19], [20], [21] and the input-to-output stability small gain theorem [22].

Previously, we addressed multi-robot navigation based on the concept of passivity, and presented a fully distributed control architecture that ensures motion synchronization under the assumption of human passivity [23]. We also examined human passivity through system identification techniques using human operation data collected on a 1-D human-in-the-loop simulator. We observed that an appropriately defined notion of human passivity strongly depends on the proficiency level of the system operation and the network structure. To address the possibility of passivity shortage, the authors of [24], [25] presented another distributed control architecture based on the architecture for output synchronization of passivity-short systems [26] and bilateral teleoperation [4], [5], [6]. In the present paper, we study an extension of [23], [24], [25] to the three-dimensional case. It is easy to confirm that the stability analysis in [23], [24], [25] can be directly applied to the 3-D case. However, it remains unclear whether the analytical results for the 1-D experiments are applicable to the 3-D case. In particular, the selection of the interface is more critical than in 1-D or 2-D operations due to the limited human capability for 3-D recognition and real-time manipulability.

Virtual Reality (VR) technology is widely believed to enhance human 3-D recognition and manipulability. Indeed, there have been many publications devoted to various human-robot interactions, including operator support [27], [28], [29], simulation [30], [31], instruction [32], [33], [34], manipulation [35], [36], and teleoperation [37], [38] with VR and Augmented Reality (AR) devices as summarized in [39]. Some of these studies quantitatively revealed the benefit of VR and AR technology in terms of safety [27], [30], task completion speed [33], [34], [35], [37] and accuracy [35], [37], and human usability, workload, and experience [36], [37], [38]. The authors of [34] also conducted comprehensive subjective evaluations through user studies. Missing from these works is an analysis from a control–theoretic perspective. For example, how VR interfaces affect the human dynamic behavior remains an open question. Moreover, to the best of our knowledge, the impact of VR interfaces on human passivity has not yet been reported in the literature.

In this article, we study the scenario involving 3-D multi-robot navigation shown in Fig. 1 based on the paradigm of [23], [24], [25], and address human modeling and human passivity analysis. To this end, we begin by conducting user studies for the passivity-based architecture in [23] on a 3-D human-in-the-loop simulator with three different networks, taking three pairs of command/feedback interfaces: a traditional joystick controller and a 2-D display, a VR controller and a 2-D display, and a VR controller and a head mounted display (HMD). We then build dynamical models of a human operator and analyze human passivity, showing that VR interfaces are advantageous in human modeling, human passivity, and tracking performance. We also conduct the same experiments and modeling for nine other trial subjects to examine whether the results of the investigation can be generalized, and administer the NASA TLX questionnaire to quantitatively evaluate the human workload for the three interface pairs. In the latter part of this article, we focus on the problem of human deviation from passivity even with VR interfaces. We conduct user studies for the passivity-shortage-based architecture in [24], [25] and reveal that all operators meet the degree of passivity shortage assumed in [24], [25].

Figure 1. Scenario involving 3-D one–human–multiple–robot interactions.

The contributions of this article are summarized as follows.

  1. We assess the impact of VR interfaces on dynamic human properties, including dynamical model accuracy, tracking performance, and passivity.

  2. Multiple user studies are conducted to understand the generalization capability of the aforementioned analysis, and to analyze the human workload for each interface.

  3. The passivity-short assumption in [24], [25] is shown to be satisfied even in the 3-D operation.

Note that preliminary results for the passivity-based architecture [23] alone were presented in part in a conference version [40]; contributions 2 and 3 are novel to this article. In addition, the human models in [40] had undesirable resonance peaks due to overfitting, whereas the models developed in this article do not have such peaks.

SECTION II.

Preliminary: Passivity

We consider a dynamical system with the same input-output dimension given as \begin{align*} \dot{x} &= f(x) + g(x)u,\ x(0) = x_{0},\tag{1a}\\ y &= h(x), \tag{1b} \end{align*}

where x(t) \in \mathbb {R}^{n} is the state, u(t)\in \mathbb {R}^{m} is the control input, and y(t)\in \mathbb {R}^{m} is the control output. The system is said to be passive if there exists a positive semi-definite function S:\mathbb {R}^{n} \to \mathbb {R} such that \begin{align*} S(x(\tau)) - S(x_{0}) \leq \int ^{\tau }_{0}y^{T}(t)u(t)dt \end{align*}
holds for all input signals u, all initial states x_{0}, and all time \tau \geq 0. Suppose now that there exists \nu \in \mathbb {R} such that \begin{align*} S(x(\tau)) - S(x_{0}) \leq \int ^{\tau }_{0}y^{T}(t)u(t)dt + \nu \int ^{\tau }_{0}\Vert u(t)\Vert ^{2}dt \end{align*}
holds for all u, all x_{0}, and all \tau \geq 0. The minimum of \nu is called the input passivity index and is denoted by \bar{\nu }. The system is then said to be input feedforward passive if \bar{\nu } > 0 and input feedforward passivity-short if \bar{\nu } < 0. When \bar{\nu } < 0, the parameter \bar{\nu } is called the impact coefficient [26].

Suppose now that (1) is linear time-invariant. We then define \begin{align*} \nu (\omega) = \lambda _\mathrm{min}(G(j\omega)+G^{H}(j\omega)), \end{align*}

where G(s) is the transfer function matrix from u to y and \lambda _\mathrm{min}(A) is the minimal eigenvalue of a matrix A. It is then well known that \bar{\nu } = \min _{\omega }\nu (\omega) holds [26]. The function \nu (\omega) is thus regarded as a passivity metric corresponding to the angular frequency \omega.
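As a rough illustration of how the frequency-wise metric \nu (\omega) might be evaluated numerically, the following sketch computes \lambda _\mathrm{min}(G(j\omega)+G^{H}(j\omega)) on a frequency grid. The transfer function matrix `example_G`, the grid, and the function names are assumptions made only for this example and do not appear in the article; the minimum over a finite grid only approximates the minimum over all \omega.

```python
import numpy as np

def passivity_metric(G, omegas):
    """Return nu(w) = lambda_min(G(jw) + G(jw)^H) for each w in omegas."""
    nus = []
    for w in omegas:
        Gjw = np.asarray(G(1j * w), dtype=complex)
        herm = Gjw + Gjw.conj().T            # Hermitian by construction
        nus.append(np.linalg.eigvalsh(herm).min())
    return np.array(nus)

def example_G(s):
    # Hypothetical 2x2 transfer function matrix, chosen only for illustration.
    return np.array([[1.0 / (s + 1.0), 0.0],
                     [0.0, (s - 1.0) / (s + 2.0)]])

omegas = np.logspace(-2, 2, 400)   # assumed frequency grid in rad/s
nu = passivity_metric(example_G, omegas)
nu_bar = nu.min()                  # grid approximation of min over omega
```

For this hypothetical system the second diagonal entry has a right-half-plane zero, so the metric is negative at low frequencies and the grid minimum stays slightly above -1.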

SECTION III.

Control Architectures for Semi-Autonomous Multi-Robot Navigation

The goal of the 3-D one-human–multiple-robot interaction in this article is for the operator to stably navigate the multiple robots to a desirable position or velocity. In this scenario, direct manual control of the robots in real time is demanding for the human operator, and may even be impossible depending on the number of robots. A promising approach to address the issue is to utilize the architecture for the so-called complementary interactions [1], where motion synchronization is left to the multi-robot system and the role of controlling the robotic group is assigned to the operator. To this end, we presented two fully distributed control architectures based on passivity [23] and passivity-shortage [25], respectively. We will now briefly review these architectures.

A. Passivity-Based Control Architecture

Let us consider a group of n robots in 3-D Euclidean space as shown in Fig. 2. The set of their IDs is denoted by \mathcal {V} = \lbrace 1,2,\ldots, n\rbrace. Each robot i \in \mathcal {V} is assumed to obey the kinematic model \begin{align*} \dot{p}_{i} = u_{i}, \tag{2} \end{align*}

where p_{i} \in \mathbb {R}^{3} is the position of robot i \in \mathcal {V} relative to the world frame and u_{i} \in \mathbb {R}^{3} is the velocity input to be designed. The robots are assumed to be interconnected by a network that is modeled by a fixed and connected undirected graph G = (\mathcal {V}, \mathcal {E}),\ \mathcal {E} \subseteq \mathcal {V}\times \mathcal {V}. The neighbor set \mathcal {N}_{i} is then defined as \mathcal {N}_{i} = \lbrace j\in \mathcal {V}|\ (i,j)\in \mathcal {E}\rbrace.

Figure 2. Configuration of multiple robots.

A human operator is interfaced with the robots to receive feedback from them and to send commands to them. The feedback interface translates the output of the robotic group into visual feedback for the operator, and the command interface translates the human action into a control command for the robotic group. Both interfaces are assumed to have access to a subset of robots \mathcal {V}_\mathrm{h} \subseteq \mathcal {V} through wireless communication. The information displayed on the feedback interface is denoted by y_\mathrm{h}, and the command determined by the operator is denoted by u_\mathrm{h}, where the vector u_\mathrm{h} is defined in the world frame. Throughout this article, we assume that u_\mathrm{h}\in \mathbb {R}^{3} is a velocity command to the robots i \in \mathcal {V}_\mathrm{h}. The operator drives all robots to a reference position r_{p} or reference velocity r_{v} in the world frame by manipulating the command signal.

The operator chooses whether to control the positions or velocities of the robots. For the position navigation, the control goal is formulated as \begin{align*} \lim _{t\to \infty }\Vert p_{i} - r_{p}\Vert = 0\ {\forall i}\in \mathcal {V}, \tag{3} \end{align*}

while for the velocity navigation it is formulated as \begin{align*} \lim _{t\to \infty }\Vert \dot{p}_{i} - r_{v}\Vert &= 0\ {\forall i}\in \mathcal {V}, \tag{4a}\\ \lim _{t\to \infty }\Vert p_{i} - p_{j}\Vert &= 0\ {\forall i,j}\in \mathcal {V}. \tag{4b} \end{align*}
To achieve both position and velocity navigation, the authors of [23], [25] proposed the following distributed controller based on the PI consensus algorithm: \begin{align*} u_{i} &= \sum _{j\in \mathcal {N}_{i}}a_{ij}(p_{j} - p_{i}) + \sum _{j\in \mathcal {N}_{i}}b_{ij} (\xi _{i} - \xi _{j}) + \delta _{i} v_\mathrm{h}\tag{5a}\\ \dot{\xi }_{i} &= \sum _{j\in \mathcal {N}_{i}}b_{ij}(p_{j} - p_{i}), \tag{5b} \end{align*}
where a_{ij} and b_{ij} are positive gains, and \delta _{i} = 1 if i\in \mathcal {V}_\mathrm{h} and \delta _{i} = 0 otherwise. The symbol v_\mathrm{h} is a signal that will be designed to reflect the human navigation objective. The signal differs between the passivity-based and the passivity-shortage-based control architecture, which will be presented later.
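To make the structure of (5) concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes uniform gains a_{ij} = a and b_{ij} = b so that the sums in (5) can be written with the graph Laplacian, uses forward-Euler integration, and picks a hypothetical four-robot ring network with only the first robot in \mathcal {V}_\mathrm{h}. All function and variable names are illustrative.

```python
import numpy as np

def pi_consensus_step(p, xi, adjacency, delta, v_h, dt, a=1.0, b=1.0):
    """One forward-Euler step of the kinematics (2) under the controller (5).

    p, xi     : (n, 3) arrays of positions and integrator states
    adjacency : symmetric (n, n) 0/1 matrix of the undirected graph
    delta     : (n,) 0/1 vector, 1 for robots in V_h
    v_h       : (3,) signal injected into the accessible robots
    Uniform gains a, b are an assumption made for this sketch.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # sum_j a (p_j - p_i) = -a L p  and  sum_j b (xi_i - xi_j) = b L xi
    u = -a * laplacian @ p + b * laplacian @ xi + np.outer(delta, v_h)
    xi_dot = -b * laplacian @ p
    return p + dt * u, xi + dt * xi_dot

# Hypothetical example: 4 robots on a ring, robot 0 accessible, no human input.
adjacency = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
delta = np.array([1.0, 0.0, 0.0, 0.0])
rng = np.random.default_rng(0)
p = rng.uniform(-1.0, 1.0, size=(4, 3))
xi = np.zeros((4, 3))
p0_mean = p.mean(axis=0)

for _ in range(2000):  # 20 s with dt = 0.01 s
    p, xi = pi_consensus_step(p, xi, adjacency, delta, np.zeros(3), dt=0.01)

spread = np.abs(p - p.mean(axis=0)).max()  # largest deviation from the average
```

With v_\mathrm{h} = 0 the positions in this sketch synchronize at the initial average, which is the motion-synchronization part that the architecture leaves to the robots; a nonzero v_\mathrm{h} then moves the synchronized group.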

Let us now define the average position and velocity of the accessible robots \mathcal {V}_\mathrm{h} as \begin{align*} z_{p} = \frac{1}{|\mathcal {V}_\mathrm{h}|}\sum _{i\in \mathcal {V}_\mathrm{h}} p_{i},\ z_{v} = \frac{1}{|\mathcal {V}_\mathrm{h}|}\sum _{i\in \mathcal {V}_\mathrm{h}} \dot{p}_{i}, \tag{6} \end{align*}

respectively. The following lemma was then proved in [23].

Lemma 1:

[23] Consider a group of robots with the dynamics (2) and the distributed controller (5), where the undirected graph G is assumed to be connected and |\mathcal {V}_\mathrm{h}|\geq 1. Then, the collective dynamics for all robots is passive from v_\mathrm{h} to z_{p}. Also, if the signal v_\mathrm{h} is differentiable in time, then the dynamics is passive from \dot{v}_\mathrm{h} to z_{v}.

The authors of [23] designed a control architecture assuming that the human behaves as a passive system. Given this assumption, Lemma 1, and the fact that feedback interconnection of two passive systems ensures closed-loop stability [6], they interconnected the human and robots with v_\mathrm{h} = u_\mathrm{h} and had the average position y_\mathrm{h} = z_{p} or velocity y_\mathrm{h} = z_{v} fed back to the operator, where these signals are switched at the feedback interface depending on the selected control goal, (3) or (4). The overall system is illustrated in Fig. 3 for the situation in which the operator determines the velocity command based on the error e = r_{p} - z_{p} or e = r_{v} - z_{v}.

Figure 3. Passivity-based control architecture, where the switch is activated according to which control goal is selected by the operator, (3) or (4). The block H is a map from the tracking error e to the human velocity command u_\mathrm{h}. The block F is a filter that ensures differentiability of the signal v_\mathrm{h}, which is needed only for the velocity navigation. The “aver.” blocks output the average of the elements of the input into them.

They showed that either goal, (3) or (4), with constant references r_{p} and r_{v} is achieved under human passivity together with additional assumptions, even without sharing the selected control goal among all the robots. Despite the difference in the configuration space of the considered system, the same proof extends trivially to the three-dimensional case considered in this article and is therefore omitted. We instead focus on human behavioral analysis, including human passivity. The authors of [23] addressed this issue by studying human modeling and human passivity analysis based on operation data acquired on a 1-D human-in-the-loop simulator. However, it is unclear whether a human undertaking the 3-D robot operation would behave in the same way as in the 1-D or the 2-D case. Since the tablet interface considered in [23] is unsuitable as a command interface for 3-D operations due to its limited dimensionality, we must carefully select an alternative interface. Furthermore, conveying the robots' 3-D information to humans is a challenge that demands thorough consideration.

B. Passivity-Shortage-Based Control Architecture

The human passivity analysis in [23] revealed that human passivity depends on the proficiency level of the system operation and the network structure. It was also shown that the operator may fail to attain passivity for a sparse network even after training. Motivated by this work, the authors of [24], [25] presented another control architecture that accepts the human passivity shortage based on the architecture for output synchronization of passivity-short systems [26] and the bilateral teleoperation [4], [5], [6].

In our architecture, we employ a master robot to interact with the other robots, similarly to bilateral teleoperation [4], [5], [6]. Note, however, that our architecture differs from teleoperation in that the motion of the master is virtually simulated in the interface instead of there being a physical interaction between the master and the operator. Accordingly, the operator interacts with the robots by assessing visual feedback and determining a velocity command through an interface.

Denoting the position of the master by q_{m} \in \mathbb {R}^{3} in the world frame, we let q_{m} obey the dynamics \begin{align*} \dot{q}_{m} = u_\mathrm{h}. \tag{7} \end{align*}

We then interconnect the human operator with the master robot (7) and the robotic group using the architecture in [26]. Specifically, the signal y_\mathrm{h} is designed as \begin{align*} y_\mathrm{h} = k_{m} z_{p} + (1-k_{m}) q_{m}, \tag{8} \end{align*}
or \begin{align*} y_\mathrm{h} = k_{m} z_{v} + (1-k_{m}) \dot{q}_{m}, \tag{9} \end{align*}
with k_{m} \in (0, 1) by blending the master position and the average position. We also design v_\mathrm{h} as \begin{align*} v_\mathrm{h} = k_{s}(q_{m} - z_{p}), \tag{10} \end{align*}
where k_{s} is a positive gain. The overall control architecture is illustrated in Fig. 4.
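The signal flow defined by (7), (8), and (10) for position navigation can be sketched in a few lines. This is only an illustrative sketch: the function names, the forward-Euler step for the master, and the value k_{m} = 0.5 are assumptions for the example, while k_{s} = 3 matches the simulator setting reported in Section IV.

```python
import numpy as np

def feedback_signal(z_p, q_m, k_m):
    """Blend of the average robot position and the master position, as in (8)."""
    if not 0.0 < k_m < 1.0:
        raise ValueError("k_m must lie in the open interval (0, 1)")
    return k_m * np.asarray(z_p, dtype=float) + (1.0 - k_m) * np.asarray(q_m, dtype=float)

def robot_group_input(z_p, q_m, k_s):
    """Signal v_h injected into the accessible robots, as in (10)."""
    return k_s * (np.asarray(q_m, dtype=float) - np.asarray(z_p, dtype=float))

def master_step(q_m, u_h, dt):
    """Forward-Euler step of the virtual master dynamics (7)."""
    return np.asarray(q_m, dtype=float) + dt * np.asarray(u_h, dtype=float)

z_p = np.zeros(3)                                          # average robot position
q_m = master_step(np.zeros(3), u_h=[0.1, 0.0, 0.0], dt=1.0)
y_h = feedback_signal(z_p, q_m, k_m=0.5)                   # k_m = 0.5 is hypothetical
v_h = robot_group_input(z_p, q_m, k_s=3.0)                 # k_s = 3 as in Section IV
```

The operator thus moves the virtual master directly, and the robots are pulled toward it through v_\mathrm{h}, while the displayed point y_\mathrm{h} lies between the master and the robot average.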

Figure 4. Passivity-shortage-based control architecture.

Suppose now that the cascade system of the master robot and human is passivity short. Precisely speaking, we assume that the user-defined reference r_{p} is constant and that the cascade system is passivity short from {r_{p}} - y_\mathrm{h}(t) to q_{m}(t)-{r_{p}} with an impact coefficient \bar{\nu } greater than −1. The authors of [24] proved that (3) is achieved under k_{m} \in (0,1) and additional assumptions on the boundedness of the signals for position navigation. Meanwhile, in the case of velocity navigation, (4) was shown to be achieved if the cascade system is passivity short from r_{v} - y_\mathrm{h}(t) to \dot{q}_{m}(t)-r_{v} with \bar{\nu } > -1, k_{m} \in (0,1), and with additional assumptions on the boundedness of the signals. They also showed through user studies for multiple subjects that operators met the assumption of \bar{\nu } > -1. Just as in Section III-A, we forgo repeating the same proof in this article and instead focus on whether \bar{\nu } > -1 is satisfied for the 3-D operation.

SECTION IV.

Experiment Design

In the subsequent sections, we will address human modeling and analyze human passivity based on the models for the control architectures in Section III. In this section, we present a human-in-the-loop simulator and identification experiments conducted to collect the operation data for the human modeling. Note that we focus on position navigation for the remainder of this article, and will address velocity navigation in future work.

A. Human-in-the-Loop Simulator

In this subsection, we present a 3-D human-in-the-loop simulator whose overview is shown in Fig. 5.

Figure 5. Schematic of the 3-D human-in-the-loop simulator.

We simulate the motion of 10 robots with the dynamics (2) and the distributed controller (5). The virtual master with (7) is only used in the case of the passivity-shortage-based architecture. In view of the real implementation, the robot dynamics (2) and (5) are simulated on the robot operating system (ROS), while the master dynamics and (10) are implemented in Unity. We also take three different types of networks, shown in Fig. 6, for which we set a_{ij} = b_{ij} = 1 for all (i,j)\in \mathcal {E}. In Type 1 (left), the inter-robot network is sparse, and all robots are connected to the interface. In Type 2 (middle), the robots are interconnected by a dense network, while only robot 1 is connected to the interface. In Type 3 (right), the inter-robot network is the same as in Type 1, but only robot 1 is connected to the interface. The signal v_\mathrm{h} in (5) is set to v_\mathrm{h} = u_\mathrm{h} in the passivity-based architecture, while v_\mathrm{h} is given by (10) with k_{s} = 3 in the passivity-shortage-based architecture. The average position z_{p} is sent to Unity, which then generates 3-D graphics to enable smoother interactions between the human and the robots.

Figure 6. Communication network Type 1 (left), Type 2 (middle), and Type 3 (right), where the red nodes belong to \mathcal {V}_\mathrm{h} and the blue do not.

We prepared two command interfaces. The first uses a DualShock 4 controller (Sony Corp.), a standard gaming controller that uses a pair of joysticks for input. The left stick specifies the x- and z-coordinates of the velocity command u_\mathrm{h}, while the longitudinal operation of the right stick corresponds to the y-coordinate. The gain from the joystick angle to u_\mathrm{h} was tuned so that the maximal angle corresponds to \pm 0.15 m/s for Type 1 and \pm 1.5 m/s for Types 2 and 3. This was done to improve human operability, because the stationary gain from u_\mathrm{h} to z_{p} for Type 1 is 10 times as large as that for the other two networks [23]. This difference in the stationary gain across network structures is determined simply by |\mathcal {V}_\mathrm{h}|/n. The second interface uses a Valve Index (Valve Corp.) or Meta Quest 2 (Meta Platforms Inc.) VR controller. The operator pushes a button on the controller at the beginning of the experiments, and the controller's position at this time is set to the origin. The vector from the origin to the real-time position of the VR controller is converted to \gamma u_\mathrm{h}. In view of the fact that the hand motion for each coordinate is limited to around \pm 30 cm, the parameter \gamma was set to 2 for Type 1 and 0.2 for the other networks; consequently, \Vert u_\mathrm{h}\Vert _{\infty } was restricted to approximately 0.15 m/s for Type 1 and 1.5 m/s for Types 2 and 3, just as for the joystick controller. The command signal u_\mathrm{h} was then sent directly to a topic through Unity; ROS subscribed to this topic and substituted the signal into u_\mathrm{h} in (5) or (10).
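The scaling between hand displacement and velocity command can be checked with a short sketch. It assumes that the displacement vector d from the stored origin satisfies d = \gamma u_\mathrm{h}, so that u_\mathrm{h} = d/\gamma; the function name and the sample controller positions are hypothetical.

```python
import numpy as np

def vr_velocity_command(controller_pos, origin, gamma):
    """Map the VR controller displacement (m) to a velocity command (m/s).

    Assumes displacement = gamma * u_h, i.e. u_h = displacement / gamma.
    """
    displacement = np.asarray(controller_pos, dtype=float) - np.asarray(origin, dtype=float)
    return displacement / gamma

origin = np.zeros(3)
reach = np.array([0.3, -0.3, 0.3])   # hand at the approximate +/-30 cm limit

u_type1 = vr_velocity_command(reach, origin, gamma=2.0)    # Type 1 network
u_type23 = vr_velocity_command(reach, origin, gamma=0.2)   # Types 2 and 3
```

At the limit of the hand motion, this gives commands of magnitude 0.15 m/s per coordinate for Type 1 and 1.5 m/s for Types 2 and 3, matching the joystick ranges stated above.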

We used two feedback interfaces: a standard 27-inch 2-D display monitor and a Valve Index (Valve Corp.) or Meta Quest 2 (Meta Platforms Inc.) VR HMD. The HMD is connected to Unity, receiving and displaying the 3-D graphics generated by the program. The viewing angle varies with the movement of the person wearing the display. The feedback information, y_\mathrm{h} = z_{p} or (8), is switched depending on the selected control architecture. The point y_\mathrm{h} is represented in the 3-D graphics as a yellow ball, as shown in Fig. 7. When the operator is using the VR controller, we also display the origin of the controller coordinate frame and the current position of the controller using cyan and blue balls, respectively. For the 2-D monitor, the 3-D graphics are seen from a fixed viewpoint, where the optical axis is parallel to the z-axis of the world frame and the viewing angle is set to 60 degrees throughout the experiments. In this setting, the robots may leave the field of view, but we arranged the experiments so that this would not happen, since addressing this issue is beyond the scope of this work.

Figure 7. Scene viewed during system operation.

B. Identification Experiment

We will now detail the identification experiments we designed on the previously described simulator. The trial subject was told to use either the joysticks or the VR controller to drive z_{p} (yellow ball) to a reference r_{p} (red ball). They were told to stop when the yellow ball with a diameter of 5 cm lay inside the red ball with a diameter of 5.5 cm. The reference jumped randomly every 15 s to a point within a 2 m cube that contained the operator and whose center was located 1 m above the floor. One trial consisted of 10 jumps of the reference, taking 150 s in total. A video of the trials can be found in the Multimedia Materials for this article.

The subject conducted trials for all the networks in Fig. 6 under the three different interface settings summarized in Table 1. The sampling period for the data was 0.0083 s on average. It should also be noted that we applied an additional processing step to the acquired data in the same manner as in [23]. Specifically, the subject was told to press a button to indicate that he or she had recognized the new reference and was ready to start the operation. The data were then shifted so that the initial time of the operation coincided with the time at which the button was pushed, thus excluding the delay associated with recognizing the new reference. The recognition delay is a phenomenon unique to this experiment since the reference is determined by the human in practice.

TABLE 1 Interface selections in the identification experiments.

The time responses of u_\mathrm{h} over 15 s for the joystick and VR controllers are illustrated in the left and right figures, respectively, of Fig. 8. The operator tended to take extreme actions for the joystick controller even in the settling phase, with a small error, because the built-in spring makes fine manipulation difficult. Meanwhile, we can see that the commands given with the VR controller are much smoother.

Figure 8. Time responses of u_\mathrm{h} for the joystick (left) and VR controller (right), where the black curve shows \Vert e\Vert with the error e = r_{p} - z_{p}.

SECTION V.

Impact of VR Technology on Human Properties

In this section, we focus on the passivity-based architecture in Section III-A. Our goal here is to examine how the interfaces in Table 1 affect human properties including model accuracy, tracking performance, and passivity.

A. Human Modeling

Let us first assess how each interface affects human modeling. To this end, we develop a human operator model using the MATLAB System Identification Toolbox (MathWorks Inc.) and the so-called direct approach to closed-loop system identification [41], using the data from one trial. As a result, we obtain a model u_\mathrm{h}(s) = H(s)e(s) with a 3-by-3 transfer function matrix H(s). In the system identification, we used a continuous-time model with 2 poles and 1 zero for the diagonal elements and constants for the non-diagonal elements. A discussion of the selection of the model order can be found in the Appendix. In the sequel, we use H(s) identified from various operation data to analyze human properties.
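One way to write down the model structure described above is given below. The coefficient symbols \alpha_{i0}, \alpha_{i1}, \beta_{i0}, \beta_{i1}, and c_{ij} are introduced here only for illustration and are not the article's notation; the block merely restates "2 poles and 1 zero on the diagonal, constants off the diagonal" in formula form.

```latex
\begin{align*}
H(s) &=
\begin{bmatrix}
H_{11}(s) & c_{12} & c_{13} \\
c_{21} & H_{22}(s) & c_{23} \\
c_{31} & c_{32} & H_{33}(s)
\end{bmatrix},
&
H_{ii}(s) &= \frac{\beta_{i1}s + \beta_{i0}}{s^{2} + \alpha_{i1}s + \alpha_{i0}},
\quad i = 1, 2, 3.
\end{align*}
```

Under this parameterization, each diagonal channel has four free coefficients and each off-diagonal channel has one, so the identification estimates 18 parameters per trial.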

The time responses of the model outputs and the identification data for the Type 1 network are shown in Fig. 9. We can see that the fitting performance for the traditional interface (interface #1 in Table 1) is poor, primarily because of the extreme actions pointed out in Fig. 8. Meanwhile, interfaces #2 and #3 with the VR controller achieve a better fitting performance than interface #1. The accuracy for interface #3 looks slightly better than that for interface #2, but it is difficult to discuss the superiority between these two only from this data.

Figure 9. Time series data of the model outputs (red) and the identification data (blue) on u_\mathrm{h} for the Type 1 network (left: interface #1, middle: interface #2, right: interface #3).

Next, we conducted cross-validation by using the data from another trial as verification data. Fig. 10 shows the time responses of the model outputs and the verification data for the Type 1 network. The fitting performance for interface #1 is again worse than for the other two. It is interesting to note that the fitting performance for the 3rd element of u_\mathrm{h} is worse than for the other two elements in the case of interface #2, while the model successfully fits the data for all elements in the case of interface #3. In view of the fact that the coordinate of the 3rd element corresponds to the depth from the viewpoint in the 3-D graphics, the bottom-middle figure indicates that the operator may fail to behave consistently in response to depth errors. It is conceivable from the bottom-middle and bottom-right figures that the dominant reason why the model accuracy for this coordinate is worse than for the other two coordinates is the limited human capacity for depth recognition in 2-D images. The fit ratios for all three networks and all interface selections are given in Table 2. We can see from them that the VR interfaces improved the model accuracy by 25 to 30% compared with the other interfaces, regardless of the network type. We also see from this table that both the smoother operation of the VR controller and the enhanced 3-D recognition provided by the HMD contribute to enhancing the model accuracy.
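The article does not spell out how the fit ratio is computed. A common choice, and the one assumed in the sketch below, is the normalized-root-mean-square-error fit percentage of the kind reported by system identification software, 100(1 - \Vert y - \hat{y}\Vert / \Vert y - \mathrm{mean}(y)\Vert), averaged over the three elements of u_\mathrm{h}. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_ratio(y_measured, y_model):
    """NRMSE-based fit percentage for one signal (assumed definition)."""
    y = np.asarray(y_measured, dtype=float)
    y_hat = np.asarray(y_model, dtype=float)
    return 100.0 * (1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - y.mean()))

def average_fit_ratio(u_measured, u_model):
    """Average the fit ratio over the columns (elements of u_h) of (N, 3) data."""
    u = np.asarray(u_measured, dtype=float)
    u_hat = np.asarray(u_model, dtype=float)
    return float(np.mean([fit_ratio(u[:, k], u_hat[:, k]) for k in range(u.shape[1])]))

# Toy data: a perfect model scores 100, a constant-mean model scores 0.
y = np.array([0.0, 1.0, 2.0, 3.0])
perfect = fit_ratio(y, y)
mean_only = fit_ratio(y, np.full_like(y, y.mean()))
```

Under this definition a value can become negative when the model is worse than simply predicting the mean of the data, which is worth keeping in mind when comparing interfaces.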

TABLE 2 Average model fit ratios among three elements of $u_\mathrm{h}$ for the verification data (passivity-based architecture).
Figure 10. - Time series data of the model outputs (red) and the verification data (blue) on $u_\mathrm{h}$ of the same subject as Fig. 9 for the Type 1 network (left: interface #1, middle: interface #2, right: interface #3).

In summary, we can conclude that the VR interface simplifies human behavior in these tasks to the extent that it can be represented by a linear time-invariant system. This reduces the uncertainty stemming from the human factor in the loop and simplifies designing human–robot interaction systems. In the present paper, we focus only on the specific control architectures and robot dynamics. However, the above analysis of the human does not rely on the special structure of the present robot system. It is thus expected that these investigations are also valid for other semi-autonomous robot navigation systems as long as the operator gives the velocity commands and the robot dynamics are well compensated so that they follow the commands and are approximated to be linear time-invariant. We would like to leave more investigations on the issue to future work.

B. Tracking Performance

Let us next demonstrate the tracking performances for the above three interfaces.

We take the tracking error $e$ as a metric of the performance, but the absolute value of the error depends on the size of the jumps in the reference $r_{p}$. In the above trial, the reference switches every 15 s and we denote each switching time by $t_{k},\ k = 0, 1, 2,\ldots, 10$ with $t_{0} = 0$ s. Accordingly, we define the normalized error $e_\mathrm{n}$ by
\begin{align*} e_\mathrm{n}(t) = \frac{e(t)}{\Vert e(t_{k})\Vert } \text{ if } t\in [t_{k}, t_{k+1})\text{[s]},\ \ k = 0, 1, 2,\ldots, 10. \end{align*}
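The normalization above can be sketched for sampled data as follows. This is a minimal illustration that assumes the error is logged as a list of vectors with a sample at every switching time and that $\Vert e(t_{k})\Vert$ is nonzero. The function name and the toy samples are hypothetical.

```python
import math

def normalized_error(times, errors, switch_period=15.0):
    """Divide each error sample by the error norm at the latest switching time.

    times  : increasing sample times in seconds, starting at t_0 = 0
    errors : error vectors e(t), one per sample
    Assumes a sample exists at every switching time t_k = k * switch_period.
    """
    normalized = []
    current_k = None
    scale = 1.0
    for t, e in zip(times, errors):
        k = int(t // switch_period)
        if k != current_k:
            # First sample of the interval [t_k, t_{k+1}): store ||e(t_k)||.
            current_k = k
            scale = math.sqrt(sum(x * x for x in e))
        normalized.append([x / scale for x in e])
    return normalized

# Toy samples (hypothetical values, not measured data).
times = [0.0, 5.0, 10.0, 15.0, 20.0]
errors = [[3.0, 4.0, 0.0], [1.5, 2.0, 0.0], [0.0, 0.0, 0.0],
          [0.0, 0.0, 10.0], [0.0, 0.0, 5.0]]
e_n = normalized_error(times, errors)
print(e_n)
```

By construction, the normalized error has unit norm at every switching time, so responses to reference jumps of different sizes can be compared on a common scale.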


The time series data of the normalized errors $e_\mathrm{n}$ for the three interfaces and three networks during a trial are shown in Fig. 11. Surprisingly, interface #2 performs worse than the other two, which indicates that the benefits of the VR interface are maximised not by the VR controller alone, but by the combination of the controller and HMD. Interface #3 achieves the smallest steady-state error among the three for all network types. These results indicate that the depth information provided by the HMD is the key to improving the tracking performance. The accumulated values of $\Vert e(t)\Vert$ over $t \in [t_{k}-5, t_{k}]\ (k=1,2,\ldots, 10)$ for interface #1 and interface #3 over the two trials are summarized in Table 3. They quantitatively confirm that the combination of the VR controller and HMD reduces the steady-state error for all networks. We also see from Fig. 11 that interface #3 achieves faster responses than interface #1.
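The steady-state metric described above can be sketched as follows. The article does not state how the values are accumulated, so this sketch assumes a rectangle-rule sum of $\Vert e(t)\Vert$ multiplied by the sample period. It also excludes the sample at $t_{k}$ itself, on the assumption that in sampled data it already belongs to the next reference step. The function name and the toy data are hypothetical.

```python
import math

def accumulated_steady_state_error(times, errors, dt, switch_period=15.0,
                                   window=5.0, n_switch=10):
    """Accumulate ||e(t)|| over the last `window` seconds before each t_k."""
    total = 0.0
    for t, e in zip(times, errors):
        for k in range(1, n_switch + 1):
            t_k = k * switch_period
            # Half-open window [t_k - window, t_k): see the assumption above.
            if t_k - window <= t < t_k:
                total += math.sqrt(sum(x * x for x in e)) * dt
                break
    return total

# Toy data (hypothetical): 150 s sampled at 1 s with a constant unit error.
dt = 1.0
times = [i * dt for i in range(150)]
errors = [[1.0, 0.0, 0.0] for _ in times]
total = accumulated_steady_state_error(times, errors, dt)
print(total)
```

With a constant unit error, each of the ten windows contributes 5 s, which gives a quick sanity check of the windowing logic before applying it to recorded data.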

TABLE 3 Steady-state errors for interface #1 and interface #3.
Figure 11. - Time series data of the normalized error $e_\mathrm{n}$ over a trial for interface #1 (red), #2 (green), and #3 (blue) and three networks, where the top, middle and bottom figures correspond to Type 1, 2, and 3, respectively.

In summary, we conclude that the VR interface enhances the tracking performance in terms of both steady-state and fast-response properties by providing humans with richer depth information.

C. Human Passivity

Let us next investigate the impact of the VR interface on human passivity. We take the model for interface #2 as a baseline for the comparison with that for interface #3, since the model for interface #1 is not always accurate enough to discuss passivity. This allows us to analyze how the higher 3-D recognition ability enabled by the HMD improves human passivity.

Fig. 12 shows the passivity index $\nu (\omega)$ for the three networks, where the blue and red curves represent $\nu (\omega)$ for the models with interface #3 and #2, respectively. We can immediately see that the index for the model with the HMD is larger than that for the model with the 2-D display over all frequencies for all networks. In view of the property of the passivity index $\nu$ stated at the end of Section II, we conclude that the HMD improves human passivity, which also means that the VR interface enhances closed-loop stability of this human-in-the-loop system.
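The definition of $\nu(\omega)$ is given in Section II and is not repeated here. As a rough illustration only, the sketch below assumes the common frequency-wise input passivity index $\nu(\omega) = \frac{1}{2}\lambda_{\min}\big(G(j\omega) + G^{*}(j\omega)\big)$, which for a scalar transfer function reduces to the real part of $G(j\omega)$. The scalar restriction, the function names, and all parameter values are assumptions made for illustration and do not reproduce the identified 3-by-3 models.

```python
def freq_response(gain, zero, pole1, pole2, omega):
    """Frequency response of gain*(s+zero)/((s+pole1)(s+pole2)) at s = j*omega."""
    s = 1j * omega
    return gain * (s + zero) / ((s + pole1) * (s + pole2))

def passivity_index_siso(response):
    """Scalar case of the assumed index: nu(omega) = Re G(j*omega)."""
    return response.real

# Logarithmic frequency grid from 0.01 to 100 rad/s.
omegas = [10 ** (k / 10.0) for k in range(-20, 21)]

# Hypothetical parameters (not identified from data).
nu_curve = [passivity_index_siso(freq_response(0.01, 100.0, 1.0, 1.0, w))
            for w in omegas]

# The system is passive in this sense only if the index is nonnegative
# at every frequency, so the worst case over the grid is the key number.
print(min(nu_curve))
```

In this toy case the two poles dominate at mid frequencies, the phase lag exceeds 90 degrees, and the index becomes negative there, which is the kind of passivity shortage discussed in this section.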

Figure 12. - Passivity index $\nu$ for the passivity-based control architecture (top-left: Type 1, top-right: Type 2, bottom: Type 3).

Meanwhile, Fig. 12 indicates that the human operator fails to attain passivity for all networks even if he/she uses the VR interface. We remark that human passivity shortage does not immediately generate closed-loop instability for a specific network. Human passivity is a condition that ensures closed-loop stability for any network. In practice, the operator was able to achieve stable operations throughout all trials. Nonetheless, it would be more reliable to design a system architecture in which the operator is able to ensure the stability for any network. This motivates us to use the passivity-shortage-based control architecture in Section III-B, which we will study in the next section.

D. Validation for Multiple Subjects

In this subsection, we examine whether the findings in the above subsections also hold for other operators. To this end, we conduct user studies with nine other trial subjects.

We start by remarking that the following discussion assumes that the subjects are pre-trained. In fact, without any training, they could not even complete the navigation task itself, in which case no meaningful discussion on model accuracy or other human properties is possible. In the experiment, we thus had the subjects freely repeat the above 150 s trials with the Type 2 network three times for each interface. We next showed them the operation of a well-trained operator as a reference, and then asked them to do one more trial. Finally, they conducted two trials for each of the three interfaces, where the data from the first and second trials are used as identification data and verification data, respectively.

Let us first examine our hypothesis on human modeling in Section V-A, where it was exemplified that the VR interfaces enhance the human model accuracy. We take the same modeling method and model parameters as in Section V-A. The fit ratios for interface #1 (red), #2 (green), and #3 (blue) are shown in Fig. 13. The fit ratios to the identification data in the left figure decrease in the order of interface #3, #2, and #1 for all participants, which validates the hypothesis formed in Section V-A. The ratios to the verification data in the right figure tend to be lower than those in the left, and, for participants 2, 7, and 8, the order of interfaces #1 and #2 is reversed. However, it is at least confirmed that the pair of the VR controller and HMD achieves the best fit ratio for all participants, and these results reinforce the hypothesis that VR interfaces improve the model accuracy.

Figure 13. - Model fit ratios for the nine trial subjects with the identification data (top) and verification data (bottom).

We next examine our hypothesis on human passivity in Section V-C. The input passivity indices $\bar{\nu }$ for interface #1 (red), #2 (green), and #3 (blue) are shown in Fig. 14. Although some models have questionable reliability due to their low accuracy in Fig. 13, we at least observe the tendency that interface #3 enhances human passivity, which largely supports the conclusion in Section V-C. On the other hand, none of the participants fully attains passivity even with interface #3, as in Section V-C, whereas an operator attained passivity depending on the network structure in the 1-D case [23]. These results also emphasize the need to address human passivity shortage.

Figure 14. - Input passivity index $\bar{\nu }$ for the nine trial subjects.

We finally show in Fig. 15 the human workload perceived by the nine trial subjects for the three interfaces, measured through the NASA TLX questionnaire [42]. We see from this figure that the VR interfaces reduce the workload on average. In particular, the VR controller requires less workload than the joystick controller for all participants. In contrast, for subjects 2, 4, and 6, the use of the HMD imposes an additional burden. The reasons for this could be the discomfort that VR images cause or the weight of the device itself. In any case, these results alone do not allow us to conclude that the HMD reduces the burden on the operator.

Figure 15. - Human workload quantified by NASA TLX [42], where the number in the title of each graph is the participant number. The bars in each graph correspond to interfaces #1, #2 and #3 from left to right.

SECTION VI.

Human Modeling and Passivity Analysis for Passivity-Shortage-Based Control Architecture

The experiments covered in this section are identical to those in Section IV-B, except that the control architecture is changed to that in Section III-B and only interface #3 is used. To analyze the impact of the parameter $k_{m}$ on the human characteristics, we focus on two settings $k_{m} = 0.2$ and $k_{m} = 0.8$. Note that the human operating target is closer to the master robot for $k_{m} = 0.2$, while it is closer to the actual robotic group for $k_{m} = 0.8$. The Bode diagrams of the operating targets, namely the systems from $u_\mathrm{h}$ to $y_\mathrm{h}$, with the three networks for both settings of $k_{m}$ are illustrated in Fig. 16. We see that the operating target for $k_{m} = 0.8$ has more complicated dynamic properties than that for $k_{m} = 0.2$, which is close to the master robot with single-integrator dynamics.

Figure 16. - Bode diagrams of the systems from $(u_\mathrm{h})_{i}$ to $(y_\mathrm{h})_{i} (i = 1, 2, 3)$ for all three networks, where $k_{m} = 0.2$ (red) and $k_{m} = 0.8$ (blue).

A. Human Modeling

We constructed the human models based on the operation data from the passivity-shortage-based control architecture, taken in the same way as in the previous section. We also used the same number of poles and zeros: the diagonal elements have 2 poles and 1 zero, and non-diagonal elements are constant.
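The model structure described above can be sketched as a 3-by-3 frequency response whose diagonal elements have 2 poles and 1 zero and whose non-diagonal elements are constants. All gains, zeros, and poles below are hypothetical placeholders rather than identified values, and the function names are illustrative.

```python
def diagonal_element(s, gain, zero, poles):
    """Transfer function with 2 poles and 1 zero, evaluated at complex s."""
    p1, p2 = poles
    return gain * (s + zero) / ((s + p1) * (s + p2))

def operator_model_response(omega, diag_params, offdiag_gains):
    """Frequency response H(j*omega) of the structured operator model.

    diag_params   : list of (gain, zero, (pole1, pole2)) per diagonal element
    offdiag_gains : square list of lists; entry [i][j] with i != j is the
                    constant gain of the non-diagonal element (i, j)
    """
    s = 1j * omega
    n = len(diag_params)
    H = [[complex(offdiag_gains[i][j]) for j in range(n)] for i in range(n)]
    for i, (gain, zero, poles) in enumerate(diag_params):
        H[i][i] = diagonal_element(s, gain, zero, poles)
    return H

# Hypothetical parameters (not identified from data).
diag_params = [(2.0, 1.0, (1.0, 2.0)),
               (1.5, 0.5, (0.8, 3.0)),
               (1.0, 2.0, (1.0, 4.0))]
offdiag_gains = [[0.0, 0.1, -0.05],
                 [0.02, 0.0, 0.03],
                 [-0.01, 0.04, 0.0]]

H_dc = operator_model_response(0.0, diag_params, offdiag_gains)
H_1 = operator_model_response(1.0, diag_params, offdiag_gains)
print(H_dc[0][0], H_1[0][1])
```

Because the non-diagonal elements are constants, they cannot produce resonance peaks, while the diagonal elements retain frequency-dependent dynamics.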

The time series data of the model outputs and the verification data for $k_{m} = 0.2$ and $k_{m} = 0.8$ are shown in Figs. 17 and 18, respectively. We see from these figures that the models fit the verification data well for both values of $k_{m}$ and for all network types. The average fit ratios for both the identification and verification data are summarized in Table 4. In all cases, the identification data were successfully fit by the model with the specified orders. The ratios degrade for the verification data, but they all remain high except for network Type 1 with $k_{m} = 0.8$, which corresponds to the left column in Fig. 18. There, the oscillatory behavior in the second element of $u_\mathrm{h}$ is not correctly fit, but the model still captures the overall trend of the verification data. In view of the responses in the figures, we conclude that the model is able to correctly identify the human characteristics. On the other hand, we had hypothesized that the model accuracy would deteriorate as the robot dynamics became more complex, but such a trend is not apparent in Table 4.

TABLE 4 Average model fit ratios among three elements of $u_\mathrm{h}$ for the passivity-shortage-based architecture.
Figure 17. - Time series data of the model outputs (red) and the verification data (blue) on $u_\mathrm{h}$ for $k_{m} = 0.2$ (left: Type 1, middle: Type 2, right: Type 3).

Figure 18. - Time series data of the model outputs (red) and the verification data (blue) on $u_\mathrm{h}$ for $k_{m} = 0.8$ (left: Type 1, middle: Type 2, right: Type 3).

We finally present some interesting analysis results that are not necessarily related to the main argument. Bode diagrams of the identified models are presented in Fig. 19. We see from these figures that the human characteristics differ depending on the value of $k_{m}$ and the network type. In the frequency range below 1 rad/s, the diagonal elements of the models for $k_{m} = 0.2$ have almost flat gains, while those for $k_{m} = 0.8$ have gain peaks except for Type 1. The frequencies of the peaks almost correspond to the lags of the gains in Fig. 16 for each network. This is compatible with the claim of the crossover model presented in [43], and it is also highlighted in [23]. Specifically, the operator tries to shape the open-loop transfer function so that it gets closer to the single integrator while learning the inverse model of the system to be operated. This also explains the absence of peaks for $k_{m} = 0.2$ and for network Type 1.

Figure 19. - Bode diagrams of the human operator models for the three networks and two $k_{m}$.

B. Human Passivity Analysis

In this subsection, we examine human passivity for the passivity-shortage-based control architecture. We begin by noting that [24] showed that the cascade system needs to be passivity short from ${r_{p}} - y_\mathrm{h}(t)$ to $q_{m}(t)-{r_{p}}$ with an impact coefficient $\bar{\nu }$ greater than $-1$, thus enabling (3) for any network. This means that the passivity index $\nu$ for the system $\frac{H(s)}{s}$ with the operator model $H(s)$ should be greater than $-1$ over the entire frequency domain.

We also note that the index $\nu$ for $\frac{H(s)}{s}$ takes extremely small values over the domain lower than $10^{-2}$ rad/s. This is because the model does not correctly identify the human property over such a low frequency domain, since we use only data over 150 s. Meanwhile, passivity over the low frequency domain does not matter in practice, because the gain crossover frequency lies between $10^{-1}$ rad/s and 1 rad/s for all settings. This is why we check the index for $\frac{100H(s)}{100s+1}$, which has almost the same frequency responses as $\frac{H(s)}{s}$, at least over the domain greater than $10^{-1}$ rad/s.
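The claim that $\frac{100H(s)}{100s+1}$ is close to $\frac{H(s)}{s}$ above $10^{-1}$ rad/s can be checked numerically. Since $H(s)$ is common to both, only the factors $\frac{100}{100s+1}$ and $\frac{1}{s}$ need to be compared. The sketch below is a minimal illustration with hypothetical function names.

```python
import math

def integrator(omega):
    """1/s evaluated at s = j*omega (omega must be nonzero)."""
    return 1.0 / (1j * omega)

def filtered_integrator(omega):
    """100/(100 s + 1) evaluated at s = j*omega."""
    s = 1j * omega
    return 100.0 / (100.0 * s + 1.0)

def mismatch(omega):
    """Gain (dB) and phase (deg) of the filtered integrator relative to 1/s."""
    ratio = filtered_integrator(omega) / integrator(omega)
    gain_db = 20.0 * math.log10(abs(ratio))
    phase_deg = math.degrees(math.atan2(ratio.imag, ratio.real))
    return gain_db, phase_deg

for omega in (1e-3, 1e-2, 1e-1, 1.0):
    gain_db, phase_deg = mismatch(omega)
    print(f"omega = {omega:g} rad/s: {gain_db:.3f} dB, {phase_deg:.2f} deg")
```

At $10^{-1}$ rad/s the two differ by about 0.04 dB in gain and about 5.7 degrees in phase, and the mismatch shrinks further at higher frequencies. Below $10^{-2}$ rad/s the replacement has a bounded gain, which avoids the extremely small index values caused by the pure integrator.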

The passivity indices for the three networks and the two values of $k_{m}$ are given in Fig. 20. We see from these figures that $\bar{\nu } > -1$ is satisfied for all networks and both values of $k_{m}$. The human operator is thus expected to behave as a system meeting the requirement in [24].

Figure 20. - Passivity index $\nu$ for the passivity-shortage-based control architecture (left: Type 1, middle: Type 2, right: Type 3).

Remark 1:

The appropriate selection of $k_{m}$ was discussed in [24], [25]. In these papers, NASA TLX studies showed that a large $k_{m}$ tends to increase the workload of the operator due to the complexity of the dynamics of the system controlled by the operator. On the other hand, a small $k_{m}$ makes the real robots invisible to the human, which degrades transparency for the operator. It is thus expected that the trade-off between transparency and workload can be managed online by appropriately tuning $k_{m}$ based on the human state, such as fatigue and skill level. We leave this issue to future work.

SECTION VII.

Conclusion

In this article, we studied human-enabled multi-robot navigation in three dimensions designed based on passivity. The high dimensionality posed new challenges not present in operations up to two dimensions. Namely, we had to consider how to enable humans to understand the robots' 3-D information and to stably manipulate the robots' 3-D motion. To address these issues, we prepared two command interfaces, a joystick and a VR controller, and two feedback interfaces, a 2-D display and an HMD, and we conducted user studies to acquire the operation data for three interface candidates. Through these user studies, we have obtained the following four findings regarding the VR interfaces.

  • VR interfaces improve the accuracy of the human dynamic model, and thus can ease the design of human-in-the-loop systems.

  • VR interfaces enhance human passivity, which contributes to enabling stable interactions between the human and robots.

  • VR interfaces improve the tracking performance due to the depth information given by the HMD.

  • Operators are likely to fail in achieving passivity for the 3-D operation, even with the VR interface.

We also conducted user studies on the passivity-shortage-based control architecture specifically to address this last issue. We showed that operators tend to meet the degree of passivity shortage needed to prove closed-loop stability.

There are open questions to be addressed in the future. First, we need to address whether the above insights apply to velocity navigation. An appropriate design of the virtual robot dynamics may contribute to all aspects including human model accuracy, passivity, performance and workload. How the skill level of the operator affects the model accuracy and passivity is also open. How feeding back real scenes to the operator through AR affects the human modeling, passivity, workload, and task performance is also left as future work. Linking the human properties with personal data like gender, age, and nationality should also be addressed in the future.

Finally, all user studies in this article were conducted with the permission of the Administrative office of Human Subjects Research Ethics Review Committee at Tokyo Institute of Technology (Permit No. 2023109).

Appendix

This Appendix covers how the orders of the operator model were determined for this article. In the conference version [40], various model orders were examined under the restriction that all elements had to have the same number of poles and zeros, and we concluded that having 2 poles and 1 zero is the best for all three interface selections and all network types, based on the fit ratio for the verification data. The fit ratios for the model in [40] are summarized in Table 5. As pointed out in [40], the model tends to have large resonance peaks in the non-diagonal elements, as can be seen in the blue curves in Fig. 21. The same applies to the human model for the passivity-shortage-based architecture in Fig. 22. Given that this is a human model, it is reasonable to assume that these resonance peaks are the result of overfitting rather than correct identification. To avoid overfitting, we applied regularization techniques provided by the System Identification Toolbox, but the resulting model accuracy was far worse than that in Table 5. We thus set the order of the non-diagonal elements to 0, which produces the models illustrated by the red curves in Figs. 21 and 22. Comparing Tables 2 and 5, we can see that this model not only eliminates the peaks but also achieves slightly better model accuracy. This is why we adopted the present model in the discussion and the subsequent human passivity analysis.

TABLE 5 Average model fit ratios among three elements of $u_\mathrm{h}$ for the verification data of the 2nd-order non-diagonal blocks.
Figure 21. - Bode diagrams of a human operator model for the passivity-based architecture with interface #3 and the Type 1 network. The blue line represents the model with 2 poles and 1 zero for all elements, and the red represents that with 2 poles and 1 zero only for the diagonal elements and constants for non-diagonal elements.

Figure 22. - Bode diagrams of a human operator model for the passivity-shortage-based architecture with interface #3, the Type 1 network, and $k_{m} = 0.2$. The blue line represents the model with 2 poles and 1 zero for all elements, and the red represents that with 2 poles and 1 zero only for the diagonal elements and constants for non-diagonal elements.
