Introduction
Most real-world robotic tasks require some degree of human intervention because humans are highly capable in reasoning and decision-making, especially in unstructured or uncertain environments. A robotic system in which humans are involved in part of the overall process is called a semi-autonomous system. Human modeling is therefore indispensable when systematically designing a semi-autonomous robotic system. The most appropriate human model strongly depends on the role that the human plays in the system. Musić and Hirche [1] classified human roles in human–robot interactions as either active or supervisory, depending on the required level of autonomy [2]. In the active role, human intervention may involve motion control of the robot, and accordingly, the human is involved in the control loop. Meanwhile, in the supervisory role, the human focuses only on high-level decisions based on abstracted task information, which is sometimes described as “human-on-the-loop” [3]. In this article, we focus on modeling of a human with an active role in the human-in-the-loop architecture.
A promising approach to human-in-the-loop robot control with the human having an active role has been studied in the paradigm of bilateral teleoperation [4], [5], [6]. In this paradigm, the operator is modeled as a passive system. Beyond traditional one-human-one-robot teleoperation, teaming between a human and multiple robots has been investigated in the literature, including the scenarios of cooperative payload manipulation [7], [8], [9], [10], multi-robot navigation [11], [12], [13], and exploration tasks [14], [15]. Meanwhile, relatively few papers have questioned the validity of the passivity assumption imposed on the human operator. Dyck et al. [16] examined the passivity of the human arm and revealed that it is task dependent. Bilateral teleoperators under milder assumptions were presented in [17], [18] to address the possibility of non-passive human operators. Non-passive components in bilateral teleoperators have also been studied based on integral quadratic constraints [19], [20], [21] and the input-to-output stability small gain theorem [22].
Previously, we addressed multi-robot navigation based on the concept of passivity, and presented a fully distributed control architecture that ensures motion synchronization under the assumption of human passivity [23]. We also examined human passivity through system identification techniques using human operation data collected on a 1-D human-in-the-loop simulator. We observed that an appropriately defined notion of human passivity strongly depends on the proficiency level of the system operation and the network structure. To address a possible passivity shortage, the authors of [24], [25] also presented another distributed control architecture based on the architecture for output synchronization of passivity-short systems [26] and bilateral teleoperation [4], [5], [6]. In the present paper, we study an extension of [23], [24], [25] to the three-dimensional (3-D) case. It is easy to confirm that the stability analysis in [23], [24], [25] can be directly applied to the 3-D case. However, it remains unclear whether the analytical results for the 1-D experiments are applicable to the 3-D case. In particular, the selection of the interface is more critical than in 1-D or 2-D operations due to the limited human capability for 3-D recognition and real-time manipulability.
Virtual Reality (VR) technology is widely believed to enhance human 3-D recognition and manipulability. Indeed, there have been many publications devoted to various human-robot interactions, including operator support [27], [28], [29], simulation [30], [31], instruction [32], [33], [34], manipulation [35], [36], and teleoperation [37], [38] with VR and Augmented Reality (AR) devices as summarized in [39]. Some of these studies quantitatively revealed the benefit of VR and AR technology in terms of safety [27], [30], task completion speed [33], [34], [35], [37] and accuracy [35], [37], and human usability, workload, and experience [36], [37], [38]. The authors of [34] also conducted comprehensive subjective evaluations through user studies. Missing from these works is an analysis from a control–theoretic perspective. For example, how VR interfaces affect the human dynamic behavior remains an open question. Moreover, to the best of our knowledge, the impact of VR interfaces on human passivity has not yet been reported in the literature.
In this article, we study the scenario involving 3-D multi-robot navigation shown in Fig. 1 based on the paradigm of [23], [24], [25], and address human modeling and human passivity analysis. To this end, we begin by conducting user studies for the passivity-based architecture in [23] on a 3-D human-in-the-loop simulator with three different networks, taking three pairs of command/feedback interfaces: a traditional joystick controller and a 2-D display, a VR controller and a 2-D display, and a VR controller and a head mounted display (HMD). We then build dynamical models of a human operator and analyze human passivity, showing that VR interfaces are advantageous in human modeling, human passivity, and tracking performance. We also conduct the same experiments and modeling for nine other trial subjects and examine whether the results generalize. In addition, we administer the NASA TLX questionnaire to quantitatively evaluate the human workload for the three interface pairs. In the latter part of this article, we focus on the problem of human deviation from passivity even with VR interfaces. We conduct user studies for the passivity-shortage-based architecture in [24], [25] and reveal that all operators meet the degree of passivity shortage assumed in [24], [25].
The contributions of this article are summarized as follows.
We assess the impact of VR interfaces on dynamic human properties, including dynamical model accuracy, tracking performance, and passivity.
Multiple user studies are conducted to understand the generalization capability of the aforementioned analysis, and to analyze the human workload for each interface.
The passivity-short assumption in [24], [25] is shown to be satisfied even in the 3-D operation.
Preliminary: Passivity
We consider a dynamical system whose input and output have the same dimension, given as
\begin{align*}
\dot{x} &= f(x) + g(x)u,\ x(0) = x_{0},\tag{1a}\\
y &= h(x), \tag{1b}
\end{align*}
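where $x$, $u$, and $y$ denote the state, input, and output, respectively. The system (1) is said to be passive from $u$ to $y$ if there exists a nonnegative storage function $S$ such that
\begin{align*}
S(x(\tau)) - S(x_{0}) \leq \int ^{\tau }_{0}y^{T}(t)u(t)dt
\end{align*}
holds for all inputs $u$, all initial states $x_{0}$, and all $\tau \geq 0$. The system is said to be passivity short if the inequality holds only after a term proportional to the input energy is added, that is, if there exists $\nu > 0$ such that
\begin{align*}
S(x(\tau)) - S(x_{0}) \leq \int ^{\tau }_{0}y^{T}(t)u(t)dt + \nu \int ^{\tau }_{0}\Vert u(t)\Vert ^{2}dt
\end{align*}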
Suppose now that (1) is linear time-invariant. We then define
\begin{align*}
\nu (\omega) = \lambda _\mathrm{min}(G(j\omega)+G^{H}(j\omega)),
\end{align*}
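As a numerical illustration, the index above can be evaluated on a frequency grid. The following Python sketch assumes that $G$ is the transfer function matrix of the linear time-invariant system, written in state-space form as $G(s) = C(sI - A)^{-1}B + D$; the function name `passivity_index` and the two example systems are hypothetical and are not taken from the experiments.

```python
import numpy as np

def passivity_index(A, B, C, D, omegas):
    """Return lambda_min(G(jw) + G(jw)^H) for each frequency w in omegas."""
    A, B, C, D = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, C, D))
    n = A.shape[0]
    indices = []
    for w in omegas:
        # Frequency response G(jw) = C (jw I - A)^{-1} B + D
        G = C @ np.linalg.solve(1j * w * np.eye(n) - A, B) + D
        # G + G^H is Hermitian, so eigvalsh returns real eigenvalues in ascending order
        indices.append(np.linalg.eigvalsh(G + G.conj().T)[0])
    return np.array(indices)

omegas = np.logspace(-2, 2, 200)

# Example 1 (illustrative): G(s) = 1/(s + 1), whose index is 2/(1 + w^2) > 0.
nu_passive = passivity_index([[-1.0]], [[1.0]], [[1.0]], [[0.0]], omegas)

# Example 2 (illustrative): G(s) = 1/(s^2 + s + 1), whose real part becomes
# negative for w > 1, so the index takes negative values.
A2 = [[0.0, 1.0], [-1.0, -1.0]]
B2 = [[0.0], [1.0]]
C2 = [[1.0, 0.0]]
D2 = [[0.0]]
nu_short = passivity_index(A2, B2, C2, D2, omegas)
```

A nonnegative index at all frequencies corresponds to the passive case in this sketch, whereas negative values indicate the frequency range in which passivity is lost.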
Control Architectures for Semi-Autonomous Multi-Robot Navigation
The goal of the 3-D one-human–multiple-robot interaction in this article is for the operator to stably navigate the multiple robots to a desired position or velocity. In this scenario, direct manual control of the robots in real time is demanding for the human operator, and may even be impossible depending on the number of robots. A promising approach to this issue is to utilize the architecture for so-called complementary interactions [1], where motion synchronization is left to the multi-robot system and the role of controlling the robotic group is assigned to the operator. To this end, we presented two fully distributed control architectures based on passivity [23] and passivity shortage [25], respectively. We will now briefly review these architectures.
A. Passivity-Based Control Architecture
Let us consider a group of
\begin{align*}
\dot{p}_{i} = u_{i}, \tag{2}
\end{align*}
A human operator is interfaced with the robots to receive feedback and command in his or her interaction with the robots. The feedback interface translates the output of the robotic group into visual feedback for the operator, and the command interface translates the human action into a control command for the robotic group. Both interfaces are assumed to have access to a subset of robots
The operator chooses whether to control the positions or velocities of the robots. For the position navigation, the control goal is formulated as
\begin{align*}
\lim _{t\to \infty }\Vert p_{i} - r_{p}\Vert = 0\ {\forall i}\in \mathcal {V}, \tag{3}
\end{align*}
For the velocity navigation, the control goal is formulated as
\begin{align*}
\lim _{t\to \infty }\Vert \dot{p}_{i} - r_{v}\Vert &= 0\ {\forall i}\in \mathcal {V}, \tag{4a}\\
\lim _{t\to \infty }\Vert p_{i} - p_{j}\Vert &= 0\ {\forall i,j}\in \mathcal {V}. \tag{4b}
\end{align*}
\begin{align*}
u_{i} &= \sum _{j\in \mathcal {N}_{i}}a_{ij}(p_{j} - p_{i}) + \sum _{j\in \mathcal {N}_{i}}b_{ij} (\xi _{i} - \xi _{j}) + \delta _{i} v_\mathrm{h}\tag{5a}\\
\dot{\xi }_{i} &= \sum _{j\in \mathcal {N}_{i}}b_{ij}(p_{j} - p_{i}), \tag{5b}
\end{align*}
Let us now define the average position and velocity of the accessible robots
\begin{align*}
z_{p} = \frac{1}{|\mathcal {V}_\mathrm{h}|}\sum _{i\in \mathcal {V}_\mathrm{h}} p_{i},\ z_{v} = \frac{1}{|\mathcal {V}_\mathrm{h}|}\sum _{i\in \mathcal {V}_\mathrm{h}} \dot{p}_{i}, \tag{6}
\end{align*}
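The robot dynamics (2), the distributed controller (5), and the average position in (6) can be sketched numerically as follows. This is a minimal Python illustration under several assumptions that are not stated in the text: $\delta_{i}$ is taken to be 1 for accessible robots and 0 otherwise, the graph is a five-robot ring with unit weights, and the operator is replaced by a hypothetical proportional rule `v_h = k_h * (r_p - z_p)`. It is not a model of a real human.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of an undirected ring with unit weights (illustrative graph)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    return W

def simulate(p0, a, b, accessible, r_p, k_h=1.0, dt=0.01, T=100.0):
    """Forward-Euler simulation of (2) and (5) with a stand-in proportional operator."""
    n = p0.shape[0]
    p = p0.copy()
    xi = np.zeros_like(p)
    delta = np.zeros(n)
    delta[accessible] = 1.0              # assumption: delta_i = 1 iff robot i is accessible
    La = np.diag(a.sum(axis=1)) - a      # Laplacian built from the weights a_ij
    Lb = np.diag(b.sum(axis=1)) - b      # Laplacian built from the weights b_ij
    for _ in range(int(T / dt)):
        z_p = p[accessible].mean(axis=0)               # average position in (6)
        v_h = k_h * (r_p - z_p)                        # hypothetical operator command
        u = -La @ p + Lb @ xi + np.outer(delta, v_h)   # controller (5a)
        xi_dot = -Lb @ p                               # controller (5b)
        p = p + dt * u                                 # robot dynamics (2)
        xi = xi + dt * xi_dot
    return p

rng = np.random.default_rng(0)
n = 5
W = ring_adjacency(n)
p0 = rng.uniform(-1.0, 1.0, size=(n, 3))   # initial 3-D positions
r_p = np.array([1.0, -0.5, 2.0])           # position reference
p_final = simulate(p0, W, W, [0, 1], r_p)
```

With this stand-in operator, all robots approach the reference `r_p` although only two of them receive the command, which illustrates the division of roles between synchronization and navigation.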
Lemma 1:
[23] Consider a group of robots with the dynamics (2) and the distributed controller (5), where the undirected graph
The authors of [23] designed a control architecture assuming that the human behaves as a passive system. Given this assumption, Lemma 1, and the fact that feedback interconnection of two passive systems ensures closed-loop stability [6], they interconnected the human and robots with
Passivity-based control architecture, where the switch is activated according to which control goal is selected by the operator, (3) or (4). The block
They showed that both the goals (3) or (4) with constant references
B. Passivity-Shortage-Based Control Architecture
The human passivity analysis in [23] revealed that human passivity depends on the proficiency level of the system operation and the network structure. It was also shown that the operator may fail to attain passivity for a sparse network even after training. Motivated by this work, the authors of [24], [25] presented another control architecture that accepts the human passivity shortage based on the architecture for output synchronization of passivity-short systems [26] and the bilateral teleoperation [4], [5], [6].
In our architecture, we employ a master robot to interact with the other robots, similarly to bilateral teleoperation [4], [5], [6]. Note, however, that our architecture differs from teleoperation in that the motion of the master is virtually simulated in the interface instead of there being a physical interaction between the master and the operator. Accordingly, the operator interacts with the robots by assessing visual feedback and determining a velocity command through an interface.
Denoting the position of the master by
\begin{align*}
\dot{q}_{m} = u_\mathrm{h}. \tag{7}
\end{align*}
\begin{align*}
y_\mathrm{h} = k_{m} z_{p} + (1-k_{m}) q_{m}, \tag{8}
\end{align*}
\begin{align*}
y_\mathrm{h} = k_{m} z_{v} + (1-k_{m}) \dot{q}_{m}, \tag{9}
\end{align*}
\begin{align*}
v_\mathrm{h} = k_{s}(q_{m} - z_{p}), \tag{10}
\end{align*}
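The signal flow defined by (7)–(10) can be sketched in a few lines of Python. The reading of each equation is an assumption here, since the connecting text is not available: (7) is taken as the virtual master dynamics driven by the operator command, (8) and (9) as the feedback shown to the operator for position and velocity navigation, and (10) as the command sent to the accessible robots. The function names and numerical values are hypothetical.

```python
import numpy as np

def feedback_signal(z, q, k_m):
    """Blend of robot and master signals as in (8), or (9) with velocities."""
    return k_m * z + (1.0 - k_m) * q

def robot_command(q_m, z_p, k_s):
    """Command to the robots as in (10): v_h = k_s (q_m - z_p)."""
    return k_s * (q_m - z_p)

def master_step(q_m, u_h, dt):
    """One forward-Euler step of the master dynamics (7)."""
    return q_m + dt * u_h

# Illustrative values only
z_p = np.array([0.0, 0.0, 0.0])    # average position of accessible robots
q_m = np.array([2.0, 0.0, -1.0])   # master position
y_h = feedback_signal(z_p, q_m, k_m=0.25)
v_h = robot_command(q_m, z_p, k_s=3.0)
q_next = master_step(q_m, np.array([1.0, 0.0, 0.0]), dt=0.1)
```

In this reading, the gain `k_m` sets how much of the fed-back signal comes from the robots rather than from the master, which is why its selection matters for the operator's behavior.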
Suppose now that the cascade system of the master robot and human is passivity short. Precisely speaking, we assume that the user-defined reference
Experiment Design
In the subsequent sections, we will address human modeling and analyze human passivity based on the models for the control architectures in Section III. In this section, we present a human-in-the-loop simulator and identification experiments conducted to collect the operation data for the human modeling. Note that we focus on position navigation for the remainder of this article, and will address velocity navigation in future work.
A. Human-in-the-Loop Simulator
In this subsection, we present a 3-D human-in-the-loop simulator whose overview is shown in Fig. 5.
We simulate the motion of 10 robots with the dynamics (2) and the distributed controller (5). The virtual master with (7) is only used in the case of the passivity-shortage-based architecture. In view of the real implementation, the robot dynamics (2) and (5) are simulated on the robot operating system (ROS), while the master dynamics and (10) are implemented in Unity. We also take three different types of networks, shown in Fig. 6, for which we set
Communication network Type 1 (left), Type 2 (middle), and Type 3 (right), where the red nodes belong to
We prepared two command interfaces. The first uses a DualShock 4 controller (Sony Corp.), a standard joystick-based gaming controller that takes input from a pair of joysticks. The left stick specifies the
We used two feedback interfaces, a standard 2-D 27-inch display monitor and the Valve Index (Valve Corp.) or Meta Quest 2 (Meta Platforms Inc.) VR HMD. The HMD is connected to Unity, receiving and displaying the 3-D graphics generated by the program. The viewing angle varies depending on the behavior of the person on whom the display is mounted. The feedback information,
B. Identification Experiment
We will now detail the identification experiments we designed on the previously described simulator. The trial subject was told to use either the joysticks or the VR controller to drive
The subject conducted trials for all the networks in Fig. 6 under the three different interface settings summarized in Table 1. The sampling period for the data was 0.0083 s on average. It should also be noted that we applied an additional processing step to the acquired data in the same manner as in [23]. Specifically, the subject was told to press a button to indicate that he or she had recognized the new references and was ready to start the operation. The data were then shifted so that the initial time of the operation coincided with the time at which the button was pushed, thus excluding the delays associated with recognizing the new references. The recognition delay is a phenomenon unique to this experiment since, in practice, the reference is determined by the human.
The time responses of
Time responses of
Impact of VR Technology on Human Properties
In this section, we focus on the passivity-based architecture in Section III-A. Our goal here is to examine how the interfaces in Table 1 affect human properties including model accuracy, tracking performance, and passivity.
A. Human Modeling
Let us first assess how each interface affects human modeling. To this end, we develop a human operator model using MATLAB System Identification Toolbox (Mathworks Inc.) and the so-called direct approach to closed-loop system identification [41], using the data from one trial. As a result, we obtain a model
The time responses of the model outputs and the identification data for the Type 1 network are shown in Fig. 9. We can see that the fitting performance for the traditional interface (interface #1 in Table 1) is poor, primarily because of the extreme actions pointed out in Fig. 8. Meanwhile, interfaces #2 and #3 with the VR controller achieve a better fitting performance than interface #1. The accuracy for interface #3 looks slightly better than that for interface #2, but it is difficult to discuss the superiority between these two only from this data.
Time series data of the model outputs (red) and the identification data (blue) on
Next, we conducted the cross validation by using the data from another trial as verification data. Fig. 10 shows the time responses of the model outputs and the verification data for the Type 1 network. The fitting performance for interface #1 is again worse than the other two. It is interesting to note that the fitting performance for the 3rd element of
Time series data of the model outputs (red) and the verification data (blue) on
In summary, we can conclude that the VR interface simplifies human behavior in these tasks to the extent that it can be represented by a linear time-invariant system. This reduces the uncertainty stemming from the human factor in the loop and simplifies the design of human–robot interaction systems. In the present paper, we focus only on the specific control architectures and robot dynamics. However, the above analysis of the human does not rely on the special structure of the present robot system. These findings are thus expected to remain valid for other semi-autonomous robot navigation systems as long as the operator gives velocity commands and the robot dynamics are compensated well enough that the robots follow the commands and can be approximated as linear time-invariant. We leave further investigation of this issue to future work.
B. Tracking Performance
Let us next demonstrate the tracking performances for the above three interfaces.
We take the tracking error
\begin{align*}
e_\mathrm{n}(t) = \frac{e(t)}{\Vert e(t_{k})\Vert } \text{ if } t\in [t_{k}, t_{k+1})\text{[s]},\ \ k = 0, 1, 2,\ldots, 10.
\end{align*}
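For sampled data, the normalization above can be computed as in the following Python sketch. It assumes that the error samples are stored row-wise, that the times $t_{k}$ are given in ascending order, and that $e(t_{k})$ is approximated by the first sample at or after $t_{k}$; the function name and the toy data are hypothetical.

```python
import numpy as np

def normalized_error(t, e, t_switch):
    """Return e_n(t) = e(t) / ||e(t_k)|| for t in [t_k, t_{k+1})."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    t_switch = np.asarray(t_switch, dtype=float)
    # Index k of the interval [t_k, t_{k+1}) that contains each sample
    k = np.searchsorted(t_switch, t, side="right") - 1
    if np.any(k < 0):
        raise ValueError("samples before the first switching time are not covered")
    # First sample at or after each t_k, used as an approximation of e(t_k)
    first = np.minimum(np.searchsorted(t, t_switch, side="left"), len(t) - 1)
    scale = np.linalg.norm(e[first], axis=1)
    return e / scale[k][:, None]

# Toy data: two intervals starting at t = 0 s and t = 2 s
t = [0.0, 1.0, 2.0, 3.0]
e = [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 1.0, 0.0]]
e_n = normalized_error(t, e, [0.0, 2.0])
```

By construction, the normalized error has unit norm at the start of each interval, so responses to references of different magnitudes can be compared on the same scale.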
The time series data of the normalized errors
Time series data of the normalized error
In summary, we conclude that the VR interface enhances the tracking performance in terms of both steady-state and fast-response properties by providing humans with richer depth information.
C. Human Passivity
Let us next investigate the impact of the VR interface on human passivity. We take the model for interface #2 as a baseline for the comparison with that for interface #3, since the model for interface #1 is not always accurate enough to discuss passivity. This allows us to analyze how the higher 3-D recognition ability enabled by the HMD improves human passivity.
Fig. 12 shows the passivity index
Passivity index
Meanwhile, Fig. 12 indicates that the human operator fails to attain passivity for all networks even if he/she uses the VR interface. We remark that a human passivity shortage does not necessarily imply closed-loop instability for a specific network; human passivity is a sufficient condition that ensures closed-loop stability for any network. In practice, the operator was able to achieve stable operations throughout all trials. Nonetheless, it would be more reliable to design a system architecture in which the operator is able to ensure stability for any network. This motivates us to use the passivity-shortage-based control architecture in Section III-B, which we will study in the next section.
D. Validation for Multiple Subjects
In this subsection, we examine whether the findings in the above subsections generalize to other operators. To this end, we conduct user studies with nine other trial subjects.
We start by remarking that the following discussion assumes that the subjects are pre-trained. Without any training, they could not even complete the navigation task itself, in which case no meaningful discussion on model accuracy or other human properties is possible. In the experiment, we thus had the subjects freely repeat the above 150 s trials with the Type 2 network three times for each interface. We next showed the operation of a well-trained operator as a reference, and then asked them to do one more trial. Finally, they conducted two trials for each of the three interfaces, where the data for the first and second trials are used as identification data and verification data, respectively.
Let us first examine our hypothesis on human modeling in Section V-A, where it was exemplified that the VR interfaces enhance the human model accuracy. We take the same modeling method and model parameters as in Section V-A. The fit ratios for interfaces #1 (red), #2 (green), and #3 (blue) are shown in Fig. 13. The fit ratios to the identification data in the left figure decrease in the order of interfaces #3, #2, and #1 for all participants, which supports the hypothesis formed in Section V-A. The ratios to the verification data in the right figure tend to be lower than those in the left, and, for participants 2, 7, and 8, the order of interfaces #1 and #2 is reversed. However, the pair of the VR controller and HMD achieves the best fit ratio for all participants, and these results reinforce the hypothesis that VR interfaces improve the model accuracy.
Model fit ratios for the nine trial subjects with the identification data (top) and verification data (bottom).
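The text does not state the formula behind the fit ratio. A common choice, and the one assumed in the Python sketch below, is the normalized root-mean-square error measure expressed as a percentage, computed separately for each output channel. The function name and the sample data are hypothetical.

```python
import numpy as np

def fit_ratio(y, y_hat):
    """Fit in percent per channel: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    num = np.linalg.norm(y - y_hat, axis=0)
    den = np.linalg.norm(y - y.mean(axis=0), axis=0)
    return 100.0 * (1.0 - num / den)

# Samples are stored row-wise and channels column-wise (illustrative data)
y = np.array([[0.0], [1.0], [2.0], [3.0]])
fit_perfect = fit_ratio(y, y)                          # model reproduces the data
fit_mean = fit_ratio(y, np.full_like(y, y.mean()))     # model outputs only the mean
```

Under this definition, 100 corresponds to a model that reproduces the data exactly, 0 to a model no better than the data mean, and negative values to a model worse than the mean.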
We next examine our hypothesis on human passivity in Section V-C. The input passivity indices
Finally, Fig. 15 shows the human workload perceived by the nine trial subjects for the three interfaces, measured through the NASA TLX questionnaire [42]. We see from this figure that the VR interfaces reduce the workload on average. In particular, the VR controller requires less workload than the joystick controller for all participants. In contrast, for subjects 2, 4, and 6, the use of the HMD imposes an additional burden. Possible reasons are the discomfort that VR images cause or the weight of the device itself. In any case, these results alone do not allow us to conclude that the HMD reduces the burden on the operator.
Human Modeling and Passivity Analysis for Passivity-Shortage-Based Control Architecture
The experiments covered in this section are identical to those in Section IV-B except the control architecture was changed to that in Section III-B, where we take only interface #3. To analyze the impact of the parameter
Bode diagrams of the systems from
A. Human Modeling
We constructed the human models based on the operation data from the passivity-shortage-based control architecture, taken in the same way as in the previous section. We also used the same number of poles and zeros: the diagonal elements have 2 poles and 1 zero, and non-diagonal elements are constant.
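The model structure described above can be written down as a frequency response, as in the following Python sketch. It assumes a 3×3 model whose diagonal elements have the form $(b_{1}s + b_{0})/(s^{2} + a_{1}s + a_{0})$, which is one way to realize 2 poles and 1 zero, and whose off-diagonal elements are constant gains. All coefficient values are placeholders and are not identified parameters.

```python
import numpy as np

def human_model_response(w, diag_coeffs, offdiag_gains):
    """Frequency response at w [rad/s] of a model with dynamic diagonal
    elements and constant off-diagonal elements."""
    s = 1j * w
    # Start from the constant gains; the diagonal entries are overwritten below
    G = np.array(offdiag_gains, dtype=complex)
    for i, (b1, b0, a1, a0) in enumerate(diag_coeffs):
        G[i, i] = (b1 * s + b0) / (s**2 + a1 * s + a0)
    return G

# Placeholder coefficients (b1, b0, a1, a0) for the three diagonal elements
diag_coeffs = [(1.0, 2.0, 3.0, 4.0)] * 3
offdiag_gains = 0.1 * np.ones((3, 3))
G0 = human_model_response(0.0, diag_coeffs, offdiag_gains)   # response at w = 0
```

Because the off-diagonal elements carry no dynamics, any frequency dependence of the cross-axis coupling is excluded by construction in this structure.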
The time series data of the model outputs and the verification data for
Time series data of the model outputs (red) and the verification data (blue) on
Time series data of the model outputs (red) and the verification data (blue) on
We finally present interesting analysis results, not necessarily related to the main argument, below. Bode diagrams of the identified models are presented in Fig. 19. We see from these figures that the human characteristics differ in the value of
B. Human Passivity Analysis
In this subsection, we examine human passivity for the passivity-shortage-based control architecture. We begin by noting that [24] showed that the cascade system needs to be passivity short from
We also note that the index
The passivity indices for the three networks and the two
Passivity index
Remark 1:
The appropriate selection of
Conclusion
In this article, we studied human-enabled multi-robot navigation in three dimensions designed based on passivity. The high dimensionality posed new challenges not present in operations up to two dimensions. Namely, we had to consider how to enable humans to understand the robots' 3-D information and to stably manipulate the robots' 3-D motion. To address these issues, we prepared two command interfaces (joystick and VR controller) and two feedback interfaces (2-D display and HMD), and we conducted user studies to acquire the operation data for three interface pairs. Through these user studies, we obtained the following four findings regarding the VR interfaces.
VR interfaces improve the accuracy of the human dynamic model, and thus can ease the design of human-in-the-loop systems.
VR interfaces enhance human passivity, which contributes to enabling stable interactions between the human and robots.
VR interfaces improve the tracking performance due to the depth information given by the HMD.
Operators are likely to fail in achieving passivity for the 3-D operation, even with the VR interface.
There are open questions to be addressed in the future. First, we need to address whether the above insights apply to velocity navigation. An appropriate design of the virtual robot dynamics may contribute to all aspects including human model accuracy, passivity, performance, and workload. How the skill level of the operator affects the model accuracy and passivity is also open. How feeding back real scenes to the operator through AR affects the human modeling, passivity, workload, and task performance is also left as future work. Linking the human properties with personal data like gender, age, and nationality should also be addressed in the future.
Finally, all user studies in this article were conducted with the permission of the Administrative Office of the Human Subjects Research Ethics Review Committee at Tokyo Institute of Technology (Permit No. 2023109).
Appendix
This Appendix covers how the orders of the operator model were determined for this article. In the conference version [40], various model orders were examined under the restriction that all elements had to have the same number of poles and zeros, and we concluded that having 2 poles and 1 zero is the best for all three interface selections and all network types, based on the fit ratio for the verification data. The fit ratios for the model in [40] are summarized in Table 5. As pointed out in [40], the model tends to have large resonance peaks in the non-diagonal elements, as can be seen in the blue curves in Fig. 21. The same applies to the human model for the passivity-shortage-based architecture in Fig. 22. Given that this is a human model, it is reasonable to assume that these resonance peaks are the result of overfitting rather than correct identification. To avoid overfitting, we applied regularization techniques provided by the System Identification Toolbox, but the resulting model accuracy was far worse than that in Table 5. We thus set the order of the non-diagonal elements to 0, which produces the models illustrated by the red curves in Figs. 21 and 22. Comparing Tables 2 and 5, we can see that this model not only eliminates the peaks but also achieves slightly better model accuracy. This is why we adopted the present model in the discussion and the subsequent human passivity analysis.
Bode diagrams of a human operator model for the passivity-based architecture with interface #3 and the Type 1 network. The blue line represents the model with 2 poles and 1 zero for all elements, and the red represents that with 2 poles and 1 zero only for the diagonal elements and constants for non-diagonal elements.
Bode diagrams of a human operator model for the passivity-shortage-based architecture with interface #3, the Type 1 network, and