Introduction
Due to the manufacturing shift from mass to customized production processes and the trend of an aging population in recent years, more flexibility and safety are required from robots in industrial and logistic scenarios. Collaborative Mobile Manipulators (CMMs), which integrate a collaborative manipulator on a mobile platform, are designed with these objectives, combining the precision and manipulation capabilities of a robotic arm with the locomotion potential of a mobile platform [1]. Because of these advantages, CMMs have demonstrated high suitability for different tasks, not only in industrial manufacturing and warehouse automation but also in service and domestic applications [1], [2]. Nevertheless, integrating different components makes control and trajectory planning harder for CMMs. Aiming at overcoming the synchronization issues of decoupled strategies, whole-body controllers have been presented. In particular, whole-body torque-based strategies can regulate not only the interaction forces with the environment but also the motion distribution at the joint level, generating different motion patterns [3], [4].
In addition, when facing unstructured environments, where interactions may change unexpectedly, collaborative robots (cobots) are expected to adapt their performance online to unforeseen situations. When dealing with physically interactive tasks (see Fig. 1), hybrid position-force approaches are preferable to pure force control for safety reasons. Nevertheless, such schemes can exhibit unstable behavior during the contact phase [5]; hence, it appears convenient to render virtual forces through an indirect method such as impedance control.
Left side: teaching a complex physically interactive task. Right side: Execution of the learned task adapting the Cartesian impedance.
To model the system response to interactions with the environment, impedance controllers require the tuning of a large number of design parameters (typically stiffness, damping, and inertia). Usually, high impedance is required for precise motions, whereas lower impedance is favorable during interactions with the environment. However, how to optimally tune the impedance parameters online, i.e. variable impedance control (VIC), is still an open issue. For instance, in [6] an impedance model based on a mass-damper system expresses the desired dynamics of the error between the estimated position of the environment and the actual desired position, i.e. the one allowing force tracking. In particular, the damping is tuned online based on the force error, and the desired position obtained from the resulting dynamic model is sent to a position-controlled robot. Although this method can achieve force tracking on different surfaces, the computed desired trajectory is altered based on the force error (e.g., when surface contact is lost), which may result in unsafe robot behavior. In contrast, the authors of [5], [7] propose different strategies to self-tune the impedance parameters according to the position error: a high tracking error increases the stiffness, while the interaction force is exploited to detect unplanned contacts and recover a compliant behavior. In [5], moreover, the authors try to further minimize the interaction with the environment. Nevertheless, high interaction forces are sometimes a task requirement, and hence becoming compliant is not always a suitable strategy. Additionally, human-in-the-loop approaches allow online tuning of the robot behavior (varying damping and inertia), transferring human skills for specific collaborative tasks as a trade-off between precision and speed of execution [8]. Other methods exploit EMG sensors to estimate joint stiffness and generate reference trajectories at the same time [9], [10].
Although these human impedance-transfer approaches for fixed-base robotic arms can benefit from real-time adaptation to a changing environment, especially in teleoperation, they require EMG sensors and motion capture systems, making them unsuitable for industrial and domestic applications.
Furthermore, to generate autonomous tasks for CMMs, optimal impedance profiles are not sufficient; reference trajectories should also be defined. The Imitation Learning (IL) paradigm has proven to be an efficient way to obtain a desired motion from a set of trajectories demonstrated by a teacher [11]. The IL paradigm includes different algorithms to encode the taught demonstrations, with Dynamical Movement Primitives (DMPs) [12] and Gaussian Mixture Models (GMM) / Gaussian Mixture Regression (GMR) [11] among the most used. In the context of cobots, one IL methodology, i.e. kinesthetic teaching [11], has gained increasing popularity since it does not require any retargeting from the teacher to the robot. Its implementation usually requires gravity compensation or admittance control through force sensing (force-torque (F/T) sensors at the end-effector or joint torque sensors).
To the best of the authors' knowledge, there currently exist few intuitive interfaces that fully exploit the potential of CMMs in kinesthetic teaching. In the context of fixed-base manipulators, several works addressed the problem of simultaneously learning trajectories and force profiles through impedance models [13] using GMM/GMR. In [14], to generate safe robot behavior, the variable stiffness is tuned according to the inverse of the position covariance of the GMM. In [15], two different symmetric positive-definite (SPD) stiffness representations are investigated, while stiffness profiles are obtained with a least-squares estimator using a linear interaction model, based on position/force demonstrations. A similar approach based on regularized regression and force sensing was used by the authors in [16], where the GMM maps the relationship between the external force (input) and the stiffness (output). The stiffness is then generated online by GMR based on the current external force during teleoperation, where a high external force results in a high stiffness. Nonetheless, a stiff robot may hurt humans if the large force is generated by human disturbances rather than by the task. A different methodology, based on reinforcement learning (RL), models the robot policy by DMPs that include trajectory and impedance parameters. The policy parameters are then optimized offline using a variant of policy improvement with path integrals [17]. However, this RL method and the iterative optimization method in [5] require several simulated executions of the desired task and cannot deal with unforeseen disturbances online. In addition, most of the methods in the literature do not directly ensure the stability of the VIC, a property that is fundamental for safe Human-Robot Collaboration (HRC) [13].
To address these issues, we propose a method to learn loco-manipulation tasks exploiting the paradigm of VIC (see Fig. 1). Our goal is to teach our MObile Collaborative robotic Assistant (MOCA) from human physical demonstrations through IL and generate robust and safe autonomous behaviors by regulating the interaction forces between the robot and the unstructured environment in quasi-static conditions. Thanks to an admittance-type interface [18], complex interactive tasks are intuitively demonstrated and then encoded by a GMM that learns the task trajectory and interaction force maps between the robot and the environment simultaneously. The desired trajectory and force profile are generated using GMR. Based on the learned force, the desired impedance parameters are computed online as the result of a quadratic program (QP), which ensures the lowest stiffness required to achieve the task, subject to force and stiffness limits imposed by safety requirements. Moreover, an energy tank-based passivity constraint is added to ensure the controller’s stability. The desired trajectory and generated stiffness are finally sent to the MOCA weighted whole-body impedance controller (see Fig. 2).
The proposed framework scheme. Humans can teach MOCA the interactive tasks directly through an admittance-type physical interface, and the desired trajectories and interaction force are replicated with GMM/GMR. Then, the desired force is sent to a QP-based algorithm to optimize online stiffness. A tank energy constraint ensures the passivity of the system. Finally, the desired trajectory and stiffness are sent to MOCA’s whole-body impedance controller.
The method is evaluated with a table cleaning task, where two different experiments are conducted. First, the task is taught in nominal conditions. Then, MOCA replicates the learned task both in the same conditions and in the presence of external disturbances coming from the environment and from unexpected physical interactions with the human. The results show that the impedance parameters' tuning yields similar (good) performance in all conditions, simultaneously allowing trajectory and force tracking as well as robust and compliant interactions, in contrast to constant-stiffness solutions.
Preliminaries
A. MOCA Platform and Whole-Body Impedance Control
MOCA is a research robotic platform designed for HRC, with loco-manipulation capabilities that make it potentially suitable for logistics [3] and flexible manufacturing [19]. It is composed of the lightweight torque-controlled 7-DoFs Franka Emika Panda robotic arm, mounted on top of the velocity-controlled 3-DoFs Robotnik SUMMIT-XL STEEL mobile platform.
The whole-body dynamic model [4], [20] can be formulated as:
\begin{align*}
& \begin{pmatrix}\boldsymbol{M}_{v} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{M}_{a} (\boldsymbol{q}_{a}) \end{pmatrix} \begin{pmatrix}\ddot{\boldsymbol{q}}_{m} \\
\ddot{\boldsymbol{q}}_{a} \end{pmatrix} + \begin{pmatrix}\boldsymbol{D}_{v} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{C}_{a} (\boldsymbol{q}_{a}, \dot{\boldsymbol{q}}_{a}) \end{pmatrix} \begin{pmatrix}\dot{\boldsymbol{q}}_{m} \\
\dot{\boldsymbol{q}}_{a} \end{pmatrix} \\
&+ \begin{pmatrix}\boldsymbol{0} \\
\boldsymbol{g}_{a} (\boldsymbol{q}_{a}) \end{pmatrix} = \begin{pmatrix}\boldsymbol{\tau }_{m}^{vir} \\
\boldsymbol{\tau }_{a} \end{pmatrix} + \begin{pmatrix}\boldsymbol{\tau }_{m}^{ext} \\
\boldsymbol{\tau }_{a}^{ext} \end{pmatrix}, \tag{1}
\end{align*}
where the subscripts $m$ and $a$ refer to the mobile base and the arm, $\boldsymbol{M}_{v}$ and $\boldsymbol{D}_{v}$ are the virtual inertia and damping rendered by the velocity-controlled base, and $\boldsymbol{\tau}_{m}^{vir}$ is the corresponding virtual torque input. Stacking the generalized coordinates as $\boldsymbol{q} = (\boldsymbol{q}_{m}^{T}\ \boldsymbol{q}_{a}^{T})^{T}$, (1) can be written compactly as:
\begin{equation*}
\boldsymbol{M}(\boldsymbol{q}) \ddot{\boldsymbol{q}} + \boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\dot{\boldsymbol{q}} + \boldsymbol{g}(\boldsymbol{q}) = \boldsymbol{\tau } + \boldsymbol{\tau }^{ext}. \tag{2}
\end{equation*}
The MOCA Cartesian impedance controller is formulated as a prioritized weighted inverse dynamics algorithm and can be obtained by solving the problem of finding the torque vector
\begin{equation*}
\underset{\boldsymbol{\tau} \in \mathbb {R}^{n}}{\min } \: \frac{1}{2} \left\Vert \boldsymbol{\tau } - \boldsymbol{\tau} _{\boldsymbol{0}} \right\Vert _{\boldsymbol{W}}^{2} \quad \text{s.t. } \boldsymbol{F} = \bar{\boldsymbol{J}}^{T}\boldsymbol{\tau } \tag{3}
\end{equation*}
The closed-form solution of (3) is:
\begin{align*}
\boldsymbol{\tau } = & \boldsymbol{W}^{-1} \boldsymbol{M}^{-1} \boldsymbol{J}^{T} \boldsymbol{\Lambda }_{W} \boldsymbol{\Lambda }^{-1} \boldsymbol{F} +\\
&+ (\boldsymbol{I}-\boldsymbol{W}^{-1} \boldsymbol{M}^{-1} \boldsymbol{J}^{T} \boldsymbol{\Lambda }_{W} \boldsymbol{J} \boldsymbol{M}^{-1})\boldsymbol{\tau }_{0}, \tag{4}
\end{align*}
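As a numerical sketch of (4), assuming the standard definitions $\boldsymbol{\Lambda}_{W} = (\boldsymbol{J}\boldsymbol{M}^{-1}\boldsymbol{W}^{-1}\boldsymbol{M}^{-1}\boldsymbol{J}^{T})^{-1}$ and $\boldsymbol{\Lambda} = (\boldsymbol{J}\boldsymbol{M}^{-1}\boldsymbol{J}^{T})^{-1}$ for the weighted and unweighted task-space inertias (these definitions are not spelled out in this excerpt), the torque computation can be implemented as follows:

```python
import numpy as np

def weighted_inverse_dynamics(M, J, W, F, tau0):
    """Sketch of the prioritized weighted inverse dynamics in (4).

    Assumes Lambda_W = (J M^-1 W^-1 M^-1 J^T)^-1 (weighted task-space
    inertia) and Lambda = (J M^-1 J^T)^-1, the standard choices that
    make this solution form consistent with the constraint in (3).
    """
    Minv = np.linalg.inv(M)
    Winv = np.linalg.inv(W)
    Lambda_W = np.linalg.inv(J @ Minv @ Winv @ Minv @ J.T)
    Lambda_inv = J @ Minv @ J.T              # inverse task-space inertia
    A = Winv @ Minv @ J.T @ Lambda_W         # weighted task map
    tau_task = A @ Lambda_inv @ F            # first term of (4)
    N = np.eye(M.shape[0]) - A @ J @ Minv    # null-space projector
    return tau_task + N @ tau0               # task torque + secondary task
```

One can verify that the resulting torque satisfies the constraint $\boldsymbol{F} = \bar{\boldsymbol{J}}^{T}\boldsymbol{\tau}$ with the dynamically consistent generalized inverse $\bar{\boldsymbol{J}} = \boldsymbol{M}^{-1}\boldsymbol{J}^{T}\boldsymbol{\Lambda}$, while the secondary torque $\boldsymbol{\tau}_{0}$ acts only in the null space of the task.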
Finally, the Cartesian wrench $\boldsymbol{F}$ implements the desired impedance behavior:
\begin{equation*}
\boldsymbol{F} = - \boldsymbol{D}^{d}\dot{\boldsymbol{x}} - \boldsymbol{K}^{d}\tilde{\boldsymbol{x}}, \tag{5}
\end{equation*}
where $\boldsymbol{D}^{d}$ and $\boldsymbol{K}^{d}$ are the desired Cartesian damping and stiffness matrices and $\tilde{\boldsymbol{x}}$ is the Cartesian pose error w.r.t. the desired equilibrium pose.
B. GMM & GMR
The desired trajectories and force profiles are generated using GMM and GMR. This choice is motivated by the low number of hyperparameters to be tuned (i.e. only K, the number of Gaussians). The parameters of the GMM are estimated with the Expectation-Maximization (EM) algorithm [21] in an offline training process that makes use of the demonstrations; at reproduction time, GMR conditions the learned joint model on the current input to retrieve the desired outputs.
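As an illustrative sketch (not the paper's implementation), GMR with a one-dimensional input (e.g., time) reduces to a responsibility-weighted sum of per-component conditional means. The parameters below are assumed to come from an EM-fitted GMM:

```python
import numpy as np

def gmr(t, priors, means, covs):
    """Gaussian Mixture Regression: condition a joint GMM over (t, y)
    on the scalar input t to obtain the expected output y.

    means[k] = [mu_t, mu_y...]; covs[k] is partitioned accordingly.
    Illustrative 1-D-input version; in the paper the model is
    conditioned to retrieve desired position and interaction force.
    """
    K = len(priors)
    h = np.zeros(K)
    y_k = []
    for k in range(K):
        mu_t, mu_y = means[k][0], means[k][1:]
        s_tt = covs[k][0, 0]       # input variance of component k
        s_yt = covs[k][1:, 0]      # output-input cross covariance
        # responsibility of component k for input t (Gaussian pdf)
        h[k] = priors[k] * np.exp(-0.5 * (t - mu_t) ** 2 / s_tt) \
               / np.sqrt(2 * np.pi * s_tt)
        # conditional mean of y given t under component k
        y_k.append(mu_y + s_yt / s_tt * (t - mu_t))
    h /= h.sum()
    return sum(hk * yk for hk, yk in zip(h, np.array(y_k)))
```

Near the center of a well-separated component, the regression output approaches that component's output mean, which is the behavior exploited to reproduce the demonstrated trajectory and force profiles.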
Methodology
A. Desired Trajectory and Interaction Force Hybrid Learning Exploiting an Admittance-Type Physical Interface
Aiming at demonstrating desired end-effector trajectories and interaction wrenches to MOCA, we integrated kinesthetic teaching with an admittance-type physical interface, depicted in Fig. 3. It consists of i) an Arduino Nano microcontroller connected to a 4-button panel that allows the user to configure different functionalities and communicate with the robot through the Robot Operating System (ROS) middleware, ii) an F/T sensor that measures the interaction wrenches the user applies through a part that can be easily grasped, and iii) an end-effector tool. A detailed explanation of the hardware and design of the admittance-type physical interface can be found in [18].
Experimental setup. The teacher demonstrates the table cleaning task to MOCA from Start to Goal. A cleaning tool is attached to the end-effector of MOCA. The end-effector forces
Snapshots of the experiments. (top) Kinesthetic teaching of MOCA using an admittance-type physical interface. (middle) Experiment 1: autonomous task execution in nominal conditions. (bottom) Experiment 2: autonomous task execution with random disturbances. The desk’s height was changed in different frequencies and scales in the first five photos. The human stopped MOCA’s end-effector during the cleaning and free motion phase in the last two photos, respectively. A video of the experiments is available in the multimedia extension and at the link: https://youtu.be/Ini4twYy4Rk.
The measured human wrenches $\hat{\boldsymbol{\lambda}}_{h}$ are fed to the admittance model:
\begin{equation*}
\boldsymbol{M}^{adm}\ddot{\boldsymbol{x}}_{d} + \boldsymbol{D}^{adm}\dot{\boldsymbol{x}}_{d} = \hat{\boldsymbol{\lambda }}_{h}, \tag{6}
\end{equation*}
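A minimal sketch of how the admittance dynamics (6) can be integrated to produce the desired equilibrium trajectory, assuming simple forward-Euler integration; the gains and time step are illustrative, not the paper's values:

```python
import numpy as np

def admittance_step(x_d, dx_d, wrench, M_adm, D_adm, dt):
    """One Euler step of the admittance dynamics in (6):
        M_adm * ddx_d + D_adm * dx_d = wrench,
    where wrench is the measured human wrench. Returns the updated
    desired equilibrium pose and velocity."""
    ddx_d = np.linalg.solve(M_adm, wrench - D_adm @ dx_d)
    dx_d = dx_d + ddx_d * dt
    x_d = x_d + dx_d * dt
    return x_d, dx_d
```

Under a constant applied wrench the velocity converges to $\boldsymbol{D}^{adm^{-1}}\hat{\boldsymbol{\lambda}}_{h}$, which is why a high admittance (low $\boldsymbol{D}^{adm}$) eases long unconstrained motions while a low admittance favors short, precise interactions.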
The additional features introduced by the admittance-type physical interface make it possible to exploit the full potential of a mobile-base robot and ease the demonstration process, accounting for the human's preferences in different parts of the demonstrated behavior. By selecting the loco-manipulation mode, the human teacher can move the robot in a theoretically infinite workspace during the demonstration, and keep the mobile base still when an interaction of the end-effector with the environment is planned. Indeed, since the mobile base, controlled at a lower frequency w.r.t. the arm, perturbs the end-effector motion, demonstrating a desired interaction with the environment can be challenging. The human teacher can also tune the system's responsiveness by selecting the admittance according to his/her experience, preferences, and the specific requirements of the task: high admittance for long unconstrained motions and low admittance for short and precise interactions with the environment. Finally, selecting only a subset of the task-space axes can help execute the task and speed up the learning procedure.
The desired Cartesian equilibrium pose $\boldsymbol{x}_{d}$, obtained by integrating (6), is then sent to the whole-body impedance controller during the demonstration.
B. QP-Based Stiffness Online Optimization
The outputs of the GMR are fed to a QP that allows online modulation of the stiffness of the Cartesian whole-body impedance controller presented in Section II-A. The QP is formulated as follows:
\begin{align*}
\min _{\substack{\boldsymbol{K}_{i}^{d} \in \mathbb {R}^{m \times m} \\
i \in \lbrace 1,\ldots,N\rbrace }} \: & \frac{1}{2} \sum _{i=1}^{N} \left(\Vert \boldsymbol{F}_{i}^{ext} - \boldsymbol{F}_{i}^{d} \Vert _{\boldsymbol{Q}}^{2} + \Vert \boldsymbol{K}^{d}_{i} - \boldsymbol{K}^{\min} \Vert _{\boldsymbol{R}}^{2} \right) \\
\text{s.t. } \quad & \boldsymbol{K}^{\min} \leq \boldsymbol{K}_{i}^{d} \leq \boldsymbol{K}^{\max}, \quad i \in \lbrace 1,\ldots,N\rbrace \\
& -\boldsymbol{F}^{\max} \leq \boldsymbol{F}_{i}^{ext} \leq \boldsymbol{F}^{\max}, \quad i \in \lbrace 1,\ldots,N\rbrace \tag{7}
\end{align*}
The desired impedance interaction model can be expressed in different possible ways. To render a desired mass-spring-damper system,
\begin{equation*}
\boldsymbol{F}^{ext} = \boldsymbol{\Lambda }^{d}\ddot{\tilde{\boldsymbol{x}}} + \boldsymbol{D}^{d}\dot{\tilde{\boldsymbol{x}}} + \boldsymbol{K}^{d}\tilde{\boldsymbol{x}}, \tag{8}
\end{equation*}
\begin{equation*}
\boldsymbol{F}^{ext} = \boldsymbol{\Lambda }(\boldsymbol{x}) \ddot{\tilde{\boldsymbol{x}}} + \left(\boldsymbol{\mu}(\boldsymbol{x},\dot{\boldsymbol{x}}) + \boldsymbol{D}^{d} \right) \dot{\tilde{\boldsymbol{x}}} + \boldsymbol{K}^{d}\tilde{\boldsymbol{x}}, \tag{9}
\end{equation*}
\begin{equation*}
\boldsymbol{F}^{ext} = \left(\boldsymbol{\mu}(\boldsymbol{x},\dot{\boldsymbol{x}}) + \boldsymbol{D}^{d} \right)\dot{\tilde{\boldsymbol{x}}} + \boldsymbol{K}^{d}\tilde{\boldsymbol{x}}. \tag{10}
\end{equation*}
C. Tank Energy Based Passivity Constraint
The QP formulated in Section III-B computes a Cartesian stiffness at each time step that is sent to the Cartesian impedance whole-body controller presented in Section II-A. It is well known that VIC might violate the passivity of the system [23]; thus, the stability of the controlled system is not guaranteed. In order to ensure system stability, the QP formulation previously presented can be augmented through a passivity constraint for the power port
\begin{equation*}
\dot{\text {x}}_{t} = \frac{\sigma }{\text {x}_{t}}\dot{\tilde{\boldsymbol{x}}}^{T}\boldsymbol{D}^{d}\dot{\tilde{\boldsymbol{x}}} - \frac{\boldsymbol{w}^{T}}{\text {x}_{t}}\dot{\tilde{\boldsymbol{x}}}, \tag{11}
\end{equation*}
\begin{equation*}
\boldsymbol{w}(t) = {\begin{cases}-\boldsymbol{K}^{v}(t)\tilde{\boldsymbol{x}} & {\text{if}}\ T(\text{x}_{t})>\varepsilon \\
\boldsymbol{0} & \text{otherwise,} \end{cases}} \tag{12}
\end{equation*}
\begin{equation*}
T(\text {x}_{t}(t)) = T(\text {x}_{t}(t-1)) + \dot{T}(\text {x}_{t}(t))\Delta t > \varepsilon, \tag{13}
\end{equation*}
\begin{align*}
\dot{T}(\text {x}_{t}) &= \text {x}_{t}\dot{\text {x}}_{t} = \sigma \dot{\tilde{\boldsymbol{x}}}^{T}\boldsymbol{D}^{d}\dot{\tilde{\boldsymbol{x}}} - \boldsymbol{w}^{T}\dot{\tilde{\boldsymbol{x}}} \tag{14}
\\
&= {\begin{cases}\sigma \dot{\tilde{\boldsymbol{x}}}^{T}\boldsymbol{D}^{d}\dot{\tilde{\boldsymbol{x}}} + \tilde{\boldsymbol{x}}^{T}\boldsymbol{K}^{v}\dot{\tilde{\boldsymbol{x}}} & {\text{if}}\ T(\text{x}_{t})>\varepsilon \\
\sigma \dot{\tilde{\boldsymbol{x}}}^{T}\boldsymbol{D}^{d}\dot{\tilde{\boldsymbol{x}}} & \text{otherwise.}\end{cases}} \tag{15}
\end{align*}
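The discrete tank update implied by (11)-(15) can be sketched as follows; the stiffness-variation matrix $\boldsymbol{K}^{v}$, the threshold $\varepsilon$, and the numeric values are illustrative placeholders:

```python
import numpy as np

def tank_update(T, x_err, dx_err, D, K_v, sigma=1.0, eps=0.4, dt=0.001):
    """Discrete tank-energy update following (12)-(15).

    T      : current tank energy T(x_t)
    x_err  : Cartesian pose error (x tilde)
    dx_err : Cartesian velocity error (x tilde dot)
    D      : desired damping matrix D^d
    K_v    : time-varying portion of the stiffness (assumed meaning
             of K^v in this sketch)
    sigma  : in [0, 1], fraction of dissipated energy stored
    """
    if T > eps:
        # tank can supply/absorb energy for the variable-stiffness action
        dT = sigma * dx_err @ D @ dx_err + x_err @ K_v @ dx_err
    else:
        # variable action disabled: only dissipation refills the tank
        dT = sigma * dx_err @ D @ dx_err
    return T + dT * dt
```

Since the dissipation term is always non-negative, the tank is refilled during motion, and disabling the variable-stiffness term whenever $T \leq \varepsilon$ prevents the controller from injecting unbounded energy, preserving passivity.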
D. Technical Details
While the method has been presented in general terms in the previous subsections, some simplifying assumptions were made in the implementation. First of all, the QP uses the interaction model in (10), where
\begin{equation*}
\boldsymbol{d}^{d}(t) = 2 \cdot 0.707 \cdot \sqrt{\boldsymbol{k}^{d}(t-1)}, \tag{16}
\end{equation*}
The problem in (7) can be rewritten as:
\begin{align*}
\min _{\boldsymbol{k}^{d} \in \mathbb {R}^{3}} \: & \frac{1}{2} \left(\Vert \boldsymbol{F}^{ext} - \boldsymbol{F}^{d} \Vert _{\boldsymbol{Q}}^{2} + \Vert \operatorname{diag}\lbrace \boldsymbol{k}^{d}\rbrace - \boldsymbol{K}^{\min} \Vert _{\boldsymbol{R}}^{2} \right) \\
\text{s.t. } \quad & \boldsymbol{k}^{\min} \leq \boldsymbol{k}^{d} \leq \boldsymbol{k}^{\max} \\
& -\boldsymbol{F}^{\max} \leq \boldsymbol{F}^{ext} \leq \boldsymbol{F}^{\max} \\
& T(\text{x}_{t}) \geq \varepsilon \tag{17}
\end{align*}
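Under the simplified interaction model (10) with the Coriolis term $\boldsymbol{\mu}$ neglected and the damping rule (16), each axis of (17) becomes a one-dimensional convex quadratic with interval constraints, which is solved exactly by projecting the unconstrained minimizer onto the feasible interval. A per-axis sketch follows; the gains, bounds, and variable names are illustrative assumptions, not the paper's values:

```python
import numpy as np

def optimal_stiffness(x_err, dx_err, F_d, k_prev,
                      k_min=100.0, k_max=1000.0, F_max=30.0,
                      Q=1.0, R=1e-4):
    """Per-axis stiffness update in the spirit of (17), with the
    quasi-static model F_ext = d*dx_err + k*x_err and the damping
    design rule (16), d = 2*0.707*sqrt(k_prev)."""
    d = 2.0 * 0.707 * np.sqrt(k_prev)                    # eq. (16)
    # unconstrained minimizer of Q*(d*dx + k*x - F_d)^2 + R*(k - k_min)^2
    k_star = (Q * x_err * (F_d - d * dx_err) + R * k_min) \
             / (Q * x_err ** 2 + R)
    lo, hi = k_min, k_max
    if abs(x_err) > 1e-9:
        # force bound |d*dx + k*x| <= F_max restricts k to an interval
        b1 = (-F_max - d * dx_err) / x_err
        b2 = (F_max - d * dx_err) / x_err
        lo, hi = max(lo, min(b1, b2)), min(hi, max(b1, b2))
    return float(np.clip(k_star, lo, hi))
```

With a small regularization weight $R$, the solution in contact approaches the force-tracking stiffness $F^{d}/\tilde{x}$, while in free motion (small $F^{d}$) it relaxes toward $k^{\min}$, reproducing the compliant-by-default behavior described above.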
Experiments and Results
A. Experimental Setup
The experiments carried out in this work consist of human demonstrations to train the GMM model and of the autonomous repetition of a table cleaning task in two different conditions, i.e. without and with external disturbances. For each condition, three stiffness settings are tested: i) low constant stiffness (LS), ii) high constant stiffness (HS), and iii) online optimized stiffness (OS).
The experimental setup for the human demonstrations is shown in Fig. 3 along with the path followed by the human. The human teacher grasps the admittance-type physical interface and can use the four buttons as described in Section III-A. The demonstrated trajectory starts from the Start position, which coincides with the top-left part of the table from the human's viewpoint. Then, the human guides the robot end-effector to clean the table, moving parallel to the shortest side of the table and keeping the sponge in contact with the table's surface. When the bottom of the table is reached, the human guides the robot in free motion to the top of the table, shifted to the right w.r.t. the previous pass by an amount equal to the cleaning tool width. This procedure is repeated six times until the goal is reached. Three demonstrations are performed, all by the same human teacher; the desired position and the interaction force of the end-effector with the environment are recorded and later used to train the GMM.
A constant Cartesian impedance for the whole-body controller of MOCA is used during the demonstrations, with a stiffness of
The same experimental setup of the demonstrations is used for the autonomous repetition of the task. Two conditions are tested for each stiffness setting (LS, HS, and OS). In the first, i.e., the autonomous task repetition without external disturbances, the robot repeats the demonstration in the same environment experienced during the demonstration. In the second, i.e., the autonomous task repetition with external disturbances, the environment is perturbed during the execution of the task. As previously mentioned, the cleaning task consists of six cleaning movements that go from the top to the bottom of the table, alternated with five free motions to return to the top of the table before starting the next cleaning movement. The following perturbations are applied during the experiment (see also Fig. 4 (bottom)):
1) Single slow table lifting and lowering during the 1st cleaning movement.
2) Single fast table lifting and lowering during the 2nd cleaning movement.
3) Repeated low-frequency and high-amplitude table lifting and lowering during the 3rd cleaning movement.
4) Repeated high-frequency and low-amplitude table lifting and lowering during the 4th cleaning movement.
5) Collision with a human during the 5th cleaning motion.
6) Collision with a human during the last free motion.
The values of the parameters used during the autonomous repetition experiments are
B. Experimental Results
The results of the autonomous task repetition without external disturbances for LS, HS, and OS are shown in Fig. 5. For all the experiments, force and position tracking performance are reported, while the stiffness values and the energy in the tank are shown only for OS. Although LS allows a compliant behavior, it is not able to track the desired force along z, which is necessary for the cleaning performance (Fig. 5(b)). Conversely, for HS (Fig. 5(c)) an excessive force is applied along z. This results in the stick-slip phenomenon, as can be seen from the measured force and position trends along the x axis. On the other hand, OS (Fig. 5(a)) generates the right amount of compliance needed to track the desired force during the interaction with the table, while keeping a low stiffness in free motion when there are no interactions. The two sources of energy variation in the tank are the dissipated energy and the energy exchanged due to the variable stiffness (see (15)). The energy in the tank has an increasing trend, and it never falls below the threshold.
Results of autonomous task execution experiments in nominal conditions. (a) OS settings, from top to bottom are desired interactive force
The results of the autonomous task repetition with external disturbances for LS, HS, and OS are shown in Fig. 6. In Fig. 6(b), the results for LS are shown. Although the robot is always compliant to the environmental uncertainties introduced by the external disturbances, it cannot perform the task properly, since insufficient force is exerted on the table. On the contrary, for HS (Fig. 6(c1)), as soon as the table is lifted, the interaction force exceeds the robot payload limit. The same situation occurs when a human collides along x, as can be seen in Fig. 6(c2) on the right. Finally, the results for OS are reported in Fig. 6(a). For all the types of disturbances applied along z, the robot is able to optimize the stiffness to track the desired force. The high-frequency disturbance makes force tracking more challenging, but the performance is still acceptable. The stiffness is kept low along x, resulting in a compliant behavior when the human collides with the robot along that direction, both while interacting with the table and during free motion. In addition, the energy in the tank never falls below the threshold and has an overall increasing trend. At the same time, it is evident how the external disturbances affect the energy injected into (e.g., during table lifting) and extracted from (e.g., during table lowering) the tank.
Results of the autonomous task repetition with external disturbances. (a) OS. (b) LS. (c) HS. The plotted variables are the same as in Fig. 5. Plot (c1) shows only disturbance type 1), since the robot stopped due to violation of the safety limits. Plot (c2) shows disturbance types 5) and 6), where the robot again stops due to violation of the safety limits. The characterizing parameters of the first four perturbations for OS and LS, respectively, are (A: amplitude; V: velocity; f: frequency): 1) A = (0.1246, 0.1230) m, V = (0.0417, 0.0464) m/s; 2) A = (0.1163, 0.1443) m, V = (0.1256, 0.1163) m/s; 3) A = (0.0758, 0.0986) m, f = (0.8645, 0.6755) Hz; and 4) A = (0.0435, 0.0428) m, f = (2.0024, 2.2422) Hz.
Discussion and Conclusion
The results described in Section IV-B show that, overall, OS outperforms LS and HS. Although LS keeps high compliance and robustness to external perturbations and uncertainties, it cannot achieve a satisfactory cleaning performance. HS, instead, fails in terms of performance, safety, and robustness: it renders a rigid behavior at the end-effector, rejecting the external disturbances during motion and thus generating high interaction forces with the environment. On the other hand, OS is capable of exerting the right amount of force for the cleaning task while, at the same time, keeping as much compliance as possible when no interaction is required.
The framework developed in this letter makes it easy to teach a mobile manipulator an interactive task. The target tasks consist of complex interactions with the environment and unconstrained motions in a workspace larger than the reachable workspace of a fixed-base robotic arm. Moreover, the robot can robustly apply the learned interaction force while maintaining free-motion compliance. If, instead, the kinesthetic demonstrations were used to learn only the desired end-effector position and velocity trajectories, and a constant stiffness was used as is done for HS and LS, a trade-off would have to be found between ensuring compliance and high task-related performance. Even if an acceptable trade-off were feasible, it would not provide the robustness to external perturbations that our framework accomplishes. In addition, our approach ensures the passivity of the system through an energy tank-based constraint and allows payload limits and stiffness bounds to be included as constraints in the optimization problem.
Future work will include the learning of all the relevant parameters of the Cartesian impedance whole-body controller, such as the motion modes and secondary tasks, through the admittance-type physical interface. In this work, the considered task allowed some suitable controller parameters to be chosen a priori, ensuring a successful loco-manipulation behavior of the robotic platform during the whole task execution, but in more complex environments this is not always the case. Furthermore, the self-tuning of the weighting matrices of the QP formulation ($\boldsymbol{Q}$ and $\boldsymbol{R}$ in (7)) will be investigated.