Hydraulic actuator systems are widely employed in various engineering domains due to their high power-to-weight ratio, reliability, and affordability. In particular, hydraulic excavators are essential in construction, demolition, mining, and forestry, where large operating forces are required. However, multiple joints of the excavator must be manipulated simultaneously while maximizing operational efficiency, making manipulation a demanding task that must be performed by skilled operators with many years of experience. In this regard, the automation of excavators has drawn great interest [1] as a way to reduce human-associated costs (e.g., fatigue, safety, etc.). Central to this automation is precise motion control, which still remains a challenging problem in practical settings [2], particularly with complex soil interactions.
In this paper, we propose a novel precision motion control framework for robotized industrial hydraulic excavators based on data-driven model inversion. Data-driven inversion is challenging for typical data-driven approaches (e.g., methods using a recurrent neural network (RNN), a multilayer perceptron (MLP), etc.), as it is expensive to represent the inverse of time-related behaviors and to train discontinuous relations. Yet hydraulic excavators are subject to exactly these effects, namely input delays and dead-zones, which intensify in the presence of complex hydraulic circuits (e.g., a main control valve (MCV) [3]) commonly found in industrial hydraulic settings. To address these distinct features, we introduce a physics-inspired data-driven model with a modular structure composed of the following neural network modules: 1) an infinite impulse response (IIR) unit, which accommodates the input delays; 2) a piecewise linear (PL) map, which deals with the state-dependent dead-zones; and 3) MLP networks, which capture the remaining nonlinear and coupled dynamics. The environmental impacts (e.g., soil interaction forces) and the hydraulic states (e.g., hydraulic pressures) are taken into account by including their measurements in the network input.
Learning the data-driven model and its inversion online can endanger the excavator and its surroundings; thus, the learning is performed offline in a supervised manner using operational data from the real machine. Our control then consists of the following two layers: 1) the data-driven model inversion control, constructed by inverting the data-driven model module by module, which significantly enhances the training speed; and 2) a proportional (P) control, implemented on top of the data-driven model inversion control to enhance robustness. The stability and robustness of the control framework are established theoretically. Even in the presence of intense soil interactions, the proposed control framework achieves a remarkable performance (i.e., a path-following root-mean-square error (RMSE) of less than 2 \;[\rm cm]) for digging and grading operations of a commercial 38-ton class hydraulic excavator, the Doosan DX380LC.
Model-based methods have been proposed [3]–[6] for the control of hydraulic excavators, but they adopted modeling simplifications that necessarily compromise the control performance. To avoid the difficulty of deriving accurate mathematical models, data-driven methods were introduced for hydraulic excavator control [7]–[10]. Reinforcement learning (RL) approaches were presented in [7], [8], where the dynamics were approximated by a single large MLP. However, the large number of trainable parameters, which arises from feeding the data history into the MLP to handle the delays, substantially slows down learning (e.g., 57476 parameters leading to 10 hours of training with 0.72 million data points for [7], as compared to 9160 parameters and 2 hours with 2.6 million data points for our plant model). Further, they were demonstrated only at operation speeds too slow for commercialization (e.g., an average speed of 10 \;[\rm cm/s] to 20 \;[\rm cm/s] for a 12-ton class excavator), limiting the practical usefulness of the control. On the other hand, an RNN was employed to learn the controller of hydraulic excavators online in [9], [10]. Their performance, however, exhibited rather large tracking errors (e.g., an RMSE greater than 1 \;[\rm m]) in digging operations. Moreover, these data-driven works did not consider soil interactions [7], [8], [10], and their dead-zone compensation was simply defined by constant control input offsets [8], [10], further preventing precision motion control.
For a single hydraulic actuator, a data-driven force control was proposed in [11], where the controller was configured as an MLP. The force controller network was fed with a high-dimensional history of the actuator position and force, which again strained the training, and dead-zone compensation was not considered in the control learning. Dead-zone compensation was studied in [12], [13] with a trainable tailored map, but these methods could not represent the state-dependent nature of the dead-zones. In [14], a soil interaction model of the excavator was suggested without examining the hydraulics and the various soil properties relevant to industrial applications. In contrast to these previous results, our proposed framework can address the complex hydraulic excavator dynamics, including the input delays and the state-dependent dead-zones, without substantially increasing the network size owing to its modular structure, while fully considering the interaction with the soil. We also believe that our proposed framework would be advantageous for other hydraulically actuated robotic systems with multiple actuators and complex environmental interactions.
The rest of the paper is organized as follows. Section II describes the autonomous hydraulic excavator adopted for the experimental validation. Section III introduces the modular design for the proposed data-driven model inversion. The entire offline process that derives the data-driven model inversion is described in Section IV. Experimental results are presented in Section V, and Section VI concludes the paper.
SECTION II.
System Description
This work employs the Doosan DX380LC, an industrial hydraulic excavator, to validate our data-driven control strategy. The excavator is equipped with sensors that measure its states and environmental impacts, as shown in Fig. 1. Inertial measurement unit (IMU) sensors are attached to the boom, arm, and bucket links to estimate the joint configuration. The swing angle is also measurable, but we only consider the motion within the sagittal plane, as visualized in Fig. 2, because the swing action is not involved in excavation. Apart from the joint configuration, hydraulic pressure sensors are located in the pumps and cylinders to capture the hydraulic behavior. The pumps and the cylinders are connected through the MCV as detailed in [3], which consists of spool valves that distribute the pump flow rate to generate the cylinder velocity (i.e., the joint angular rate). The spool positions are controlled by electronic proportional pressure reducing (EPPR) valves commanded by the joystick signal. The joystick signal also affects the pump control provided by the manufacturer, which implies that the joystick signal must be regarded as the control input of our industrial excavator. Meanwhile, the soil interactions are evaluated using a momentum-based wrench estimator [15], since a force/torque sensor cannot be attached to the bucket joint due to reliability and cost concerns. A LiDAR sensor scans point cloud data (PCD) of the terrain for reference trajectory planning. Communication is via a controller area network (CAN), and the sensing and control frequency is set to 100 \;[\rm Hz].
SECTION III.
Designing Data-Driven Model Inversion
This section introduces the concept of the data-driven model inversion illustrated in Fig. 3. First, we propose a data-driven, physics-inspired, and easy-to-control model with a modular structure that approximates the excavator dynamics. The data-driven model can cope with the distinct features of the excavator dynamics, including the input delays, the state-dependent dead-zones, and the soil interactions. Then, the inversion control of the data-driven model is configured to compensate for the excavator dynamics.
A. Excavator Plant Model
Assuming that the time-related behavior (i.e., the spool dynamics and the hydraulic delays) can be approximated by a linear time-invariant (LTI) system, the resulting network, namely the excavator plant model shown on the right-hand side of Fig. 3, predicts the joint angular rate by
\begin{align*}
\eta _{f,t} &= f_{\Gamma _t} (u_t) \tag{1}
\\
\mathcal Z \lbrace \eta _{h,t}\rbrace &= P (z) \mathcal Z \lbrace \eta _{f,t}\rbrace \tag{2}
\\
\hat{\omega }_t &= h_{\Gamma _t} (\eta _{h,t}) \tag{3}
\end{align*}
where t \in \mathbb{Z} is the time step, indicated by the subscript of a time signal \star _t := \star (t); \mathcal Z \lbrace \star _t\rbrace := \sum _{t = 0}^\infty \star _t / z^t is the z-transform; and P (z) is the delaying system, a stable z-domain n_h \times n_f LTI transfer function matrix that captures the multiple and different delays of the hydraulic excavator. The nonlinear nature of the hydraulic circuit is accommodated by the pre-delay map f_{\Gamma _t} : [-1, 1]^3 \to \mathbb {R}^{n_f} and the post-delay map h_{\Gamma _t} : \mathbb {R}^{n_h} \to \mathbb {R}^3, using the shorthand \star _{\Gamma _t} (\cdot) := \star (\Gamma _t,\cdot) for a \Gamma _t-dependent map. Two intermediate variables, the pre-delay state \eta _{f,t} \in \mathbb {R}^{n_f} and the post-delay state \eta _{h,t} \in \mathbb {R}^{n_h}, connect the LTI system and the nonlinear maps. The control input (i.e., the joystick signal) is denoted by u_t \in [-1, 1]^3, the joint angular rate and its prediction are denoted by \omega _t, \hat{\omega }_t \in \mathbb {R}^3, and the excavator state is denoted by
\begin{equation*}
\Gamma _t := (\theta _t, P_t^{\rm cyl}, P_t^{\rm pump}, F_t^{\rm ext}) \in \mathbb {R}^{13}
\end{equation*}
where \theta _t := (\theta _t^{\rm boom}, \theta _t^{\rm arm}, \theta _t^{\rm bucket}) \in \mathbb {R}^3 is the joint angle, P_t^{\rm cyl} \in \mathbb {R}^6 is the pressure of the head- and rod-side chambers of the cylinders, P_t^{\rm pump} \in \mathbb {R}^2 is the pressure of the two pumps that supply the hydraulic fluid, and F_t^{\rm ext} \in \mathbb {R}^2 is the horizontal and vertical external force acting on the bucket tip, which captures the soil interactions. Refer to Section IV-A for the neural network module architectures and the offline learning method of the proposed excavator plant model. We note that a state-dependent delaying system could be used in (2), but the LTI transfer function matrix P (z) works satisfactorily in our application both with and without large soil interactions.
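To make the data flow of (1), (2), and (3) concrete, the following is a minimal streaming sketch of one plant-model evaluation per 100 \;[\rm Hz] control step. It assumes a diagonal P (z) with n_f = n_h = 3, as chosen later in Section IV-A. The helper names (SisoFilter, plant_step) and the toy dead-zone f and linear h are hypothetical placeholders for the trained modules, not the learned networks themselves.

```python
from typing import Callable, List, Sequence


class SisoFilter:
    """Streaming IIR filter a0*y_t + sum_i a_i*y_{t-i} = sum_i b_i*x_{t-i}, zero initial state."""

    def __init__(self, b: Sequence[float], a: Sequence[float]) -> None:
        self.b, self.a = list(b), list(a)
        self.x_hist = [0.0] * len(self.b)        # x_t, x_{t-1}, ..., x_{t-n_b}
        self.y_hist = [0.0] * (len(self.a) - 1)  # y_{t-1}, ..., y_{t-n_a}

    def step(self, x: float) -> float:
        self.x_hist = [x] + self.x_hist[:-1]
        acc = sum(bi * xi for bi, xi in zip(self.b, self.x_hist))
        acc -= sum(ai * yi for ai, yi in zip(self.a[1:], self.y_hist))
        y = acc / self.a[0]
        if self.y_hist:
            self.y_hist = [y] + self.y_hist[:-1]
        return y


Map = Callable[[Sequence[float], List[float]], List[float]]


def plant_step(u, gamma, f: Map, filters: List[SisoFilter], h: Map) -> List[float]:
    """One evaluation of (1)-(3): pre-delay map, diagonal delaying system, post-delay map."""
    eta_f = f(gamma, list(u))                                  # (1)
    eta_h = [filt.step(e) for filt, e in zip(filters, eta_f)]  # (2) with a diagonal P(z)
    return h(gamma, eta_h)                                     # (3)


# Hypothetical stand-ins for the trained modules (Gamma_t is ignored here).
def toy_f(gamma, u):
    return [0.0 if abs(ui) < 0.1 else ui for ui in u]          # crude dead-zone


def toy_h(gamma, eta_h):
    return [2.0 * e for e in eta_h]                            # linear post-delay gain


# Two-step delay plus first-order lag with unit DC gain: 0.3 z^-2 / (1 - 0.7 z^-1).
filters = [SisoFilter([0.0, 0.0, 0.3], [1.0, -0.7]) for _ in range(3)]
gamma = [0.0] * 13
u = [0.5, 0.05, -0.3]
omega_hat = [plant_step(u, gamma, toy_f, filters, toy_h) for _ in range(200)]
```

With a constant joystick input, the prediction stays at zero for the first two steps (the delay). It then settles to h(f(u)), here (1.0, 0.0, -0.6), because the unit DC gain passes f(u) through unchanged in steady state.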
B. Excavator Plant Model Inversion Control
From the command joint angular rate \omega _t^{\rm cmd} \in \mathbb {R}^3, the excavator plant model inversion control shown on the left-hand side of Fig. 3 computes the joystick signal as
\begin{align*}
\zeta _{h,t} &= g_{h,\Gamma _t} (\omega _t^{\rm cmd}) \tag{4}
\\
\mathcal Z \lbrace \zeta _{f,t}\rbrace &= C_P (z) \mathcal Z \lbrace \zeta _{h,t}\rbrace \tag{5}
\\
u_t &= g_{f,\Gamma _t} (\zeta _{f,t}) \tag{6}
\end{align*}
where C_P (z) is the delay-tracking system, a stable z-domain n_f \times n_h LTI transfer function matrix; g_{h,\Gamma _t} : \mathbb {R}^3 \to \mathbb {R}^{n_h} is the pre-control map; and g_{f,\Gamma _t} : \mathbb {R}^{n_f} \to [-1, 1]^3 is the post-control map. The pre-control state \zeta _{h,t} \in \mathbb {R}^{n_h} and the post-control state \zeta _{f,t} \in \mathbb {R}^{n_f} are intermediate variables. For a polynomial reference of order n_r (e.g., n_r = 0 for a step reference and n_r = 1 for a ramp reference), the tracking condition of the delay-tracking system C_P (z) is written as
\begin{equation*}
\lim _{z \to 1} (z - 1) \left(P (z) C_P (z) - I_{n_h}\right) \mathcal Z \lbrace t^{n_r}\rbrace = 0_{n_h \times n_h} \tag{7}
\end{equation*}
from the final value theorem, where I_a \in \mathbb {R}^{a \times a} is an identity matrix and 0_{a \times b} \in \mathbb {R}^{a \times b} is a zero matrix. The two nonlinear maps g_h, g_f satisfy the pseudo-inverse relation s.t. \star _{\Gamma _t} \circ g_{\star,\Gamma _t} is an identity function on the domain of g_{\star,\Gamma _t} given \Gamma _t, for \star \in \lbrace h, f\rbrace. Note that the exact inverses of the pre- and post-delay maps (i.e., such that g_{\star,\Gamma _t} \circ \star _{\Gamma _t} is also an identity function) may not exist because of many-to-one relations such as the dead-zones. The inversion method for each module is described in Section IV-B. The following Proposition 1 provides the properties of our data-driven inversion control (4), (5), and (6).
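For the two reference orders considered in this work, condition (7) can be unpacked using \mathcal Z \lbrace 1\rbrace = z/(z-1) and \mathcal Z \lbrace t\rbrace = z/(z-1)^2. The short derivation below is a worked step of our own, assuming P (z) C_P (z) is stable and differentiable at z = 1. It shows that step tracking fixes the DC gain of P (z) C_P (z), and ramp tracking additionally fixes its slope at z = 1:

\begin{align*}
n_r = 0:&\quad \lim _{z \to 1} (z - 1) \left(P (z) C_P (z) - I_{n_h}\right) \frac{z}{z - 1} = P (1) C_P (1) - I_{n_h} = 0_{n_h \times n_h}
\\
n_r = 1:&\quad \lim _{z \to 1} \left(P (z) C_P (z) - I_{n_h}\right) \frac{z}{z - 1} = \left. \frac{d}{dz} \left(P (z) C_P (z)\right) \right|_{z = 1} = 0_{n_h \times n_h}
\end{align*}

Here the n_r = 1 case applies L'Hôpital's rule entrywise after using the n_r = 0 condition P (1) C_P (1) = I_{n_h}.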
Proposition 1:
Consider the excavator plant model (1), (2), and (3) under the data-driven inversion control (4), (5), and (6). Assume that a) the model prediction error \delta _{\omega,t} := \hat{\omega }_t - \omega _t \in \mathbb {R}^3 and the pseudo-inverse errors \delta _{h,t} := (h_{\Gamma _t} \circ g_{h,\Gamma _t}) (\omega _t^{\rm cmd}) - \omega _t^{\rm cmd} \in \mathbb {R}^3, \delta _{f,t} := (f_{\Gamma _t} \circ g_{f,\Gamma _t}) (\zeta _{f,t}) - \zeta _{f,t}\in \mathbb {R}^{n_f} are bounded; b) the post-delay map h_{\Gamma _t} is Lipschitz continuous; and c) the pre-control map g_{h,\Gamma _t} is bounded. Then, if the joint angular rate \omega _{t_0} and its command \omega _{t_0}^{\rm cmd} at the initial time step t_0 \in \mathbb{Z} are bounded, the difference between the joint angular rate and its command, \nu _{\omega,t} := \omega _t - \omega _t^{\rm cmd} \in \mathbb {R}^3, is bounded \forall t \geq t_0.
Proof:
The triangle inequality provides two inequalities s.t.
\begin{align*}
\Vert \nu _{\omega,t}\Vert &\leq \Vert \delta _{\omega,t}\Vert + \Vert \hat{\omega }_t - \omega _t^{\rm cmd}\Vert \\
&\leq \Vert \delta _{\omega,t}\Vert + \Vert \delta _{h,t}\Vert + \Vert h_{\Gamma _t} (\eta _{h,t}) - h_{\Gamma _t} (\zeta _{h,t})\Vert
\end{align*}
where \Vert \delta _{\omega,t}\Vert and \Vert \delta _{h,t}\Vert are bounded by the first assumption. The second inequality follows from the definition of the post-delay map (3) and the pseudo-inverse error \delta _{h,t} = h_{\Gamma _t} (\zeta _{h,t}) - \omega _t^{\rm cmd}. From the Lipschitz continuity of h_{\Gamma _t}, there \exists L \in \mathbb {R}_\geq s.t.
\begin{equation*}
\Vert h_{\Gamma _t} (\eta _{h,t}) - h_{\Gamma _t} (\zeta _{h,t})\Vert \leq L \Vert \eta _{h,t} - \zeta _{h,t}\Vert
\end{equation*}
where L is a Lipschitz constant. Since \eta _{f,t} = f_{\Gamma _t} (g_{f,\Gamma _t} (\zeta _{f,t})) = \zeta _{f,t} + \delta _{f,t} by (1) and (6), combining the delaying system P (z) with its tracking control C_P (z) gives
\begin{align*}
&\mathcal Z \lbrace \eta _{h,t} -\zeta _{h,t}\rbrace \\
&\quad= \left(P (z) C_P (z) - I_{n_h} \right) \mathcal Z \lbrace \zeta _{h,t}\rbrace + P (z) \mathcal Z \lbrace \delta _{f,t}\rbrace
\end{align*}
where P (z) C_P (z) - I_{n_h} is a stable linear system satisfying the reference tracking condition (7). From the bounded-input bounded-output (BIBO) property, the error is bounded as
\begin{multline*}
\Vert \eta _{h,t} - \zeta _{h,t}\Vert \leq \beta \left(\Vert \eta _{h,t_0} - \zeta _{h,t_0}\Vert, t - t_0 \right) \\
+ \gamma _1 \left({\textstyle \sup _{t_0 \leq \tau \leq t}} \Vert \zeta _{h,\tau }\Vert \right) + \gamma _2 \left({\textstyle \sup _{t_0 \leq \tau \leq t}} \Vert \delta _{f,\tau }\Vert \right)
\end{multline*}
where \gamma _\star : [0, a) \to [0, \infty) is a class \mathcal K function (i.e., \gamma _\star is continuous and strictly increasing with \gamma _\star (0) = 0), and \beta : [0, a) \times [0, \infty) \to [0, \infty) is a class \mathcal {K L} function (i.e., for each fixed s, \beta (r,s) belongs to class \mathcal K with respect to r, and for each fixed r, \beta (r,s) is decreasing in s with \lim _{s \to \infty } \beta (r,s) = 0). The pre-control state \zeta _{h,\tau } and the pseudo-inverse error \delta _{f,\tau } are bounded by the last and the first assumptions, respectively. Bounded initial conditions render \Vert \eta _{h,t_0} - \zeta _{h,t_0}\Vert bounded, so \Vert \eta _{h,t} - \zeta _{h,t}\Vert is bounded \forall t \geq t_0. Through the Lipschitz bound and the first inequality, this implies that \nu _{\omega,t} is also bounded.\blacksquare
On top of the excavator plant model inversion control, a joint angle P control is added to the feedforward reference joint angular rate to enhance the robustness of the entire framework, giving the command joint angular rate:
\begin{equation*}
\omega _t^{\rm cmd} := \omega _t^{\rm ref} - K e_{\theta,t} \in \mathbb {R}^3 \tag{8}
\end{equation*}
where \omega _t^{\rm ref} \in \mathbb {R}^3 is the reference joint angular rate, \theta _t^{\rm ref} \in \mathbb {R}^3 is the reference joint angle, e_{\theta,t} := \theta _t - \theta _t^{\rm ref} \in \mathbb {R}^3 is the joint angle error, and K \in \mathbb {R}^{3 \times 3} is the P gain, a positive-definite matrix. Theorem 1 then establishes the stability of the entire control framework. Note that the command joint angular rate (8) can be designed independently of the data-driven inversion control; for instance, a velocity field control or a proportional-integral (PI) control can replace the P control in (8).
Theorem 1:
Consider the excavator plant model (1), (2), and (3) under the data-driven inversion control (4), (5), and (6) with the P control (8). Following the assumptions of Proposition 1, the joint angle error e_{\theta,t} is ultimately bounded.
Proof:
Let us first consider the following Lyapunov function:
\begin{equation*}
V := \frac{1}{2} e_{\theta,t}^T e_{\theta,t}
\end{equation*}
to analyze the error convergence in the continuous-time domain. The time derivative of the Lyapunov function yields
\begin{equation*}
\dot{V} = e_{\theta,t}^T \dot{e}_{\theta,t} = - e_{\theta,t}^T K e_{\theta,t} + e_{\theta,t}^T \nu _{\omega,t}
\end{equation*}
with \dot{e}_{\theta,t} + K e_{\theta,t} = \nu _{\omega,t}, where \dot{e}_{\theta,t} = e_{\omega,t} := \omega _t - \omega _t^{\rm ref} \in \mathbb {R}^3 is the joint angular rate error. From the inequality \dot{V} \leq - \lambda _{\min } (K) \Vert e_{\theta,t}\Vert ^2 + \Vert e_{\theta,t}\Vert \Vert \nu _{\omega,t}\Vert, with the minimum eigenvalue operator \lambda _{\min } (\cdot), \dot{V} < 0 whenever \Vert e_{\theta,t}\Vert > \sup _{\tau \geq t_0} \Vert \nu _{\omega,\tau }\Vert / \lambda _{\min } (K). Hence the joint angle error is ultimately bounded by the closed ball of this radius, where \sup _{\tau \geq t_0} \Vert \nu _{\omega,\tau }\Vert is finite by Proposition 1.\blacksquare
In Proposition 1, the first assumption stems from reliable learning performance, and the second assumption is based on the continuous and bounded dynamic behavior of the excavator. The last assumption can be enforced by choosing a bounded output activation, such as a hyperbolic tangent, an arc-tangent, or a logistic function, for the pre-control map g_h. Theorem 1 thus theoretically establishes the robustness of the entire control system, a guarantee not provided by other data-driven controls of hydraulic excavators (e.g., [7]–[10]).
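The ultimate bound of Theorem 1 can be checked numerically on the error dynamics \dot{e}_{\theta,t} = - K e_{\theta,t} + \nu _{\omega,t} from its proof. The sketch below discretizes them with forward Euler at 100 \;[\rm Hz]. The diagonal gain K and the bounded residual standing in for \nu _{\omega,t} are hypothetical values chosen only for illustration. Euler preserves the bound \Vert e\Vert \leq \sup \Vert \nu\Vert / \lambda _{\min } (K) only if dt \, \lambda _{\max } (K) \leq 1, so the sketch asserts that condition.

```python
import math
from typing import Callable, List, Sequence, Tuple


def simulate_error(k_diag: Sequence[float], e0: Sequence[float],
                   nu: Callable[[float], Sequence[float]],
                   dt: float = 0.01, steps: int = 2000) -> Tuple[List[float], float]:
    """Forward-Euler simulation of e_dot = -K e + nu with a diagonal positive-definite K.

    Returns the history of ||e_t|| and the sampled supremum of ||nu_t||.
    """
    # Euler keeps ||e|| <= sup||nu|| / lambda_min(K) (plus a decaying term) if dt * lambda_max(K) <= 1.
    assert min(k_diag) > 0 and dt * max(k_diag) <= 1.0
    e = list(e0)
    norms: List[float] = []
    nu_sup = 0.0
    for t in range(steps):
        n = nu(t * dt)
        nu_sup = max(nu_sup, math.sqrt(sum(v * v for v in n)))
        e = [ei + dt * (-ki * ei + vi) for ei, ki, vi in zip(e, k_diag, n)]
        norms.append(math.sqrt(sum(v * v for v in e)))
    return norms, nu_sup


def residual(tau: float) -> Tuple[float, float, float]:
    """Hypothetical bounded residual nu_{omega,t} in rad/s."""
    return 0.05 * math.sin(tau), 0.05 * math.cos(tau), 0.05 * math.sin(2.0 * tau)


K = [2.0, 3.0, 4.0]   # hypothetical diagonal P gain in 1/s
norms, nu_sup = simulate_error(K, e0=[0.5, -0.3, 0.2], nu=residual)
radius = nu_sup / min(K)
```

After the initial transient decays, \Vert e_{\theta,t}\Vert stays inside the ball of radius \sup \Vert \nu\Vert / \lambda _{\min } (K), consistent with Theorem 1. The bound can be conservative because it uses only the smallest gain.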
SECTION IV.
Learning Data-Driven Model Inversion
As schematized in Fig. 3, constructing the data-driven model inversion consists of two steps. The first step is to learn the excavator plant model, made up of the delaying system P (z) and the pre- and post-delay maps f, h. The second step is to obtain the inversion of each component to constitute the excavator plant model inversion control. These steps are detailed in Sections IV-A and IV-B, respectively.
For the offline learning process, we collect measurements from the Doosan DX380LC to capture complex nonlinear dynamics and soil interactions. The data is collected from autonomous digging/grading operations (with various depths and bucket speeds) and from sinusoidal joystick signals (at frequencies from 0.25 \;[\rm Hz] to 0.5 \;[\rm Hz] and amplitudes from 0.3 to 0.5) near the initial and final configurations of the operations. The reference path of the autonomous operation is obtained by length-scaling the nominal bucket configuration extracted from the patterns of human experts [15]. The reference joint angle is computed by inverse kinematics, and the trajectory is time-scaled by a bang-bang approach on the joint angular rate that considers the hardware limits (e.g., the workspace of the excavator, a rough range of the joint angular rate, and the maximum excavation volume). During data collection, we employ both the manufacturer-provided control and the proposed control trained with a small amount of data. In this work, the main focus is on digging/grading tasks, but we believe that the proposed framework can be easily extended to the entire workspace by collecting sufficient data.
We use data of 2.6 million time steps at a frequency of 100 \;[\rm Hz] (i.e., 7.2 hours of data) to train the controller. The data is randomly split into training, validation, and test sets at a ratio of 80:15:5. The offline learning is performed on a computer with an AMD Ryzen 5 3600X 3.8 \;[\rm GHz] CPU, 16 \;[\rm GB] of RAM, and an NVIDIA GeForce GTX 1660 Ti GPU. Note that the proposed controller requires at least 1.2 million time steps (i.e., 3.3 hours) of data to achieve sufficiently good performance under the nominal operating condition. However, we include data from as many operating conditions as possible (e.g., soil properties and weather conditions) to address the diverse circumstances of the machines commercialized by the manufacturer.
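As a quick arithmetic check of the figures above, the hypothetical helper below converts sample counts at 100 \;[\rm Hz] into hours and computes split sizes for an 80:15:5 ratio. How individual time steps or sequences are assigned to each split is not specified here, so only the counts are shown.

```python
from typing import List, Sequence, Tuple


def describe_dataset(n_steps: int, rate_hz: float,
                     ratios: Sequence[int] = (80, 15, 5)) -> Tuple[float, List[int]]:
    """Duration in hours and per-split sample counts for a dataset of n_steps samples."""
    hours = n_steps / rate_hz / 3600.0
    total = sum(ratios)
    counts = [n_steps * r // total for r in ratios]
    counts[0] += n_steps - sum(counts)   # give any rounding remainder to the training set
    return hours, counts


hours_full, split = describe_dataset(2_600_000, 100.0)   # about 7.2 h; 80:15:5 split
hours_min, _ = describe_dataset(1_200_000, 100.0)        # about 3.3 h
```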
A. Learning Excavator Plant Model
The first step, a supervised learning of the excavator plant model, exploits the following loss function:
\begin{equation*}
L^{\rm plant} := \Vert \hat{\omega }_t - \omega _t\Vert ^2
\end{equation*}
where P (z), f, and h are all trainable. Preexisting neural network architectures, however, cannot effectively address the unique properties of the excavator dynamics. We therefore propose two new neural network modules: an IIR unit for the delaying system P (z) and a monotonically non-decreasing PL map for the pre-delay map f.
Infinite Impulse Response Unit
The delaying system P (z) is a multi-input multi-output (MIMO) transfer function configured as a matrix of single-input single-output (SISO) transfer functions. To construct a neural network for the transfer function learning, let us first consider a z-domain SISO LTI transfer function written as
\begin{equation*}
H (z) := \frac{b_0 + b_1 z^{-1} + \cdots + b_{n_b} z^{-n_b}}{a_0 + a_1 z^{-1} + \cdots + a_{n_a} z^{-n_a}}
\end{equation*}
where b_0, b_1,\ldots, b_{n_b} \in \mathbb {R} and a_0, a_1,\ldots, a_{n_a} \in \mathbb {R} are constant coefficients with a_0 \ne 0. The transfer function is equivalent to a recursive filter in the t-domain, described by the difference equation y_t = (\sum _{i = 0}^{n_b} b_i x_{t - i} - \sum _{i = 1}^{n_a} a_i y_{t - i}) / a_0, where x_t \in \mathbb {R} is the input signal and y_t \in \mathbb {R} is the output signal. The difference equation can then be rearranged to
\begin{equation*}
y_t = \sum _{i = 1}^{n_b} \bar{b}_i (x_{t - i} - x_t) - \sum _{i = 1}^{n_a} \bar{a}_i (y_{t - i} - DC_H x_t) + DC_H x_t
\end{equation*}
where \bar{b}_i := b_i / a_0 \in \mathbb {R} \ \forall i \in \lbrace 1, 2,\ldots, n_b\rbrace and \bar{a}_i := a_i / a_0 \in \mathbb {R} \ \forall i \in \lbrace 1, 2,\ldots, n_a\rbrace are normalized coefficients, and DC_H := H(1) = (\bar{b}_0 + \sum _{i = 1}^{n_b} \bar{b}_i) / (1 + \sum _{i = 1}^{n_a} \bar{a}_i) \in \mathbb {R} is the low-frequency (DC) gain, so that \bar{b}_0 := b_0 / a_0 is determined by DC_H and the remaining coefficients. Now the IIR unit can be written as an n_h \times n_f matrix s.t.
\begin{equation*}
P (z) := \begin{bmatrix}P_{j k} (z) \end{bmatrix}_{j \in \lbrace 1, 2,\ldots, n_h\rbrace \text{ and } k \in \lbrace 1, 2,\ldots, n_f\rbrace }
\end{equation*}
where P_{j k} (z) \ \forall j, k is a SISO transfer function whose numerator and denominator orders are denoted by n_b^{j k}, n_a^{j k} \in \mathbb{Z}_\geq and whose normalized coefficients are denoted by \bar{b}_i^{j k} \in \mathbb {R} \ \forall i \in \lbrace 1, 2,\ldots, n_b^{j k}\rbrace and \bar{a}_i^{j k} \in \mathbb {R} \ \forall i \in \lbrace 1, 2,\ldots, n_a^{j k}\rbrace. The IIR unit belongs to the recurrent neural network family, with the given network size n_h, n_f, n_b^{j k}, n_a^{j k} and trainable variables \bar{b}_i^{j k}, \bar{a}_i^{j k} \ \forall i, j, k. The DC gain can also be trainable, but here we choose DC_P := P (1) as an n_h \times n_f matrix with ones on the main diagonal and zeros on the off-diagonal, so that \operatorname{rank}P (1) = \min (n_h, n_f). Any n_h \times n_f transfer function matrix whose DC gain has rank \min (n_h, n_f) can be transformed into the IIR unit with row and column matrix operations.
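A minimal plain-Python sketch of one SISO entry P_{j k} (z) of the IIR unit follows, assuming zero initial conditions. iir_fixed_dc implements the rearranged recursion with DC_H held fixed, and direct_form is the original difference equation, used as a cross-check. In an actual training setup, the same recursion would be unrolled in an autodiff framework so that \bar{b}_i and \bar{a}_i receive gradients; that part is not shown. The function names and coefficient values are hypothetical.

```python
import math
from typing import List, Sequence


def iir_fixed_dc(x: Sequence[float], b_bar: Sequence[float],
                 a_bar: Sequence[float], dc: float) -> List[float]:
    """Rearranged recursion with the DC gain fixed to dc (zero initial conditions).

    b_bar = [b_bar_1, ..., b_bar_nb], a_bar = [a_bar_1, ..., a_bar_na]; b_bar_0 never appears.
    """
    y: List[float] = []
    for t, xt in enumerate(x):
        acc = dc * xt
        for i, bi in enumerate(b_bar, start=1):
            acc += bi * ((x[t - i] if t >= i else 0.0) - xt)
        for i, ai in enumerate(a_bar, start=1):
            acc -= ai * ((y[t - i] if t >= i else 0.0) - dc * xt)
        y.append(acc)
    return y


def direct_form(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Reference difference equation y_t = (sum_i b_i x_{t-i} - sum_{i>=1} a_i y_{t-i}) / a_0."""
    y: List[float] = []
    for t in range(len(x)):
        acc = sum(b[i] * x[t - i] for i in range(len(b)) if t >= i)
        acc -= sum(a[i] * y[t - i] for i in range(1, len(a)) if t >= i)
        y.append(acc / a[0])
    return y


# Hypothetical coefficients: double pole at 0.6 and unit DC gain.
b_bar, a_bar, dc = [0.1, 0.05], [-1.2, 0.36], 1.0
b0_bar = dc * (1.0 + sum(a_bar)) - sum(b_bar)   # implied b_bar_0 (here 0.01)
x = [math.sin(0.1 * t) for t in range(300)]
y_rearranged = iir_fixed_dc(x, b_bar, a_bar, dc)
y_direct = direct_form(x, [b0_bar] + b_bar, [1.0] + a_bar)
step_response = iir_fixed_dc([1.0] * 300, b_bar, a_bar, dc)
```

The two outputs agree to floating-point precision. The step response settles at DC_H = 1 by construction, which is what fixing DC_P := P (1) in the IIR unit relies on.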
Piecewise Linear Map
The post-control map g_f must deal with jump discontinuities or large slopes to compensate for the dead-zones. However, this compensation map cannot be trained well with a vanilla MLP because (\omega _t, u_t) pairs have one-to-many relations in the dead-zone intervals. For this reason, the pre-delay map f is learned as a monotonically non-decreasing n-segment PL map \operatorname{PL}_{(X, Y)} : [X_0, X_n] \to [Y_0, Y_n] s.t.
\begin{equation*}
\operatorname{PL}_{(X, Y)} (x) := \begin{cases}\dfrac{(Y_i - Y_{i-1}) x + X_i Y_{i-1} - X_{i-1} Y_i}{X_i - X_{i-1}} & \text{if } x \in [X_{i-1}, X_i), \ i \in \lbrace 1, 2,\ldots, n\rbrace \\
Y_n & \text{if } x = X_n \end{cases}
\end{equation*}
where X_i, Y_i \in \mathbb {R} \ \forall i \in \lbrace 0, 1,\ldots, n\rbrace are breakpoints of the map, and X := \lbrace X_i\rbrace _{i = 0}^n, Y := \lbrace Y_i\rbrace _{i = 0}^n are the non-decreasing sequences of the breakpoints. The PL map is continuous at x = X_i with \operatorname{PL}_{(X, Y)} (X_i) = Y_i if X_{i-1} < X_i, while the map can represent a jump discontinuity at x = X_i if X_{i-1} = X_i. To apply distinct PL maps to the boom, arm, and bucket joystick signals, a tuple of multiple PL maps is expressed as (y_1, y_2,\ldots, y_m) := \operatorname{PL}_{(\mathbf X, \mathbf Y)} (x_1, x_2,\ldots, x_m) with y_k := \operatorname{PL}_{(X^k, Y^k)} (x_k), where X_i^k, Y_i^k \in \mathbb {R} \ \forall i \in \lbrace 0, 1,\ldots, n^k\rbrace are the breakpoints of the n^k-segment k-th PL map \forall k \in \lbrace 1, 2,\ldots, m\rbrace. Lists of the breakpoints are denoted by \mathbf X := \lbrace X^k := \lbrace X_i^k\rbrace _{i = 0}^{n^k}\rbrace _{k = 1}^m and \mathbf Y := \lbrace Y^k := \lbrace Y_i^k\rbrace _{i = 0}^{n^k}\rbrace _{k = 1}^m. Here, we choose the boundary breakpoints as X_0^k = Y_0^k = -1 and X_{n^k}^k = Y_{n^k}^k = 1 \ \forall k, so that \operatorname{PL}_{(\mathbf X, \mathbf Y)} : [-1, 1]^m \to [-1, 1]^m. Owing to its monotonicity, the pseudo-inverse of the PL map is defined as \operatorname{PL}_{(\mathbf X, \mathbf Y)}^+ = \operatorname{PL}_{(\mathbf Y, \mathbf X)}. The non-decreasing breakpoint sequences on an interval [a, b] can be trained with the custom activation function \sigma _{[a, b]} : \mathbb {R}^n \to [a, b]^{n+1} written as \sigma _{[a, b]} (c) = d s.t.
\begin{equation*}
d_i := a + (b - a) \frac{\sum _{j = 1}^i \exp (c_j)}{\sum _{j = 1}^n \exp (c_j)} \ \forall i \in \lbrace 1, 2,\ldots, n\rbrace
\end{equation*}
where c := (c_1, c_2,\ldots, c_n) \in \mathbb {R}^n is the activation input and d := (a, d_1,\ldots, d_n) \in [a, b]^{n+1} is the partition of the interval with a \leq d_1 \leq d_2 \leq \cdots \leq d_n = b. The activation function can also be extended to a two-dimensional input (e.g., \sigma _{[a, b]} : \mathbb {R}^{n \times 2m} \to [a, b]^{(n+1) \times 2m} for m distinct n-segment PL maps) by applying the function to every column of the input.
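The PL map defined above, its tuple form, and the pseudo-inverse obtained by swapping the breakpoint sequences can be sketched in plain Python as follows. The breakpoint values are hypothetical and fixed here; in the framework they would come from the auxiliary network. The first map has a dead-zone (a flat segment with Y_1 = Y_2), so its pseudo-inverse uses Y as the input breakpoints and therefore contains a jump at the repeated breakpoint. This is the dead-zone compensation behavior described above.

```python
from typing import List, Sequence


def pl_map(x: float, X: Sequence[float], Y: Sequence[float]) -> float:
    """Monotone n-segment piecewise linear map PL_(X,Y) on [X_0, X_n]."""
    n = len(X) - 1
    if x == X[n]:
        return Y[n]
    for i in range(1, n + 1):
        if X[i - 1] <= x < X[i]:   # interval is empty when X[i-1] == X[i] (a jump)
            return ((Y[i] - Y[i - 1]) * x + X[i] * Y[i - 1] - X[i - 1] * Y[i]) / (X[i] - X[i - 1])
    raise ValueError("x lies outside [X_0, X_n]")


def pl_tuple(xs: Sequence[float], Xs, Ys) -> List[float]:
    """Apply the k-th PL map to the k-th input (e.g., boom, arm, bucket joystick signals)."""
    return [pl_map(x, X, Y) for x, X, Y in zip(xs, Xs, Ys)]


def pl_pseudo_inverse(ys: Sequence[float], Xs, Ys) -> List[float]:
    """Pseudo-inverse PL_(X,Y)^+ = PL_(Y,X): swap the breakpoint sequences."""
    return pl_tuple(ys, Ys, Xs)


# Hypothetical breakpoints with boundary points fixed at -1 and 1.
# Map 1 has a dead-zone on [-0.2, 0.2]; map 2 is the identity.
Xs = [[-1.0, -0.2, 0.2, 1.0], [-1.0, 1.0]]
Ys = [[-1.0, 0.0, 0.0, 1.0], [-1.0, 1.0]]
dead = pl_tuple([0.1, 0.1], Xs, Ys)            # 0.1 lies inside the first dead-zone
jump = pl_pseudo_inverse([0.0, 0.0], Xs, Ys)   # 0 is sent to the dead-zone edge 0.2
```

Composing the map with its pseudo-inverse returns the input on [-1, 1]. Inputs inside the dead-zone, such as 0.1, are mapped to 0 and cannot be recovered, which is why only a pseudo-inverse exists.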
Employing the proposed neural network modules, the delaying system P (z) is configured as a 3 \times 3 IIR unit whose orders are chosen as n_b^{j k} = n_a^{j k} = 3 if j = k and n_b^{j k} = n_a^{j k} = 0 if j \ne k \ \forall j, k (i.e., third-order transfer functions on the diagonal and zeros on the off-diagonal). This is based on the assumption that the pre- and post-delay maps f, h can account for the input coupling. Fig. 4 shows the characteristics of the delaying system trained with the pre- and post-delay maps. We also note that the controller trained without the IIR unit causes severe oscillations on the real machine.
The pre-delay map f is replaced with PL maps as f_{\Gamma _t} := \operatorname{PL}_{\Phi _{f,t}} : [-1, 1]^3 \to [-1, 1]^3. Breakpoints of the PL map are assumed to depend on \Gamma _t through an auxiliary network \Phi _{f,t} := \phi _f (\Gamma _t) \in [-1, 1]^{9 \times 6}, which consists of a hidden layer of 64 ReLU nodes and an output layer of 8 \times 6 nodes with the custom output activation function \sigma _{[-1, 1]} : \mathbb {R}^{8 \times 6} \to [-1, 1]^{9 \times 6}, yielding three distinct 8-segment PL maps. The trained PL map is visualized in Fig. 4. Notice that the PL map enables dead-zone compensation, which appears in Fig. 5 as jumps in the joystick signals. The pre-delay map can be further customized with constraints or other designs (e.g., constraining it to pass through the origin, training the boundary breakpoints, or combining the PL map with an additional MLP), but this is not necessary in our case.
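The custom activation \sigma _{[a, b]} is a cumulative, normalized exponential. It maps the 8 \times 6 output layer described above to a 9 \times 6 array whose columns are non-decreasing sequences running from -1 to 1. The following plain-Python sketch assumes nested lists for the activation input. The function names and the example input are hypothetical, and the shift by max(c) before exp is a standard numerical-stability trick that leaves every ratio unchanged.

```python
import math
from typing import List, Sequence


def sigma_interval(c: Sequence[float], a: float = -1.0, b: float = 1.0) -> List[float]:
    """Map c in R^n to the partition (a, d_1, ..., d_n) of [a, b] with d_n = b."""
    m = max(c)
    w = [math.exp(ci - m) for ci in c]   # shifting by max(c) leaves every ratio unchanged
    total = sum(w)
    d, running = [a], 0.0
    for wi in w:
        running += wi
        d.append(a + (b - a) * running / total)
    return d


def sigma_columns(C: Sequence[Sequence[float]], a: float = -1.0,
                  b: float = 1.0) -> List[List[float]]:
    """Apply sigma_interval to every column of an n x 2m input, giving an (n+1) x 2m output."""
    cols = [sigma_interval([row[j] for row in C], a, b) for j in range(len(C[0]))]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]


# Hypothetical 8 x 6 activation input (three 8-segment PL maps).
C = [[0.1 * (i - j) for j in range(6)] for i in range(8)]
breakpoints = sigma_columns(C)   # 9 x 6; every column runs from -1 to 1
```

Because exp(c_j) > 0, consecutive outputs are strictly increasing in exact arithmetic. An exact jump X_{i-1} = X_i is therefore only approached as some exp(c_j) becomes negligible relative to the others.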
To address the remaining nonlinear properties and couplings, the post-delay map h is expressed as an MLP with a single hidden layer of 256 ReLU nodes and a linear output layer. Training the excavator plant model (1), (2), and (3) with the Adam optimizer converges in only around 2 hours. Fig. 5 visualizes two prediction examples; the prediction RMSE on the test data set is (0.51, 0.66, 1.16) \;[\rm deg/s] for the boom, arm, and bucket joint angular rates, respectively, which is small enough to justify the presented neural network architecture.
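The post-delay map architecture and the per-joint RMSE used to evaluate it can be sketched as follows. The input dimension is a hypothetical placeholder, the weights are random rather than trained, and the Adam training loop is omitted, so this only illustrates the layer structure and the metric.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Post-delay map sketch: one ReLU hidden layer and a linear output layer."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2

def per_joint_rmse(pred, target):
    """RMSE for each output column (boom, arm, bucket angular rate)."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(target)) ** 2, axis=0))

rng = np.random.default_rng(0)
in_dim, hidden_dim, out_dim = 12, 256, 3  # in_dim is a hypothetical placeholder
W1 = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.1, size=(hidden_dim, out_dim))
b2 = np.zeros(out_dim)

batch = rng.normal(size=(5, in_dim))
rates = mlp_forward(batch, W1, b1, W2, b2)  # shape (5, 3): one angular rate per joint
```

Reporting the RMSE per column, rather than pooled, is what yields one number each for the boom, arm, and bucket.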
B. Learning Excavator Plant Model Inversion Control
The second step configures the modular inversion of the data-driven excavator plant model. The delay-tracking system C_P (z), the inverse of the delaying system P (z), is a 3 \times 3 MIMO LTI transfer function matrix. Although C_P (z) could be trained as another IIR unit, we compose the delay-tracking system analytically, since obtaining a tracking controller for a diagonal transfer function matrix is relatively simple. A stable and exact inverse of the delaying system (i.e., P^{-1} (z)) does not exist because the trained delaying system has unstable zeros, which are characterized by the inverse responses shown in Fig. 4. Thus, the delay-tracking system is constructed to meet the reference tracking condition (7) \forall n_r \in \lbrace 0, 1\rbrace, with the poles empirically optimized to 0.82 with multiplicity 2. We choose the delay-tracking system with the minimum numerator order, which results in a proper transfer function matrix. The post-control map g_f, the pseudo-inverse of the pre-delay map f, requires no offline learning, since it can be computed directly as g_{f,\Gamma _t} := \operatorname{PL}_{\Phi _{f,t}}^+.
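Because a PL map is fully described by its breakpoints, one way to sketch the pseudo-inverse of a monotone (non-decreasing) PL map is to swap its input and output breakpoints. This is our illustrative reading of \operatorname{PL}^+, not a reproduction of the authors' implementation; the breakpoints and helper names are hypothetical. On a dead-zone (a plateau in the forward map), the swapped map jumps across the plateau, which mirrors the joystick jumps described for Fig. 5.

```python
import numpy as np

def pl_eval(x, xb, yb):
    """Piecewise-linear interpolation through (xb[i], yb[i]); xb must be non-decreasing."""
    x = np.clip(x, xb[0], xb[-1])
    i = np.clip(np.searchsorted(xb, x, side="right") - 1, 0, len(xb) - 2)
    dx = xb[i + 1] - xb[i]
    safe_dx = np.where(dx > 0, dx, 1.0)  # zero-width segments fall back to their left point
    w = np.where(dx > 0, (x - xb[i]) / safe_dx, 0.0)
    return yb[i] + w * (yb[i + 1] - yb[i])

# Hypothetical single-joint pre-delay map with a dead-zone on [-0.2, 0.2].
x_breaks = np.array([-1.0, -0.2, 0.2, 1.0])
y_breaks = np.array([-1.0, 0.0, 0.0, 1.0])

def f(u):
    return pl_eval(u, x_breaks, y_breaks)

def g_f(v):
    """Assumed pseudo-inverse: the same PL map with input/output breakpoints swapped."""
    return pl_eval(v, y_breaks, x_breaks)

print(g_f(np.array([-1e-3, 1e-3])))  # jumps across the dead-zone: about [-0.2008, 0.2008]
```

Outside the dead-zone, f(g_f(v)) = v holds exactly, so no training is needed for this map, consistent with the text.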
On the other hand, the pre-control map g_h (i.e., the pseudo-inverse of the post-delay map h) cannot be obtained analytically, since the post-delay map is an MLP. For the offline learning of the pre-control map, we adopt a distal learning approach [16]. The loss function is defined as
\begin{equation*}
L_h^{\rm inv} := \Vert \check{\omega }_t - \omega _t \Vert ^2
\end{equation*}
where \check{\omega }_t := (h_{\Gamma _t} \circ g_{h,\Gamma _t}) (\omega _t) \in \mathbb {R}^3, so that minimizing L_h^{\rm inv} enforces the pseudo-inverse relation h_{\Gamma _t} \circ g_{h,\Gamma _t} \approx \mathrm{id}. The MLP network of the pre-control map g_h has a hidden layer of 256 ReLU nodes and an output layer with a hyperbolic tangent activation. The pre-control map could also be augmented with additional PL maps to accommodate large slopes or jump discontinuities. In our trials, however, an MLP alone sufficed for the pre-control map g_h, suggesting that all dead-zones are captured by the pre-delay map f during plant model training. Owing to the modular inversion method, training the pre-control map takes less than 5 minutes. The reconstructed joystick signal is compared with the recorded signal in Fig. 5.
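The distal learning step can be illustrated on a toy problem: the forward map h is held fixed and only the inverse model is trained, with gradients of L_h^{\rm inv} passed back through h. For brevity we assume a fixed linear h(v) = A v and a one-layer inverse g(\omega) = \tanh(W \omega) (the tanh matching the output activation mentioned above), trained by plain gradient descent with a hand-derived gradient. The matrix A, the sample range, and the step size are hypothetical; the real h and g_h are MLPs conditioned on \Gamma _t, so this is only a sketch of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.5, 0.3, 0.0],
              [0.2, 1.2, 0.1],
              [0.0, 0.4, 1.0]])  # hypothetical fixed forward map h(v) = A v
Omega = rng.uniform(-0.3, 0.3, size=(512, 3))  # toy samples of desired outputs omega_t

def h(V):
    return V @ A.T  # forward model: never updated below

def g(Omega, W):
    return np.tanh(Omega @ W.T)  # trainable inverse model with tanh output

def distal_loss(W):
    R = h(g(Omega, W)) - Omega  # omega_check - omega
    return np.mean(np.sum(R ** 2, axis=1))

W = np.zeros((3, 3))
initial_loss = distal_loss(W)
lr = 1.0
for _ in range(2000):
    V = g(Omega, W)
    R = h(V) - Omega
    # Chain rule through the fixed forward model A and the tanh output layer of g.
    D = (R @ A) * (1.0 - V ** 2)
    grad = 2.0 / len(Omega) * D.T @ Omega
    W -= lr * grad
final_loss = distal_loss(W)
```

The loss is measured in the output space of h rather than against recorded inputs, which is what makes the approach "distal": no target values for g itself are needed.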
The proposed control framework is verified on the commercial 38-ton class excavator Doosan DX380LC in digging and grading tasks. The reference trajectories for both operations are generated by the same planning algorithm, described in Section IV. The control frequency is 100 \;[\rm Hz], although the inversion control can easily be run at higher frequencies. The P gain that determines the model inversion input is chosen as K = 1.5 I_3. We also implemented PI feedback but found that P control alone suffices, as the control error is already small without the integral term.
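As an illustration of how K = 1.5 I_3 enters, the sketch below assumes the common feedforward-plus-proportional form \omega _t = \omega _t^{\rm ref} + K (\theta _t^{\rm ref} - \theta _t), where \theta denotes the boom, arm, and bucket joint angles. The exact form of the model inversion input is defined earlier in the paper; the symbol \theta and the function name are our assumptions. At the reported 100 Hz this is evaluated once every 10 ms.

```python
import numpy as np

K = 1.5 * np.eye(3)  # P gain reported in the text

def inversion_input(omega_ref, theta_ref, theta):
    """Hypothetical P law: reference joint rates plus proportional angle feedback."""
    return np.asarray(omega_ref) + K @ (np.asarray(theta_ref) - np.asarray(theta))

omega = inversion_input([1.0, 2.0, 3.0], [10.1, 20.0, 29.8], [10.0, 20.0, 30.0])
# omega is approximately [1.15, 2.0, 2.7]
```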
The bucket tip position p_t := (p_{x,t}, p_{z,t}) \in \mathbb {R}^2 is calculated to evaluate the control performance, where p_{x,t}, p_{z,t} \in \mathbb {R} are the horizontal and vertical tip positions as shown in Fig. 2. The path following error is denoted by e_{p,t}^{\rm path} := \min _{t_0 \leq \tau \leq t_f} \Vert p_t - p_\tau ^{\rm ref}\Vert \in \mathbb {R}, which reflects the error in the excavated ground geometry. The trajectory error, or the bucket tip position error, is written as e_{p,t}^{\rm traj} := \Vert p_t - p_t^{\rm ref}\Vert \in \mathbb {R}. The RMSEs are calculated from one second after the initial time, so as to assess the steady-state precision of the proposed control, as is typical in the control literature. Note from Figs. 6 and 7 and the supplementary video that there is a non-zero error at the initial time, because the (less accurate) manufacturer-provided PI control is in use until then. Note also that the proposed control behaves well (e.g., with no sudden dipping) during this transition. This initial performance could be further improved by switching on the proposed data-driven controller before the operation starts or by re-planning the path with the measured configuration as the initial condition (i.e., ensuring that the controller starts in its steady state). The control performance is compared with the manufacturer-provided PI control, which determines the joystick signal from a (\omega _t, u_t) pair look-up table with joint angle PI feedback and angular rate feedforward. The manufacturer manually fine-tuned the control gains and the look-up table using air digging data (i.e., data without soil interactions).
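The two error measures and the steady-state RMSE can be computed from logged bucket tip positions as sketched below. This is our reading of the definitions above: the minimum over \tau is approximated by the closest sampled reference point, the function names are hypothetical, and the pairwise distance matrix is O(T^2) in memory, so a long log would need chunking.

```python
import numpy as np

def tracking_errors(p, p_ref):
    """p, p_ref: arrays of shape (T, 2) holding (p_x, p_z) at the same sample times."""
    # Trajectory error: distance to the reference at the same time step.
    e_traj = np.linalg.norm(p - p_ref, axis=1)
    # Path error: distance to the closest sampled reference point over the whole run.
    dists = np.linalg.norm(p[:, None, :] - p_ref[None, :, :], axis=2)  # shape (T, T)
    e_path = dists.min(axis=1)
    return e_path, e_traj

def steady_state_rmse(e, t, skip=1.0):
    """RMSE over samples at least `skip` seconds after the initial time."""
    mask = (t - t[0]) >= skip
    return np.sqrt(np.mean(e[mask] ** 2))

# Toy log at 100 Hz: the tip lags the reference by 0.1 m along x and sits 0.02 m above it.
t = np.arange(300) / 100.0
p_ref = np.column_stack([t, np.zeros_like(t)])
p = np.column_stack([t - 0.1, np.full_like(t, 0.02)])
e_path, e_traj = tracking_errors(p, p_ref)
path_rmse = steady_state_rmse(e_path, t)
traj_rmse = steady_state_rmse(e_traj, t)
# path_rmse is about 0.02, traj_rmse is about sqrt(0.1**2 + 0.02**2)
```

The toy log shows why the two measures differ: a pure time lag along the path inflates the trajectory error but leaves the path following error, and hence the excavated geometry, unaffected.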
Digging: The digging operation removes soil from the current terrain to achieve the target ground shape. Due to the limited excavation capacity, multiple digging operations may be required to reach the final target ground geometry. Fig. 6 visualizes the bucket trajectories, soil interactions, and error distributions of repeated experiments at various excavation depths and volumes. The manufacturer-provided PI control yields a path following RMSE of 5.79 \;[\rm cm] and a trajectory RMSE of 25.0 \;[\rm cm], with an RMS reference bucket tip velocity of 66.2 \;[\rm cm/s] and an RMS external force of 4.31 \times 10^4 \;[\rm N]. The proposed control framework outperforms the manufacturer-provided PI control, with a path following RMSE of 1.99 \;[\rm cm] and a trajectory RMSE of 5.21 \;[\rm cm] at an RMS reference bucket tip velocity of 66.3 \;[\rm cm/s] and an even larger RMS external force of 6.41 \times 10^4 \;[\rm N]. These operation speeds and external forces are large enough for industrial applications.
Grading: The grading operation levels the ground surface after the digging operations; its experimental results are shown in Fig. 7. The manufacturer-provided PI control has a path following RMSE of 5.28 \;[\rm cm], a trajectory RMSE of 15.1 \;[\rm cm], an RMS reference bucket tip velocity of 87.2 \;[\rm cm/s], and an RMS external force of 2.69 \times 10^4 \;[\rm N]. The excavator plant model inversion control with P feedback attains a path following RMSE of 1.83 \;[\rm cm] and a trajectory RMSE of 3.17 \;[\rm cm] with an RMS reference bucket tip velocity of 87.3 \;[\rm cm/s] and an RMS external force of 2.23 \times 10^4 \;[\rm N]. The errors remain uniformly small both with and without intense soil interactions, since the inversion captures and compensates for the effect of the external force.
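The reported RMSEs can be summarized as relative reductions with respect to the manufacturer-provided PI control. This is simple arithmetic on the numbers stated for the digging and grading experiments, not an additional result; the dictionary layout is only for illustration.

```python
# RMSE values [cm] reported in the text: (manufacturer PI, proposed)
results = {
    "digging": {"path": (5.79, 1.99), "traj": (25.0, 5.21)},
    "grading": {"path": (5.28, 1.83), "traj": (15.1, 3.17)},
}

# Relative reduction = 1 - proposed / PI
reduction = {
    task: {metric: 1.0 - new / old for metric, (old, new) in metrics.items()}
    for task, metrics in results.items()
}
# digging: path about 65.6 %, traj about 79.2 %; grading: path about 65.3 %, traj about 79.0 %
```

Both tasks show roughly a two-thirds reduction in path following RMSE and roughly a four-fifths reduction in trajectory RMSE.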
This work presents precision motion control of robotized industrial hydraulic excavators via data-driven model inversion. Considering the distinct features that hinder learning-based control methods (i.e., input delays and dead-zones), we propose a data-driven model with a physics-inspired modular structure to approximate the excavator dynamics. We then derive the inversion of the plant model in a modular manner, which considerably accelerates learning. To prevent damage to the machine and its surroundings, the model and its inversion are trained offline in a supervised fashion using measurements from the Doosan DX380LC, a 38-ton class industrial hydraulic excavator. The proposed control framework is composed of the data-driven model inversion control, which compensates for the excavator dynamics, and a P control, which computes the model inversion input and enhances robustness. The stability and robustness of the control framework are theoretically proven, and experimental results are presented in comparison with the manufacturer-provided PI control. The proposed control framework significantly outperforms the PI control and achieves precise control performance (i.e., a path following RMSE under 2 \;[\rm cm]) even in the presence of intense soil interactions.
Possible future research directions include: 1) generalization of the excavator plant model using a state-dependent delaying system; 2) incorporation of expert-emulating planning [15]; 3) implementation of over-the-air programming to effectively collect measurements and update the control; 4) rigorous comparison with other control strategies (e.g., [17]); and 5) application of our framework to other systems with delays and dead-zones.