Introduction
Robots need dexterity for complex manipulation tasks. Consider taking a book from a bookshelf. The robot should account for occlusion by the bookshelf and the other books, and even exploit them, to get the book out. The robot needs not only to use its own fingers dexterously, but also to be smart about exploiting its surroundings as "external" fingers that support the movements of the object.
Planning for dexterity remains challenging. First, planning through contacts, which involves changes in system dynamics and non-smoothness, is particularly difficult [1], especially considering both robot and environment contacts. Second, due to the diverse nature of manipulation, robots need to discover various fine motions, making it hard to simplify problems with predefined primitives. Third, current manipulation planners are tailored for specific tasks. An in-hand manipulation policy [2] cannot solve object reorientation in [3], and neither of them can be directly applied to planar pushing [4]. As real-world manipulation is a mix of various manipulation problems, it is important for a general manipulation planner to cover different tasks.
We propose a hierarchical framework, as shown in Fig. 1, aiming to address the challenges mentioned above. In Level 1, we perform object trajectory and environment contact mode planning. Contact modes of object-environment contacts guide the generation of object motions — as the active exploration of extrinsic dexterity. In Level 2, given an object trajectory, the intrinsic dexterity is planned by optimizing for robot contact sequences on the object surface. In Level 3, more details of the plans are computed and rewards are backpropagated.
Overview of our framework, with an example of picking up a card. The following processes run iteratively. Level 1 plans object trajectories, interleaving searches over contact modes (mode nodes) and continuous object poses (pose nodes). An object trajectory is passed to Level 2 to plan robot contacts on the object (contact nodes). The full trajectory of object motions and robot contacts is passed to Level 3 for evaluation and control optimization. After evaluation, Level 3 passes the reward back to the upper levels. The reward is updated for every node in the path (bold nodes). In the example, the robot pulls the card to the edge of the table and then grasps it.
Our contributions are in three aspects: framework, methodology, and experiments. Framework: Our framework adapts to various tasks with minor adjustments. It easily encodes expert knowledge and priors through the MCTS action policies, value estimations, and rewards. This is the first framework that achieves manipulation tasks of such complexity and variety. For future development, it can directly integrate new components like trajectory optimization and learning. Methodology: We demonstrate the efficacy of three novel ideas with potential benefits for contact-rich robotics research: 1) an efficient hierarchical structure that decomposes the search over environment contact modes, object trajectories, and robot contacts; 2) replacing the Monte-Carlo rollout with a rapidly-exploring random tree (RRT) based method, leveraging its effectiveness in goal-oriented exploration without losing the Monte-Carlo spirit; 3) a novel representation that plans contact changes rather than contacts for every step, which greatly speeds up robot contact planning. Experiments: We instantiate and demonstrate a variety of dexterous manipulation tasks, including picking up a card, book-out-of-bookshelf, peg-out-of-hole, block flipping, multi-robot planar manipulation, and in-hand reorientation. Some tasks were never explored in previous work, including occluded grasp, upward peg-in-hole, and sideways peg-in-hole. As a contribution to the community, we open-source our code for these tasks. It is also easy to configure new scenarios and adjust parameters within one setup.yaml file.
Related Work
A. Dexterous Manipulation Planning
In dexterous manipulation, potential contact changes introduce a high-dimensional, non-convex, mixed discrete-continuous space to plan through. Contact-implicit trajectory optimization [1] performs local optimization in this complex space, which can be intractable without good initialization, and currently only works for 2D problems with shape simplifications [5], [6]. Alternatively, contact kinematics [7], [8], [9], [10] can be leveraged to generate motions between two rigid bodies [11], within a robot hand [12], for dexterous pregrasps [13], under environment contacts [14], [15], [16], [17], and with trajectory optimization [18]. We build upon [16], which incorporates contact modes into an RRT to guide node expansion. We introduce a better-structured search that addresses three key issues: no mechanism to optimize for any reward, random sampling for robot contact planning, and an entangled object-robot configuration space that results in many redundant searches. We introduce hierarchical search to decompose the space of object motions and robot contacts [19], [20]. Unlike previous approaches using contact states, which can suffer combinatorial explosion in 3D cases, our method leverages contact modes and an efficient robot contact search, extending the application from 2D polygons to 3D meshes.
Reinforcement learning (RL) can discover skills like in-hand manipulation [2], dexterous grasping [21], and object reorientation [22]. RL faces the same challenges from high-dimensional complex spaces, leading to sample efficiency problems. Our framework's advantage lies in adapting to new tasks and discovering new skills through contact exploration, without the need for training or reward shaping.
B. Monte-Carlo Tree Search
MCTS is a heuristic search algorithm that uses random sampling for efficient exploration. AlphaGo [23] combines MCTS with deep neural networks, achieving superhuman performance in board games like Go. MCTS has shown potential in contact planning, including gait planning for legged robots [24] and robot contact sequence planning given object trajectories [25]. Our work plans not only robot contacts but also object motions and interactions with environment contacts. MCTS offers benefits such as efficient search through vast, complex spaces, continuous improvement through learning and self-exploration, and parallelizability.
Preliminaries
A. MCTS
Levels 1 and 2 use the MCTS skeleton in Algorithm 1, which repeatedly grows a search tree until the computation budget is exhausted.
Grow-tree expands the tree by iteratively running the four steps in Fig. 2: selection, expansion, rollout, and backpropagation. We employ the idea in AlphaGo [23] of using value estimations and action probabilities to prioritize empirically good directions. Each node $s$ maintains a value estimation $v(s)$ and a visit count $N(s)$.
Four steps run iteratively in grow-tree in MCTS [26]. Selection: start from the root node and select successive child nodes. Expansion: create a new node from the unexplored children of the last selected node. Rollout: evaluate the new node by simulating to the end with random sampling. Backpropagation: update the tree using the rollout reward.
Selection determines search directions by balancing exploration and exploitation. Among the set of feasible actions $\mathcal{A}(s)$ of a node $s$, selection picks the action maximizing the upper confidence bound $U(s,a)$, where $Q(s,a)$ is the child's value estimate, $p(s,a)$ is the prior action probability, $N(s)$ and $N(s,a)$ are visit counts, and $\eta$ balances exploration against exploitation:
\begin{equation*}
U(s,a) = Q(s,a) + \eta p(s,a) \frac{\sqrt{N(s)}}{1 + N(s,a)} \tag{1}
\end{equation*}
In backpropagation, every node $s$ on the evaluated path is updated with the reward $r$:
\begin{align*}
v(s) =& \frac{N(s) v(s) + r}{N(s) + 1} \tag{2}\\
N(s) =& N(s) + 1 \tag{3}
\end{align*}
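For concreteness, here is a minimal Python sketch of the selection rule (1) and the backpropagation updates (2) and (3); the Node class and function names are our own illustration, not the authors' implementation:

```python
import math

class Node:
    def __init__(self, prior):
        self.children = {}   # action -> child Node
        self.p = prior       # action probability p(s, a), seen from the parent
        self.q = 0.0         # value estimation v(s)
        self.n = 0           # visit count N(s)

def select_action(node, eta=1.0):
    """Select the child maximizing U(s, a) in (1)."""
    best, best_u = None, -math.inf
    for action, child in node.children.items():
        u = child.q + eta * child.p * math.sqrt(node.n) / (1 + child.n)
        if u > best_u:
            best, best_u = action, u
    return best

def backpropagate(path, r):
    """Update every node on the evaluated path with reward r, as in (2)-(3)."""
    for node in path:
        node.q = (node.n * node.q + r) / (node.n + 1)
        node.n += 1
```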
B. Contact Modes
Collision detection obtains the contact points between the object and the environment. Contact modes describe the possible evolution of these contact points. In this letter, each contact point is labeled as maintained (0) or separating (+), and maintained contacts can be further refined by their sliding directions.
Contact modes offer an efficient way to generate object motions in lower-dimensional manifolds that have zero probability to be generated through random sampling. In addition, each contact mode corresponds to one set of continuous contact dynamics. By choosing among contact modes, discreteness in dynamics is efficiently captured.
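To illustrate the notation (e.g., "00++" in Fig. 3 denotes four contacts, two maintained and two separating), here is a naive Python enumeration over per-contact labels; the actual enumeration algorithm [9] is geometric and prunes infeasible modes, so this sketch is illustrative only:

```python
from itertools import product

# "0": contact maintained, "+": separating; maintained contacts can be
# refined further by sliding directions, omitted here for brevity.
CONTACT_LABELS = ("0", "+")

def enumerate_contact_modes(num_contacts):
    """Yield candidate mode strings, e.g. '00++' for four contact points."""
    for labels in product(CONTACT_LABELS, repeat=num_contacts):
        yield "".join(labels)
```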
Algorithm 1: MCTS Skeleton.
procedure Search(root)
  while resources left do
    grow-tree(root)
  end while
end procedure
Hierarchical Planning Framework
A. Task Description
This letter focuses on manipulating one rigid body, either within a rigid, immovable environment or in free space with no environment components.
A planner designed under this framework takes in:
The object: a rigid body $\mathcal{O}$ with known center of mass and inertia matrix, and friction coefficients $\mu_{\text{env}}$ with the environment and $\mu_{\text{mnp}}$ with the manipulator. The object geometry, used for collision checking and surface point sampling, should be provided as a mesh model or a primitive shape such as a cuboid.
The environment: known geometries, as primitive shapes or mesh models.
Robot model: the robot manipulates the object through $N_{\text{mnp}}$ predefined fingertip contacts. The collision models, forward and inverse kinematics, and finger contact models are known. We assume the robot makes non-sliding and non-rolling contacts.
Start specification: object start pose $\boldsymbol{x}_{\text{start}} \in SE(3)$.
Goal specification: object goal region $X_{\text{goal}} \subset SE(3)$.
It outputs an object configuration trajectory and the sequence of robot contacts that achieves it.
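To make the interface concrete, here is a minimal Python sketch of how these inputs might be bundled; all names and field choices are our own assumptions, not the letter's API:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ProblemSpec:
    """Hypothetical container bundling the planner inputs listed above."""
    object_mesh: str          # mesh file or primitive shape of the object
    environment_meshes: list  # known environment geometries
    mu_env: float             # object-environment friction coefficient
    mu_mnp: float             # object-manipulator friction coefficient
    n_fingertips: int         # number of predefined fingertip contacts
    x_start: np.ndarray       # start pose in SE(3), e.g. position + quaternion
    goal_region: callable     # predicate: pose -> bool, membership in X_goal
```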
B. Level 1: Planning Environment Contact Modes and Object Trajectories
Level 1 is summarized in Algorithm 2 and visualized in Fig. 3. It plans trajectories of environment contact modes and object configurations. The search proceeds as follows. In the existing tree, the selection phase first chooses a node from which to grow a new branch, interleaving selection over environment contact modes and object configurations. When a node is selected for expansion and rollout, an RRT-based rollout replaces the random rollout to improve exploration efficiency. The RRT rollout grows a new branch of the MCTS, which is then passed to Level 2. Throughout the process, environment contact modes are generated by a contact mode enumeration algorithm [9], and object configurations are generated by the projected forward integration in the RRT rollout (details in Section IV-B3).
Level 1 search visualized with a block reorientation example. In the selection phase, the block is selected to follow contact mode 00++, leading to a 90-degree rotation about contacts 1 and 2 and the separation of contacts 3 and 4. In the new node, contact mode 0000, indicating that contacts 1, 2, 5, and 6 are maintained, is then selected for the RRT rollout. The RRT generates an object configuration trajectory in which the block is first pushed and then rotated 90 degrees about the edge formed by contacts 2 and 5. New nodes from the RRT solution path are added to the MCTS after reward evaluation.
Algorithm 2: Level 1: Search Object Motion.
procedure Grow-tree-level-1(root)
  while resources left do
    node ← root
    while node is not a leaf do
      if nodetype(node) is a pose node then
        node ← select a mode child by (1)
      end if
      if nodetype(node) is a mode node then
        if explore-new is selected by (1) then
          path ← RRT-rollout(node)
          attach(path, node)
          back-propagate(path, evaluate-reward(path))
          break loop
        else
          node ← select a pose child by (1)
        end if
      end if
    end while
    back-propagate(node, evaluate-reward(node))
  end while
end procedure
1) Selection — Interleaved Search Over Discrete and Continuous Space
We define a Level 1 state as $s_1 = (x, \mathit{mode})$: an object pose $x$ paired with an environment contact mode. The search tree interleaves two node types: pose nodes, whose children are contact modes, and mode nodes, whose children are new object poses.
The selection policy follows (1). Action probabilities $p(s_1, a)$ and value estimations are specified in Sections V-A5 and V-A6.
2) Expansion
The expansion phase corresponds to the explore-new action being selected at a mode node. It is a variant of the progressive widening technique in MCTS for continuous spaces [28], in which we control the expansion rate through the action probability of explore-new.
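One possible reading of this rule in code, reusing the node fields (p, q, n, children) from the MCTS sketch in Section III-A; the decay schedule for the explore-new probability is an assumed placeholder, not the letter's exact formula:

```python
import math

def u_value(parent_n, q, p, n, eta=1.0):
    # upper confidence bound (1) for a single action
    return q + eta * p * math.sqrt(parent_n) / (1 + n)

def should_explore_new(mode_node, eta=1.0, kappa=0.5):
    """'explore-new' competes with the existing pose children under (1); its
    assumed probability kappa / (1 + #children) decays as children accumulate,
    throttling how fast the continuous pose space is widened."""
    p_new = kappa / (1.0 + len(mode_node.children))
    u_new = u_value(mode_node.n, 0.0, p_new, 0, eta)  # unexplored: Q = 0, N(s,a) = 0
    u_best = max((u_value(mode_node.n, c.q, c.p, c.n, eta)
                  for c in mode_node.children.values()), default=-math.inf)
    return u_new >= u_best
```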
3) RRT as Rollout
Contact-rich solutions live sparsely on lower-dimensional manifolds of the search space, so random trajectory rollouts, as used in most traditional MCTS methods, are unlikely to produce useful results. We replace the random rollout with an RRT search guided by contact modes (the RRT-rollout step in Algorithm 2), modified from [16].
The RRT is rooted at the object pose of the selected node. Each iteration samples an object pose, finds its nearest node in the tree, enumerates the environment contact modes at that node, and extends the tree by forward-integrating the object motion projected onto the manifold of the chosen contact mode.
One difference from [16] is that we can choose whether or not to plan robot contacts inside the RRT. We can turn on an option that relaxes a contact mode to be feasible if there exists any feasible robot contact, whereas previously the RRT needed to plan and store robot contacts, searching in a higher-dimensional space. This relaxation can improve planning speed for some tasks, as discussed in Section V.
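A structural sketch of such a rollout, loosely following the description above; every callable argument is a hypothetical interface standing in for a component of [16] and is supplied by the task:

```python
def rrt_rollout(x_root, sample_pose, nearest, enumerate_modes,
                project_integrate, has_robot_contact, reached, max_iters=500):
    """Contact-mode-guided RRT rollout in the spirit of [16]; the callables are
    hypothetical interfaces, not the authors' API."""
    tree, parents = [x_root], {0: None}
    for _ in range(max_iters):
        x_rand = sample_pose()                      # goal-biased pose sampling
        i_near = nearest(tree, x_rand)
        for mode in enumerate_modes(tree[i_near]):  # environment contact modes [9]
            # forward-integrate toward x_rand, projected onto the mode's manifold
            x_new = project_integrate(tree[i_near], x_rand, mode)
            # relaxed check: does *any* feasible robot contact exist for this step?
            if x_new is not None and has_robot_contact(x_new, mode):
                parents[len(tree)] = i_near
                tree.append(x_new)
                if reached(x_new):                  # entered the goal region X_goal
                    return tree, parents
    return tree, parents
```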
C. Level 2: Planning Robot Contacts
Level 2 is initiated by evaluate-reward in Level 1 (Algorithm 2). It takes in the object configuration trajectory and outputs the best robot contact sequence found. Algorithm 3 summarizes the grow-tree process of the Level 2 MCTS.
1) State and Action Representation
Each node is associated with a robot contact state $s_2$: the current timestep and the locations of all active fingertip contacts on the object surface. An action relocates a subset of fingers to new surface points at a chosen timestep.
For example, consider 5 available robot fingertips. Grasping an object at timestep 0 with the first and third fingers at locations (1, 0, 0) and, say, (-1, 0, 0) on the object surface is one contact state; relocating the first finger to another surface point at a later timestep is one action.
Compared to the common practice of planning contacts for every timestep [6], [25], we plan for contact relocations. While the complexity of the search space does not change, empirically in most tasks, this modification significantly reduces the depth of the search tree and speeds up the discovery of a solution (experiments in Section V, Fig. 5).
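A sketch of this representation with hypothetical types; the key point is that a node stores a full contact assignment while an action stores only a change:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContactState:
    """Level 2 node: the current timestep and where each fingertip is.
    None marks an inactive finger; locations are object-surface points."""
    timestep: int
    contacts: tuple

@dataclass(frozen=True)
class Relocation:
    """Action: at timestep t, switch to a new contact assignment. Only the
    changes are planned; all steps in between keep the current contacts."""
    t: int
    new_contacts: tuple

# e.g. 5 fingertips, grasping with the 1st and 3rd finger at timestep 0:
s0 = ContactState(0, ((1, 0, 0), None, (-1, 0, 0), None, None))
```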
Manipulation with extrinsic dexterity. From left to right: Scenario 1, pick up a card: pick up a thin card that cannot be directly grasped using two available fingertips. Scenario 2, bookshelf: get a book among other books out of a bookshelf using three available fingertips. Scenario 3, peg-out-of-hole: get a peg out of a tight hole using three robot contacts (narrow gaps prevent direct grasps). Scenario 4, block flipping using two available fingertips.
Planning time (log scale) with respect to search space size when planning robot contacts for cube sliding. We vary the number of total timesteps in the trajectory and the number of sampled object surface points.
2) Sampling and Pruning for Action Selection
Consider 100 object surface points, 4 fingers, and 10 timesteps. The action space, covering every combination of relocation timestep, finger, and new contact location, is far too large to enumerate. We therefore sample a candidate action set $\mathcal{A}_{\text{sp}}(s_2)$ and prune it with the feasibility checks of Section IV-C3.
We merge selection, expansion, and rollout under the same heuristic function: if the selected action has been explored before, it is a selection step; if the selected action is new, it is an expansion and rollout step.
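One way the sampled, pruned action set might be built, reusing the hypothetical Relocation type from the sketch above:

```python
import random

def sample_actions(state, surface_points, t_max, is_feasible, num_samples=32):
    """Build a sampled candidate set A_sp(s2) instead of enumerating every
    (timestep, finger, location) combination; is_feasible applies the pruning
    checks of Section IV-C3. Assumes state.timestep < t_max."""
    candidates = []
    for _ in range(num_samples):
        t = random.randint(state.timestep + 1, t_max)   # when to relocate
        finger = random.randrange(len(state.contacts))  # which finger moves
        point = random.choice(surface_points)           # where it moves to
        new_contacts = list(state.contacts)
        new_contacts[finger] = point
        action = Relocation(t, tuple(new_contacts))
        if is_feasible(state, action):
            candidates.append(action)
    return candidates
```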
Algorithm 3: Level 2: Search Robot Contacts.
procedure Grow-tree-level-2(root)
  while resources left do
    node ← root
    while node is not terminal do
      node ← select or expand a child by (1) over the sampled action set
    end while
    back-propagate(node, evaluate(node))
  end while
end procedure
3) Feasibility Check
To prune fruitless search directions, we require Level 2 nodes and Level 1 RRT rollout nodes to pass feasibility checks, including:
Kinematic feasibility: whether inverse kinematics solutions exist for the robot contact points;
Collision-free: whether the robot links are collision-free with respect to the environment and the object;
Relocation feasibility: whether a plan exists to relocate from the previous robot contacts to the new contacts;
Force conditions: whether the chosen contact points can fulfill the task dynamics, such as quasistatic, quasidynamic, force balance, or force closure conditions.
Other task-specific requirements may also be added.
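A minimal sketch of how these checks might be composed, cheapest first; all predicate names are hypothetical stand-ins for the components above:

```python
def node_is_feasible(node, ik_solvable, collision_free,
                     relocation_reachable, force_conditions_hold):
    """Run the four checks above in order; any failure prunes the node."""
    return (ik_solvable(node)                 # IK solutions exist for the contacts
            and collision_free(node)          # robot links vs. environment/object
            and relocation_reachable(node)    # a relocation plan exists
            and force_conditions_hold(node))  # quasistatic/quasidynamic/closure
```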
D. Level 3: Path Evaluation and Control Optimization
Level 3 (the evaluate step in Algorithm 3) computes the feasibility, robot controls, and reward of a candidate trajectory of object poses and robot contacts.
If the task mechanics are quasi-static or force-closure, we individually solve each step for contact forces and robot controls that satisfy the corresponding conditions.
For the control trajectory, we solve a control optimization over the steps of the trajectory (details in appendices).
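As one plausible instantiation of the per-step check under quasi-static force balance with linearized friction cones (the letter's exact formulation is in its appendices), feasibility reduces to a linear program solvable with an off-the-shelf solver:

```python
import numpy as np
from scipy.optimize import linprog

def force_balance_feasible(G, w_ext):
    """Quasi-static force balance with linearized friction cones.
    G: 6 x m matrix whose columns are the unit wrenches of friction-cone edges
    at the robot and environment contacts; w_ext: external wrench (gravity).
    Feasible iff some lambda >= 0 satisfies G @ lambda + w_ext = 0."""
    m = G.shape[1]
    res = linprog(c=np.zeros(m), A_eq=G, b_eq=-w_ext,
                  bounds=[(0, None)] * m, method="highs")
    return res.status == 0
```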
Examples and Experiments
A. Implementation
We implemented two task types: manipulation with extrinsic dexterity and in-hand manipulation. We use DART [30] for visualization and Bullet [31] for collision detection. New scenario setup requirements and more implementation details can be found in the appendices.
1) Robot Model
We implemented two robot types.
Sphere fingertips: each fingertip is a sphere, with workspace limits serving as the kinematic feasibility check. Collisions are checked against the environment. To approximate a patch contact, we use three vertices of an equilateral triangle on the sphere, lying in the plane perpendicular to the contact normal.
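A sketch of how such patch vertices can be constructed, assuming a fingertip sphere of radius r touching along a unit contact normal; the half-angle that sets the patch size is our assumption:

```python
import numpy as np

def patch_vertices(center, radius, normal, half_angle=0.1):
    """Three equilateral-triangle vertices on the fingertip sphere, lying in a
    plane perpendicular to the contact normal; half_angle (rad) sets the patch
    size and is an assumed parameter."""
    n = normal / np.linalg.norm(normal)
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, a); t1 /= np.linalg.norm(t1)  # tangent basis orthogonal to n
    t2 = np.cross(n, t1)
    vertices = []
    for k in range(3):
        phi = 2.0 * np.pi * k / 3.0
        d = (np.cos(half_angle) * n
             + np.sin(half_angle) * (np.cos(phi) * t1 + np.sin(phi) * t2))
        vertices.append(center + radius * d)
    return vertices
```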
Dexterous Direct-drive Hand (DDHand): a DDHand has two fingers. Each fingertip has two degrees of freedom for planar translation and is equipped with a horizontal rod [32], [33]. We provide an analytical inverse kinematics model and use a line contact model for the fingertips.
2) Task Mechanics
We support quasi-static, quasi-dynamic, and force-closure models (details in appendices).
3) Feasibility Checks
Feasibility checks include the task mechanics check, the finger relocation force check (during a relocation, the remaining contacts must satisfy the task mechanics with the object held static), the kinematic feasibility check, and the collision check.
4) Features and Rewards
We use features including the travel distance ratio (total object travel distance divided by the start-to-goal distance), path size, robot contact change ratio (number of finger contact changes divided by the path length), and grasp centroid distance [34]. Given some feature values as data points, we manually label reward values, favoring smaller object travel distance, fewer contact changes, and better grasp measures. Given the labeled data, we fit a logistic function as the reward function.
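A minimal stand-in for the fitting step (the letter does not specify the solver): squared-error gradient descent on a logistic model of the features, assuming labels r in [0, 1]:

```python
import numpy as np

def fit_logistic_reward(X, r, lr=0.1, iters=2000):
    """Fit r(x) ~ sigmoid(w @ x + b) to manually labeled rewards.
    X: (n, d) matrix of feature values; r: (n,) labels in [0, 1]."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        pred = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = (pred - r) * pred * (1.0 - pred)  # d(squared error)/d(logit)
        w -= lr * X.T @ grad / n
        b -= lr * grad.mean()
    return w, b
```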
5) Action Probability
In Level 1, in choosing the next contact mode, the action probability prioritizes the same contact mode as before:
\begin{equation*}
p\bigl (s_1 = (x, \mathit{mode}), a \bigr) = {\begin{cases}0.5 & \text{if } a = \text{previous mode}\\
\frac{0.5}{\text{number of modes} - 1} & \text{else} \end{cases}} \tag{4}
\end{equation*}
In Level 2, when choosing a timestep to relocate and the contact points to relocate to, the action probability is calculated with a weight function $w(s_2, a)$, where $t_c$ is the relocation timestep of action $a$ and $t_{\text{max}}$ is the latest timestep available for relocation:
\begin{equation*}
p(s_2, a) = \frac{w(s_2, a)}{\sum _{a^{\prime } \in \mathcal {A}_{\text{sp}}(s_2)} w(s_2, a^{\prime })} \tag{5}
\end{equation*}
\begin{equation*}
w(s_2, a) = {\begin{cases}0.5 + \frac{0.5}{t_{\text{max}} - t_{c} + 1} & \text{if } t_{c} = t_{\text{max}}\\
\frac{0.5}{t_{\text{max}} - t_{c} + 1} & \text{else} \end{cases}} \tag{6}
\end{equation*}
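In code, (5) and (6) reduce to a few lines; each sampled action is assumed to expose its relocation timestep t:

```python
def relocation_weight(t_c, t_max):
    """Weight (6): strongly prefer relocating at the latest timestep t_max,
    and otherwise prefer later timesteps over earlier ones."""
    w = 0.5 / (t_max - t_c + 1)
    return 0.5 + w if t_c == t_max else w

def action_probabilities(actions, t_max):
    """Normalize the weights over the sampled action set A_sp(s2), as in (5)."""
    weights = [relocation_weight(a.t, t_max) for a in actions]
    total = sum(weights)
    return [w / total for w in weights]
```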
6) Value Estimation
We only use value estimation in Level 1. Each node's value is initialized with an estimate of how promising its subtree is, computed before any rollout (details in appendices).
7) Search Parameters
We use the same search parameters, including the exploration weight $\eta$ in (1), across all tasks unless otherwise noted (values in appendices).
B. Manipulation With Extrinsic Dexterity
We evaluate four example scenarios, shown in Fig. 4. Each scenario is implemented with sphere fingertips without workspace limits and quasi-static mechanics. Additional scenarios are demonstrated in the real-robot experiments in Section V-D.
Table I shows the planning statistics from 100 runs for each scenario, using a desktop with an Intel Core i9-10900K 3.70 GHz CPU (also used for all other statistics in this letter). As our algorithm is anytime, we let the planner stop after 10 seconds and collect the results. The search process is fast; similar to CMGMP [16], about 80% of the computation time is spent on feasibility checks such as collision detection.
1) Ablation of Hierarchical Structure and MCTS
We compare with CMGMP [16], which uses an RRT to search over object motions and robot contacts. For all scenarios, our method achieves a shorter "solution found time". The hierarchical structure speeds up solution discovery by decoupling the object pose space from the robot contact space. Our method continuously improves the solution by searching more of the space, whereas CMGMP can only stop upon finding the initial solution, which is why our method generates more "nodes in the tree". Guided by the MCTS rewards, our method also finds solutions with a smaller "travel distance ratio", fewer "finger relocations" and "environment contact changes", and a smaller "grasp centroid distance", while CMGMP cannot optimize for any reward.
2) Efficient Robot Contact Planning
We compare planning contact relocations in Level 2 with the common practice of explicitly planning contacts for each timestep [6], [25] (w/o relocation selection). We run robot contact planning for a straight-line cube sliding trajectory with one allowed robot contact, varying the number of total timesteps in the trajectory and the number of object surface points. As shown in Fig. 5, the search space size grows exponentially for both methods, and the planning time of the common practice grows exponentially with it. Our modification uses drastically less time. We attribute this to the fact that contact relocations are sparse compared to the discretization of the entire trajectory.
C. In-Hand Manipulation
In-hand manipulation demonstrates intrinsic dexterity. For inherently safer motions, we require every motion to have force balance or force closure solutions.
1) Different Hand Configurations
Table II shows the statistics for YCB dataset objects [35] over 100 runs with randomized start and goal poses on three workspace configurations. Without any training or tuning, our framework achieves a high planning success rate within seconds. Point sampling on object surfaces ensures consistent performance on complex shapes, enabling contact planning inside concave objects, such as the mug and the power drill in the video.
2) Adding Auxiliary References
While our framework is designed to achieve object goal poses, we can incorporate auxiliary references, such as goal fingertip locations, by modifying the reward function and the action probability. We define $d$ as the distance between a candidate fingertip location and its reference location, and $p_r(d)$ as a term that decreases with $d$, yielding the modified weight function:
\begin{equation*}
w(s_2, a) = {\begin{cases}\left (0.5 + \frac{0.5}{t_{\text{max}} - t_{c} + 1} \right) p_{r}(d) & \text{if } t_{c} = t_{\text{max}}\\
\frac{0.5}{t_{\text{max}} - t_{c} + 1}\, p_{r}(d) & \text{else} \end{cases}} \tag{7}
\end{equation*}
We compare planners with and without additional goal fingertip locations on 100 reorientation trials with a hammer and a mug using a 5-finger hand. Each trial has a randomized start pose, goal pose, and reference fingertip locations. As Table III shows, the new planner achieves a smaller "final finger distance" to the reference fingertip locations, at the cost of more finger relocations. Note that, due to potential conflicts with the primary goal of reaching the object pose and trade-offs with other reward terms, good alignment with the auxiliary goal is not guaranteed.
D. Robot Experiments
We test 8 new scenarios on the DDHand (12 kHz bandwidth) and a configurable array of soft delta robots (delta array, 500 Hz control frequency) [36]. We perform open-loop trajectory execution (no object pose or contact estimation). Given a planned fingertip trajectory, we compute the robot joint trajectory using inverse kinematics and execute it with joint position control. The object start position errors are within 1 cm.
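A minimal sketch of this execution loop, assuming hypothetical ik and send_joint_positions callables and a fixed control period:

```python
import time

def execute_open_loop(fingertip_traj, ik, send_joint_positions, dt=0.01):
    """Open-loop execution as described above: map each planned fingertip
    waypoint to joint positions via IK and stream them at a fixed period.
    ik and send_joint_positions are hypothetical robot-specific callables."""
    q = None
    for waypoints in fingertip_traj:  # one set of fingertip targets per step
        q = ik(waypoints, seed=q)     # warm-start with the previous solution
        if q is None:
            raise RuntimeError("IK failed; the planned step is not executable")
        send_joint_positions(q)
        time.sleep(dt)
```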
1) DDHand
The planner enables the DDHand to use intrinsic and extrinsic dexterity, as shown in Fig. 6 and the video. For example, in occluded grasp, the fixed green block and the table prevent a direct grasp. The DDHand uses three steps: pivot the object about its corner; hold the object with one finger; and move the other finger to the opposite side to form a grasp. In upward peg-in-hole, the hole prevents the fingers from entering while grasping the object, but without a grasp, gravity would cause the peg to fall. The DDHand uses one finger to press the peg against the hole, using the wall as an external finger to grasp. The other finger then relocates and pushes the peg from the bottom.
Keyframes of DDHand experiments. From top to bottom: cube reorientation, occluded grasp, sideways peg-in-hole, upward peg-in-hole. Object motions, fingertip locations, and fingertip relocations are respectively marked in yellow, red, and blue.
2) Delta Array
As shown in Fig. 7 and the video, due to the small workspace of a delta robot (a cylinder with a 2 cm radius and 6 cm height), many contact changes are required to accomplish the tasks.
Keyframes of the delta array experiments. Object motions are marked with yellow arrows. From top to bottom: 2-finger and 6-finger planar block passing, 6-finger and 5-finger planar reorientation.
Discussion
This letter proposes a hierarchical framework for planning dexterous robotic manipulation. It facilitates efficient search across complex spaces, the generation of diverse manipulation skills, and adaptability to various scenarios. The method can potentially automate wide-ranging manipulation applications, such as functional grasps, caging, forceful manipulation, and mobile and aerial manipulation. For future development, the framework is compatible with the direct integration of trajectory optimization, learning, and more complex robot contact strategies such as sliding and rolling.