Introduction
Throwing is a means to increase the capabilities of a manipulator by exploiting dynamics, a form of dynamic extrinsic dexterity [5]. In the case of pick-and-place, throwing enables a robot arm to place objects rapidly into boxes located outside its maximum kinematic range, which not only reduces the total physical space used by the robot, but also maximizes its picking efficiency. Rather than having to transport objects to their destination before executing the next pick, objects are instead immediately “passed to Newton” (see Fig. 1).
Fig. 1. TossingBot learns to grasp arbitrary objects from an unstructured bin and to throw them into target boxes located outside its maximum kinematic reach range. The aerial trajectories of different objects are controlled by jointly optimizing grasping policies and throwing release velocities.
However, precisely throwing arbitrary objects in unstructured settings is challenging because it depends on many factors: from prethrow conditions (e.g., initial grasp of the object) to varying object-centric properties (e.g., mass distribution, friction, shape) and dynamics (e.g., aerodynamics). For example, grasping a screwdriver near the tip before throwing it can cause centripetal accelerations to swing it forward with significantly higher release velocities—resulting in drastically different projectile trajectories than if it were grasped closer to its center of mass (CoM) (see Fig. 2). Yet regardless of how it is grasped, its aerial trajectory would differ from that of a thrown ping pong ball, which can significantly decelerate after release due to air resistance. Many of these factors are notoriously difficult to model or measure analytically [22], [25]—hence, prior studies are often confined to assuming homogeneous prethrow conditions (e.g., object fixtured in gripper or manually reset after each throw) with predetermined, homogeneous objects (e.g., balls or darts). Such assumptions rarely hold in real unstructured settings, where a throwing system needs to acquire its own prethrow conditions (via grasping) and adapt its throws to account for varying properties and dynamics of arbitrary objects.
Fig. 2. (a) Projectile trajectories of a thrown ping pong ball, (b) a screwdriver grasped and thrown by its handle, and (c) the same screwdriver grasped and thrown by its shaft. The difference between (a) and (b) is largely due to aerodynamics, while the difference between (b) and (c) is largely due to grasping at different offsets from the object's CoM (near the handle). Our goal is to learn joint grasping and throwing policies that can compensate for these differences to achieve accurate targeted throws.
In this work, we present TossingBot, an end-to-end formulation that uses trial and error to learn how to plan control parameters for grasping and throwing from visual observations. The formulation learns grasping and throwing jointly—discovering grasps that enable accurate throws, while learning throws that compensate for the dynamics of arbitrary objects. Our system has two key aspects.
Joint learning of grasping and throwing policies: We use a deep neural network that maps from visual observations (of objects in a bin) to control parameters for grasping and throwing: the likelihood of grasping success for a dense pixel-wise sampling of end effector orientations and locations [42], and the throwing release velocities for each sampled grasp. Grasping is directly supervised by the accuracy of throws (grasp success = accurate throw), while throws are directly conditioned on specific grasps (via dense predictions). As a result, the end-to-end policy learns to execute stable grasps that lead to predictable throws, as well as throwing velocities that account for the variations in object-centric properties and dynamics that can be inferred from visual information.
Residual learning of throw release velocities: The throwing module predicts a residual $\delta$ on top of the release velocity $\hat{v}$ estimated by a physics-based controller assuming ideal ballistic motion, and uses the superposition of the two predictions to obtain the final throwing release velocity $v=\hat{v}+\delta$. The physics-based controller uses ballistics to provide consistent estimates of $\hat{v}$ that generalize well to different landing locations, while the data-driven residuals learn to exploit the executed grasps and compensate for object-centric properties and dynamics. Our experiments show that this hybrid data-driven method, residual physics, leads to significantly more accurate throws than baseline alternatives.
The primary contribution of this paper is to provide new perspectives on throwing: in particular—its relationship to grasping, its efficient learning by combining physics with trial and error, and its potential to improve practical real-world picking systems. We provide several experiments and ablation studies in both simulated and real settings to evaluate the key components of our system. We observe that throwing performance strongly correlates with the quality of grasps, and experimental results show that our formulation is capable of learning synergistic grasping and throwing policies for arbitrary objects in real settings. An after-the-fact analysis of what the deep network learns shows that the deep features internal to TossingBot effectively use visual appearance to cluster objects based on geometric and physical attributes—without any explicit supervision other than the goal to throw with accuracy. This journal paper is a revision of a conference paper appearing in Robotics: Science and Systems (RSS) 2019.
Most importantly, this version includes additional experiments in Section VI-H that visualize the emerging visual deep features learned by TossingBot, demonstrating that it is possible to implicitly learn object-level semantics from physical interactions alone. Other changes include more system details (i.e., on training and timing) and algorithmic details on inferring throwing primitive parameters.
Related Work
A. Analytical Models for Throwing
Many previous systems built for throwing [10], [25], [26], [33], [35] rely on handcrafting or approximating dynamics based on frictional rigid body mechanics, and then optimizing control parameters to execute a throw such that the projectile (typically a ball) lands at a target location. However, as highlighted by Mason and Lynch in [25], accurately modeling throwing dynamics is challenging. It requires knowledge of physical properties that are difficult to estimate (e.g., aerodynamics, inertia, coefficients of restitution, friction, shape, mass distribution, etc.) for both objects and manipulators. As a result, these model-based systems often achieve limited throwing accuracy (e.g., 40% success rate in [33]), and have difficulty generalizing to changing dynamics over time (e.g., deteriorating friction on gripper finger contact surfaces from repeated throwing). In our work, we leverage deep learning and self-supervision to compensate for the dynamics that are not explicitly accounted for in contact/ballistic models, and we train our policies online via trial and error so that they can adapt to new situations on the fly (e.g., new object and manipulator dynamics).
B. Learning Models for Throwing
More recently, learning-based systems for robotic throwing [2], [11], [16], [20] have also been proposed, which ignore low-level dynamics and directly optimize for task-level success signals (e.g., did the projectile land on the target?). These methods have demonstrated better accuracy than those that rely solely on analytical models, but have two primary drawbacks: 1) limited generalization to new object types (beyond balls, blocks, or darts), and 2) limited prethrow conditions (e.g., human operators are required to manually reset objects and manipulators to match a prescribed initial state before every throw), which makes training from trial and error costly. Both drawbacks prevent their use in real unstructured settings.
In contrast to prior work, we make no assumptions on the physical properties of thrown objects, nor do we assume that the objects are at a fixed pose in the gripper before each throw. Instead, we propose an object-agnostic pick-and-throw formulation that jointly learns to acquire its own prethrow conditions (via grasping) while learning throwing control parameters that compensate for varying object properties and dynamics. The system learns through self-supervised trial and error, and resets its own training so that human intervention is kept at a minimum.
C. Learning Residual Models and Policies
Our approach to data-efficient learning, residual physics, falls under a broader category of hybrid controllers [1], [15], [29] that leverage both 1) analytical models to provide initial estimates of control parameters, and 2) learned residuals on top of those estimates to compensate for unknown dynamics [see Fig. 3(d)]. In contrast to prior work on learning residuals on predictions of future states for model-based control [3], [19] or data-augmented models [9], [17], [36], we instead directly learn the residuals on control parameters (i.e., action space) with deep networks. This approach provides a wider range of data-driven corrections that can compensate for noisy observations as well as dynamics that are not explicitly modeled. These benefits are also observed in concurrent work on residual reinforcement learning [18], [34] in block-assembly and object manipulation tasks.
Fig. 3. Learning residual models and policies. (a) Analytical solutions that determine actions. (d) Our approach: hybrid policies that learn residuals on top of control parameters estimated by analytical models.
Method Overview
TossingBot consists of a neural network $f$ that takes as input a visual observation $I$ of the objects in the bin and outputs predictions of the parameters $\phi_g$ and $\phi_t$ used by the grasping and throwing primitives, respectively.
Fig. 4. Overview. An RGB-D heightmap of the scene is fed into a perception module to compute spatial features $\mu$, which are shared as input by the grasping and throwing modules.
The network $f$ consists of three parts: 1) a perception module that accepts visual input $I$ and outputs a spatial feature representation $\mu$, which is shared as input into 2) a grasping module that predicts $\phi_g$, and 3) a throwing module that predicts $\phi_t$.
A. Perception Module: Learning Visual Representations
We represent the visual input $I$ as an RGB-D heightmap image of the workspace.
The edges of the heightmaps are defined with respect to the boundaries of the robot's picking workspace; in our experiments, this area covers a fixed rectangular region of the tabletop.
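To make this representation concrete, the following is a minimal sketch of how such a heightmap can be constructed by orthographically projecting a 3-D point cloud (from calibrated RGB-D images) onto the workspace plane. The function name, pixel size, and max-projection details are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def heightmap_from_points(points, colors, workspace, pixel_size=0.002):
    """Project a 3-D point cloud (robot base frame) into an RGB-D heightmap.
    workspace = ((x_min, x_max), (y_min, y_max)); all values in meters.
    Hypothetical sketch, not the paper's exact implementation."""
    (x0, x1), (y0, y1) = workspace
    w = int(round((x1 - x0) / pixel_size))
    h = int(round((y1 - y0) / pixel_size))
    height = np.zeros((h, w), dtype=np.float32)
    color = np.zeros((h, w, 3), dtype=np.uint8)
    # Keep only points inside the workspace bounds.
    mask = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
            (points[:, 1] >= y0) & (points[:, 1] < y1))
    pts, cols = points[mask], colors[mask]
    u = ((pts[:, 0] - x0) / pixel_size).astype(int)  # column index
    v = ((pts[:, 1] - y0) / pixel_size).astype(int)  # row index
    # Sort by z so the highest point per pixel is written last (max-projection).
    order = np.argsort(pts[:, 2])
    u, v, pts, cols = u[order], v[order], pts[order], cols[order]
    height[v, u] = pts[:, 2]
    color[v, u] = cols
    return height, color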
B. Grasping Module: Learning Parallel-Jaw Grasps
The grasping module consists of a grasping network that predicts the probability of grasping success for a predefined grasping primitive across a dense pixel-wise sampling of end effector locations and orientations in $I$.
1) Grasping Primitive
The grasping primitive takes as input parameters $\phi_g$, i.e., the end effector location and orientation at which to execute a top-down parallel-jaw grasp.
2) Grasping Network
The grasping network is a seven-layer fully convolutional residual network [4], [14], [21] (interleaved with two layers of spatial bilinear upsampling).
As in [38], [39], [42], and [43], we account for different grasping angles by rotating the input heightmap into 16 orientations (multiples of $22.5^\circ$) and evaluating the network on each rotation.
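As a sketch of this rotation trick, the best grasp can be selected as the argmax over all rotations and pixels. The `model` interface here (mapping a rotated heightmap tensor to a dense map of grasp success probabilities) is a hypothetical stand-in for the grasping network.

import numpy as np
import torch
import torchvision.transforms.functional as TF

def best_grasp(model, heightmap, n_rotations=16):
    """heightmap: 1 x C x H x W tensor. Returns the best rotation index
    (a multiple of 22.5 degrees) and the pixel location of the grasp."""
    scores = []
    for k in range(n_rotations):
        angle = k * 360.0 / n_rotations
        rotated = TF.rotate(heightmap, angle)   # rotate input heightmap
        q = model(rotated)                      # 1 x 1 x H x W success map
        scores.append(TF.rotate(q, -angle))     # map back to original frame
    scores = torch.stack(scores)                # n_rotations x 1 x 1 x H x W
    flat_idx = int(torch.argmax(scores))
    k, _, _, row, col = np.unravel_index(flat_idx, tuple(scores.shape))
    return k, (row, col)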
C. Throwing Module: Learning Throwing Velocities
The goal of the throwing module is to predict the release position and velocity of a predefined throwing primitive for each possible grasp (over the dense pixel-wise sampling of end effector locations and orientations in $I$).
1) Throwing Primitive
The throwing primitive takes as input parameters $\phi_t$ (the release position and velocity of the throw) and executes a throw that releases the grasped object accordingly.
2) Planning the Release Position
In most real-world settings, only a handful of release positions are physically accessible by the robot for throwing. So for simplicity in our system, we directly constrain the release position $r=(r_x,r_y,r_z)$ to lie at a fixed distance $c_d$ from the robot base, in the direction of the target landing location [see (1)].
3) Planning the Release Velocity
Given a target landing location $p=(p_x,p_y,p_z)$ defined with respect to the robot base frame, the release position $r$ and the magnitude of the release velocity $\Vert v\Vert$ can be derived in closed form from linear projectile motion, assuming a fixed release angle of $45^\circ$ and gravitational acceleration $a$:
\begin{align*}
\theta &= \arctan \left(\frac{p_y}{p_x}\right)\\
r_x &= c_d\cos (\theta)\\
r_y &= c_d\sin (\theta) \tag{1}\\
\Vert v\Vert &= \sqrt{\frac{a(p_x^2+p_y^2)}{\sqrt{p_x^2+p_y^2}+r_z-p_z}} \tag{2}
\end{align*}
These equations are valid for any given target landing location $p$.
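As a worked example, the following sketch solves (1) and (2) for a given target. The constants c_d and r_z here are illustrative values, and the decomposition into a velocity vector follows the fixed 45-degree release assumption above.

import numpy as np

def plan_release(p, c_d=0.7, r_z=0.5, a=9.8):
    """Closed-form ballistic plan for target p = (p_x, p_y, p_z) in the
    robot base frame, assuming a 45-degree release angle. The values of
    c_d and r_z are illustrative."""
    p_x, p_y, p_z = p
    theta = np.arctan2(p_y, p_x)                   # throw direction, Eq. (1)
    r = np.array([c_d * np.cos(theta), c_d * np.sin(theta), r_z])
    d = np.hypot(p_x, p_y)                         # horizontal distance
    speed = np.sqrt(a * d ** 2 / (d + r_z - p_z))  # Eq. (2)
    # Decompose the speed into a 45-degree release velocity vector.
    v = speed * np.array([np.cos(theta), np.sin(theta), 1.0]) / np.sqrt(2)
    return r, v

# e.g., a box 1.2 m in front and 0.4 m to the side, on the floor:
r, v = plan_release((1.2, 0.4, 0.0))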
Learning Residual Physics for Throwing
A key aspect of TossingBot's throwing module is that it learns to predict a residual $\delta$ on top of the release velocity $\hat{v}$ estimated by a physics-based controller.
1) Physics-Based Controller
The physics-based controller uses the standard equations of linear projectile motion, by assuming a grasp on the CoM of the object, to analytically solve back for the release velocity $\hat{v}$ via (1) and (2).
We also provide the estimated physics-based release velocity $\hat{v}$ as an additional input to the network.
This physics-based controller has several advantages: it provides a closed-form solution, generalizes well to new landing locations $p$, and supplies a reasonable initial estimate on top of which residuals can be learned.
2) Residual Physics-Based Controller
To compensate for the shortcomings of the physics-based controller, the throwing module includes a throwing network that predicts a residual $\delta$ on top of the estimate $\hat{v}$, yielding the final release velocity $v=\hat{v}+\delta$.
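A minimal sketch of this superposition follows. The dense-map interface and the way $\hat{v}$ is injected as an extra feature channel are assumptions for illustration, not the paper's exact wiring.

import torch

def residual_release_velocities(throwing_net, mu, v_hat):
    """mu: B x C x H x W visual features; v_hat: scalar physics estimate.
    Predicts a dense residual map delta and superimposes it on v_hat,
    yielding v = v_hat + delta for every sampled grasp (pixel)."""
    B, _, H, W = mu.shape
    v_hat_map = torch.full((B, 1, H, W), float(v_hat))       # broadcast v_hat
    delta = throwing_net(torch.cat([mu, v_hat_map], dim=1))  # B x 1 x H x W
    return v_hat_map + delta                                 # v = v_hat + delta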
Jointly Learning Grasping and Throwing
Our full network $f$ (including the perception, grasping, and throwing modules) is trained end-to-end with the loss $\mathcal{L}=\mathcal{L}_g+y_i\mathcal{L}_t$, where $\mathcal{L}_g$ is the binary cross-entropy error on the predicted grasp success probability $q_i$
\begin{equation*}
\mathcal {L}_g=-(y_i\log {q_i}+(1-y_i)\log (1-q_i))
\end{equation*}
where $y_i$ is the binary ground-truth grasp success label and $\mathcal{L}_t$ is the Huber loss on the predicted residual $\delta_i$
\begin{equation*}
\mathcal {L}_t=\left\lbrace \begin{array}{ll}
\frac{1}{2}(\delta _i-\bar{\delta }_i)^2,& \text{for}\;|\delta _i-\bar{\delta }_i| < 1\\
|\delta _i-\bar{\delta }_i| - \frac{1}{2},&\text{otherwise}\end{array}\right.
\end{equation*}
where $\bar{\delta}_i$ is the ground-truth residual label derived from the throw's measured landing location. We train our network $f$ end-to-end with stochastic gradient descent with momentum.
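A sketch of this joint loss in PyTorch follows, assuming per-sample scalar predictions; note that smooth_l1_loss implements exactly the Huber penalty defined above.

import torch
import torch.nn.functional as F

def joint_loss(q_i, y_i, delta_i, delta_bar_i):
    """L = L_g + y_i * L_t: cross-entropy on grasp success, plus the Huber
    loss on the velocity residual, applied only when the grasp succeeded
    (y_i = 1) and a throw was actually executed."""
    loss_g = F.binary_cross_entropy(q_i, y_i)
    loss_t = F.smooth_l1_loss(delta_i, delta_bar_i)  # Huber loss above
    return loss_g + y_i * loss_t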
1) Network Architecture Details
Our network architecture consists of the following layers for each module, where C($k$,$c$) denotes a convolutional layer with $k\times k$ filters and $c$ output channels, MP denotes max pooling, UP denotes spatial bilinear upsampling, and RB($c$) denotes a residual block with $c$ output channels:
Perception: C(3,64)-MP-RB(128)-MP-RB(256)-RB(512);
Grasping: RB(256)-RB(128)-UP-RB(64)-UP-C(1,2);
Throwing: RB(256)-RB(128)-UP-RB(64)-UP-C(1,1).
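A sketch of this stack in PyTorch follows. The residual block internals, the input channel count, and the upsampling settings are assumptions not specified above.

import torch.nn as nn

def RB(c_in, c_out):
    """Basic residual block (internals are an assumption, not the paper's
    exact design)."""
    class Block(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(c_in, c_out, 3, padding=1)
            self.conv2 = nn.Conv2d(c_out, c_out, 3, padding=1)
            self.skip = (nn.Conv2d(c_in, c_out, 1)
                         if c_in != c_out else nn.Identity())
            self.relu = nn.ReLU(inplace=True)
        def forward(self, x):
            return self.relu(self.conv2(self.relu(self.conv1(x))) + self.skip(x))
    return Block()

def UP():
    return nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

# Input: 4-channel RGB-D heightmap (channel count is an assumption).
perception = nn.Sequential(            # C(3,64)-MP-RB(128)-MP-RB(256)-RB(512)
    nn.Conv2d(4, 64, 3, padding=1), nn.MaxPool2d(2),
    RB(64, 128), nn.MaxPool2d(2), RB(128, 256), RB(256, 512))
grasping = nn.Sequential(              # RB(256)-RB(128)-UP-RB(64)-UP-C(1,2)
    RB(512, 256), RB(256, 128), UP(), RB(128, 64), UP(), nn.Conv2d(64, 2, 1))
throwing = nn.Sequential(              # RB(256)-RB(128)-UP-RB(64)-UP-C(1,1)
    RB(512, 256), RB(256, 128), UP(), RB(128, 64), UP(), nn.Conv2d(64, 1, 1))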
2) Training Via Self-Supervision
We obtain our ground truth training labels automatically through trial and error, using the following two success signals.
Success after grasping, by checking the distance between gripper fingertips after the grasping primitive.
Success after throwing, by checking the binary signal of whether or not a throw lands in the correct box.
In the experiments in Section VI, we train our models by self-supervision using the same procedure.
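Both signals reduce to simple checks; a sketch follows, where the threshold value and the box representation are illustrative assumptions.

def grasp_label(fingertip_distance, closed_threshold=0.01):
    """Grasp success: the fingers did not fully close, so an object must
    be held between them (threshold in meters, illustrative)."""
    return fingertip_distance > closed_threshold

def throw_label(landing_xy, box):
    """Throw success: the tracked landing position falls inside the
    intended box, given as axis-aligned bounds (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    x, y = landing_xy
    return x0 <= x <= x1 and y0 <= y <= y1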
Evaluation
We execute a series of experiments in simulated and real settings to evaluate the learned grasping and throwing policies. The goals of the experiments are fourfold:
to evaluate the overall accuracy and efficiency of our pick-and-throw system on arbitrary objects,
to test its generalization to new objects and target locations unseen during training,
to investigate how learned grasps can improve the accuracy of subsequent throws, and
to compare our proposed method based on residual physics to other baseline alternatives.
Evaluation Metrics
Evaluation metrics are defined as follows: 1) grasping success: the rate at which an object remains in the gripper after executing the grasping primitive (measured by the distance between fingertips), and 2) throwing success: the rate at which a thrown object lands in the intended target box (tracked by an overhead camera).
A. Experimental Setup
We evaluate each policy on its ability to grasp and throw various objects into 12 boxes located outside a UR5 robot arm's maximum reach range (as shown in Fig. 1). Specifically, the task is to pick objects from a cluttered bin and stow them uniformly into the boxes such that all boxes have the same number of objects, regardless of object type. Since boxes are outside the robot's reach range, throwing is necessary to succeed in the task. Each box is 20 cm tall with a
1) Simulation Setup
The simulation environment (shown in Fig. 5) is built using PyBullet [6]. We use eight different objects: four seen during training and four unseen for testing. Training objects are chosen in order of increasing difficulty: 4 cm-diameter ball,
Fig. 5. Simulation environment in PyBullet [6]. This snapshot illustrates the aerial motion trajectory of a purple ball being thrown into the target landing box highlighted in green. The top right image depicts the view captured from the simulated RGB-D camera before the ball was grasped and thrown.
Fig. 6. Objects used in simulated (top) and real (bottom) experiments, split by seen objects (left) and unseen objects (right). The CoM of each simulated object is indicated with a red sphere (for illustration).
Although simulation provides a consistent and controlled environment for fair ablative analyses, the simulated environment does not model aerodynamics and only approximates frictional interactions. As a result, the performance in simulation does not necessarily reflect performance in the real world. Therefore, we also provide quantitative experiments on a real system.
2) Real-World Setup
We use a UR5 arm with an RG2 gripper to pick and throw a collection of 80+ different toy blocks, fake fruit, decorative items, and office objects (see Fig. 6). For perception data, we capture RGB-D images of the workspace, which are fused into the heightmap representation used by the network.
B. Baseline Methods
1) Residual-Physics
Denotes our approach described in Section III. Since there are no comparable algorithms available that can learn joint grasping and throwing policies, we compare our approach to three baselines based on variations of the proposed method.
2) Regression
It is a variant of our approach where the throwing network is trained to directly regress the final release velocity $v$, instead of the residual $\delta$.
3) Physics-Only
It is also a variant of our approach where the throwing network is removed and completely replaced by velocity predictions made by the physics-based controller. In other words, this variant only learns grasping and uses physics for throwing (without learning a residual).
4) Regression-Pretrained-on-Physics
It is a version of Regression that is pretrained on release velocity predictions $\hat{v}$ from the physics-based controller (abbreviated as Regression-PoP).
C. Baseline Comparisons
In simulated and real settings, we train our models via trial and error for 15 000 steps, then test each model for 1000 steps and report their average grasping and throwing success rates.
1) Simulation Results
They are reported in Tables I and II. Each column of the table represents a different set of test objects, e.g., “Hammers” is a set of
The throwing results in Table I indicate that learning residuals (Residual-physics) on top of a physics-based controller provides the most accurate throws. Physics-only performs competitively in simulation, where the environment is void of aerodynamics and unstable contact dynamics, but falls short of Residual-physics—particularly for difficult objects like rods or hammers, for which grasping offsets from the CoM can significantly change projectile trajectories. We also observe that regression pretrained on physics (Regression-PoP) consistently outperforms regression alone. On the other hand, the results in Table II show that grasping performance remains roughly the same across all methods. All policies experience moderately lower grasping and throwing success rates for unseen testing objects.
Fig. 7 plots the average throwing performance of all baseline methods over training steps on the hardest seen object set: hammers. Throwing performance is measured by the throwing success rate over the most recent training attempts.
Fig. 7. Our method (Residual-physics) outperforms baseline alternatives in terms of throwing success rates in simulation on the hammers object set.
2) Real-World Results
They are reported in Table III on seen and unseen object sets. The results show that Residual-physics continues to provide more accurate throws than the baseline methods. Most notably, in contrast to simulation, Physics-only does not perform as competitively with Residual-physics in the real world. This is likely because the ballistic model used by Physics-only does not account for the unmodeled and uncertain contact dynamics and aerodynamics of the real world. Residual-physics can compensate for them in one of two ways: either improving the model (learning good residuals) or avoiding regions of the model that are not predictable (avoiding complex grasps). This allows TossingBot to maintain a throwing accuracy above 80% for both seen and unseen objects.
Interestingly, our system also seems to match or even slightly exceed the average performance of an untrained human (i.e., with no training time provided beforehand). To measure human throwing performance, we asked 15 willing participants (average height: 174.0 cm) to perform the same pick-and-throw task with the same objects and target boxes.
Surprisingly, human performance was lower than we expected. The largest contributor to the poor performance was fatigue—the accuracy of throws deteriorates over time, particularly after around the 20th object, regardless of picking speed. The second largest contributor was the physical height of the participant (taller participants performed better)—this may be due to differences in throwing distance (measured from grasp release to object landing locations, which is smaller for taller participants with longer arms) and in throwing strategies (taller participants more often preferred overhand throws to underhand ones). Other common throwing strategies included the following:
relying largely on tactile feedback to grasp objects in the bin, in order to maintain visual attention on the target boxes,
grasping objects with one hand and throwing with the other so that the throwing arm can make more repeatable movements,
grouping objects by weight, then correspondingly changing to different grasping and throwing strategies.
These additional strategies were interesting but did not seem to correlate with better performance. Also, most strategies seem designed to overcome human limitations in terms of restricted attention spans, limited viewpoints, limited motor control calibration, or fatigue, which do not hinder robotic systems.
D. Pick-and-Place Efficiency
Throwing enables our system (TossingBot) to achieve picking speeds of 514 mean picks per hour (MPPH), where one pick = a successful grasp and accurate throw. Specifically, the system performs 608 grasps per hour (measured over two hours) and achieves 84.7% throwing accuracy, yielding 514 MPPH. In Table IV, we compare against other state-of-the-art picking systems found in the literature: Cartman [27], Dex-Net 2.0 [23], FC-GQ-CNN [31], Dex-Net 4.0 [24], and a variant of TossingBot that places objects into a box 0.8 m away from the bin without throwing. This is not a like-for-like comparison, since throwing is only practical for certain types of objects (e.g., not eggs) and hardware, and placing is only practical for limited distance ranges. Yet, the results suggest that throwing may be useful for improving overall MPPH in some applications.
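(As a check on the arithmetic: 608 grasps per hour at an 84.7% throwing success rate gives 608 × 0.847 ≈ 515 picks per hour; the reported 514 MPPH presumably reflects the unrounded rates.)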
In addition to throwing, the following three other aspects enable our system's picking speeds:
fast algorithmic run-time speeds (220 ms for inference),
real-time TSDF fusion [7], [28], [30], [41] of RGB-D data, which enables us to capture and aggregate observed 3-D data of the scene simultaneously as the robot moves around within the field-of-view, and
online training and inference in parallel to robot actions (described in Algorithm 1).
Algorithm 1: System Pipeline.
Initialize robot.
Initialize policy with model $f$.
Initialize replay buffer.
while step $i < N$ do
    while robot.is_grasping do  ▹ grasp $i$ executes asynchronously
        Train($f$, buffer)  ▹ gradient steps in parallel with robot motion
    robot.ExecuteThrow($\phi_{t,i}$)
    while robot.is_throwing do
        $I_{i+1}$ ← robot.CaptureState()  ▹ observe the bin for the next step
        $\phi_{g,i+1}$, $\phi_{t,i+1}$ ← Inference($f$, $I_{i+1}$)
        Train($f$, buffer)
    robot.ExecuteGrasp($\phi_{g,i+1}$)
    buffer.SaveData($I_i$, $\phi_{g,i}$, $\phi_{t,i}$, $y_i$, $\bar{\delta}_i$)
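The interleaving in Algorithm 1 can be realized with a background training thread that takes gradient steps while a motion primitive executes. A minimal sketch follows; all interfaces here (robot.is_throwing, buffer.sample(), model.train_step()) are hypothetical.

import threading

def train_while(model, buffer, still_moving):
    """Take gradient steps from the replay buffer for as long as the robot
    is still executing a primitive. `still_moving` is a callable polling
    the robot state (hypothetical interface)."""
    def loop():
        while still_moving():
            model.train_step(buffer.sample())
    t = threading.Thread(target=loop)
    t.start()
    return t  # join after the primitive finishes

# e.g.: t = train_while(f, buffer, lambda: robot.is_throwing)
#       ...; t.join()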
E. Learning Stable Grasps for Throwing
We next investigate the importance of supervising grasps with the accuracy of throws. To this end, we train two variants of residual-physics: 1) grasping network supervised by accuracy of throws (i.e., grasp success = object landed on target), and 2) grasping network supervised by checking grasp width after grasping primitive (i.e., grasp success = object in gripper). We plot their grasping and throwing success rates over training steps in Fig. 8 on the hammer object set.
Fig. 8. Both grasping and throwing success rates of Residual-physics policies in simulation improve when grasps are supervised by the accuracy of throws (blue), versus when grasps are supervised by a heuristic that checks gripper width (purple).
The results indicate that throwing performance significantly improves when grasping is supervised by the accuracy of throws. This not only suggests that the grasping policies are capable of learning to execute the subset of grasps that lead to more predictable throws, but also indirectly that throwing accuracy is strongly influenced by the quality of grasps. Interestingly, the results also show that grasping performance slightly increases when supervised by the accuracy of throws.
We also investigate the quality of learned grasps by visualizing 2-D histograms of successful grasps, mapped directly onto the hammer object in Fig. 9. To create this visualization from simulation, we record each grasping position by saving the 3-D location (with respect to the hammer) of the middle point between gripper fingertips after each successful grasp. We then project the grasping positions recorded over 15 000 training steps onto a 2-D histogram, where darker regions indicate more grasps. The silhouette of the hammer is outlined in black, with a green dot indicating its CoM. We illustrate the grasp histograms of three policies: Residual-physics with grasping supervised by a heuristic that checks grasp width after the grasping primitive (left), Residual-physics with grasping supervised by the accuracy of throws (middle), and Physics-only with grasping supervised by the accuracy of throws (right).
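These histograms are straightforward to reproduce from logged grasp positions; a sketch follows, where the bin count and object-frame extents are illustrative.

import numpy as np

def grasp_histogram(grasp_xy, bins=64, extent=((-0.15, 0.15), (-0.05, 0.05))):
    """2-D histogram of successful grasp positions projected into the
    object frame; darker cells = more grasps when rendered."""
    counts, _, _ = np.histogram2d(grasp_xy[:, 0], grasp_xy[:, 1],
                                  bins=bins, range=extent)
    return counts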
Fig. 9. Projected 2-D histograms of successful grasping positions on hammers in simulation, showing that 1) leveraging the accuracy of throws as supervision enables the grasping policy to learn a more restricted but stable set of grasps, while 2) learning throwing in general helps to relax this constraint.
The differences between the left and middle histograms indicate that leveraging accurate throws as a supervisory signal encourages the grasping policy to learn a more restricted but stable and homogeneous set of grasps: slightly further from the CoM to avoid unintentional collisions between the fingers and the rest of the object at the moment of release, but also further from the ends of the handle to avoid less predictable throws. The differences between the middle and right histograms show that when using only ballistics for the throwing module (i.e., without learning throwing), the grasping policy further optimizes for grasps that are closer to the CoM. This leads to a more restricted set of grasps than with Residual-physics, where the throwing module can learn to compensate for a wider variety of grasps.
We also provide similar 2-D grasp histogram visualizations in Fig. 10 for all simulation objects. Across all policies, the histograms of grasps that lead to successful throws (columns 2, 5, 8) share large overlaps with those of grasps that lead to failed throws (red; columns 3, 6, 9). This suggests that grasping and throwing might have been learned simultaneously, rather than one after the other—likely because the way the robot throws is conditioned on how it grasps in a nontrivial manner.
Fig. 10. Additional grasping histograms for all simulation objects. Histograms are generated for successful grasps, grasps that lead to successful throws, and grasps that lead to failed throws—recorded over 15 000 training steps. Darker regions indicate more grasps. The silhouette of each object is outlined in black, with a green dot indicating its CoM.
F. Generalizing to New Target Locations
One of the key benefits of any residual physics approach is that the physics-based part of the controller naturally generalizes to conditions outside the collected data, for example, to new target locations. To explore how well our trained TossingBot policies generalize to new target locations, we displace the locations of the boxes in the horizontal plane from where they were during training, such that there is no overlap between training and testing locations. For this experiment, we use 12 training boxes and 12 testing boxes in simulation, and four training and four testing boxes in the real setting (limited by the physical setup). We record each model's throwing performance on seen objects over these new box locations across 1000 testing steps in Table V.
We observe that in both simulated and real experiments, Residual-physics significantly outperforms the Regression baseline. The performance margin in this scenario illustrates how Residual-physics leverages the generalization of the ballistic equations to adapt to new target locations.
G. Deep Object Semantics Emerging From Task Training
In this section, we explore the deep features learned by the neural network $f$ and what they represent.
To this end, we place several training objects in the bin (well-isolated from each other for visualization purposes), capture RGB-D images to construct a heightmap $I$, and feed it through the trained network to extract the pixel-wise deep features $\mu$. We then select a query pixel on one of the objects and visualize the distances between its deep feature vector and the features of all other pixels in the scene.
Fig. 11. Emerging semantics from interaction. Visualizing distances between the pixel-wise deep features $\mu$ of a query pixel and those of all other pixels in the scene: (a) query pixel on a ping-pong ball; (b) query pixel on a pink marker pen.
The procedure creates a form of similarity map between pixels. Interestingly, when choosing the pixel from a ping-pong ball, the visualization immediately localizes all other ping-pong balls in the scene—presumably because they share similar deep features. It is also interesting to note that the orange wooden block, despite sharing a similar color, does not get picked up by the query. Similarly, Fig. 11(b) illustrates the feature distances between a query pixel on a pink marker pen to all other pixels of the scene. The visualization immediately localizes all other marker pens, which share similar shape and mass, but do not necessarily share color textures.
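This similarity map is a simple function of the learned features; a sketch follows, where the feature tensor layout is an assumption.

import torch

def feature_similarity(mu, query_row, query_col):
    """mu: C x H x W deep features from the perception module. Returns an
    H x W map of Euclidean distances between the query pixel's feature
    vector and every other pixel's; low distance = similar (e.g., other
    ping-pong balls for a ping-pong-ball query)."""
    q = mu[:, query_row, query_col].view(-1, 1, 1)
    return torch.linalg.vector_norm(mu - q, dim=0)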
These interesting results suggest that the deep network is learning to bias the features (i.e., learning a prior) based on the objects' shapes more so than their visual textures or colors. The network likely learns that geometric cues are more useful for learning grasping and throwing policies—i.e., they provide more information related to grasping interactions and projectile behaviors. Beyond shape, one could also argue that the learned deep features reflect second-order (beyond visual or geometric) physical attributes of objects, which influence their aerial behaviors when thrown. This perspective is also plausible, since the throwing policies are effectively learning to compensate for these physical attributes. For comparison, the visualizations generated by features from TossingBot are more informative in this setting than those generated using deep features from an 18-layer ResNet pretrained on ImageNet (also shown in Fig. 11).
These emerging features were learned implicitly from scratch without any explicit supervision beyond task-level grasping and throwing. Yet, they seem to be sufficient for enabling the system to distinguish between ping-pong balls and markers. As such, this experiment speaks to a broader question in machine vision: how should robots learn the semantics of the visual world? From the perspective of classic computer vision, semantics are often predefined using human-fabricated image datasets and manually constructed class categories (i.e., this is a "hammer," and this is a "pen"). However, our experiment suggests that it is possible to implicitly learn such object-level semantics from physical interactions alone (as long as they matter for the task at hand). The more complex these interactions, the higher the resolution of the semantics. Toward more generally intelligent robots—perhaps it is sufficient for them to develop their own notion of semantics through interaction [36], without human guidance.
Discussion and Future Work
This article presents a framework for jointly learning grasping and throwing policies that enable TossingBot to pick and throw arbitrary objects from an unstructured bin into boxes located outside its maximum reach range at 500+ MPPH. We show that a key ingredient is residual physics, a hybrid controller that leverages deep learning to predict residuals on top of control parameters estimated with physics. The combination enables the data-driven predictions to focus on learning the aspects of dynamics that are difficult to model analytically. Our experiments in both simulation and real settings show that the system: 1) learns to improve grasps for throwing through joint training from trial and error, and 2) performs significantly better with residual physics than comparable alternatives.
The proposed system is a prototype with several limitations that suggest directions for future work. First, it assumes that objects are rigid and robust enough to withstand forces encountered when thrown—further work is required to train networks to predict motions that account for fragile, articulated, or deformable objects. Second, it infers object-centric properties and dynamics only from visual data (an RGB-D image of the bin)—exploring additional sensing modalities such as force-torque or tactile may enable the system to better react to new objects and better adapt its throwing velocities. Third, it is only able to infer the parameters needed to get an object to land in a target location—it would be interesting to explore how to achieve more fine-grained control of the pose (including orientation) of an object in flight, potentially to reach a target landing pose while avoiding or leveraging external obstacles. Finally, we have so far demonstrated the benefits of residual physics only in the context of throwing—investigating how the idea generalizes to other tasks is a promising direction for future research.
ACKNOWLEDGMENT
The authors would like to thank R. Hickman for managerial support, I. Krasin, and S. Welker for technical discussions, B. Hurd, J. Salazar, and S. Snyder for hardware support, C. Richards and J. Freidenfelds for feedback on writing, E. Coumans for advice on PyBullet, L. Graesser for video narration, and R. Hickman for photography. The authors are also grateful for hardware and financial support from Google, Amazon, Intel, NVIDIA, ABB Robotics, and Mathworks.