I. Introduction
Learning-based methods such as deep Reinforcement Learning (RL) allow robots to tackle complex tasks, including object manipulation [1], [2] and locomotion for quadrupedal robots [3], [4] and humanoids [5]. However, RL's high sample complexity and the risk of unsafe exploration [6], [7], [8] make it necessary to train policies in simulation and then deploy them in the real world. A key challenge in this workflow is the sim-to-real gap, caused by discrepancies between simulated and real-world dynamics [9], [10], which can lead to catastrophic failures during deployment.