I. Introduction
The study of musculoskeletal models as overactuated systems, in which the number of muscles greatly exceeds the number of degrees of freedom, poses a challenging research problem [1], [2]. Such models exhibit exceptional motor performance, which reflects the complexity and efficiency of biological systems in executing movements. Reinforcement learning, which is inspired by biological behavior [3], offers a compelling approach for studying decision-making in these overactuated systems and may deepen our understanding of how they achieve their remarkable capabilities [4]. Nonetheless, existing reinforcement learning techniques have had limited success on the decision-making problems inherent to such overactuated control tasks. This gap underscores the need for methods that better capture the intricacies of musculoskeletal systems and improve decision-making performance in overactuated models.