I. INTRODUCTION
In robotics, visual perception has traditionally been the central modality for acquiring detailed representations of the environment [1], [2]. However, vision has intrinsic limitations in capturing the full dynamic and intricate state of the surrounding environment [3]. In contrast, tactile sensing captures fine-grained attributes that lie beyond the reach of visual modalities.