I. Introduction
With the rapid advancement of artificial intelligence, robots have acquired stronger perception, decision-making, and execution capabilities through advanced algorithms and learning techniques. Progress in computer vision and natural language processing (NLP) in particular has greatly enhanced their perceptual abilities [1]. These abilities enable robots to better understand and interpret data from their environment, for example by identifying objects, faces, gestures, and voice commands [2]. Such progress allows robots to respond more flexibly and intelligently to diverse environments and tasks, and to take on roles in service sectors such as customer service, caregiving, and home assistance. Consequently, humanoid service robots have demonstrated significant potential in manufacturing, healthcare, and the service industry, where they can improve production efficiency, reduce costs, and create safer and more convenient work and living environments for people [3].