I. Introduction
In control and robotics, one of the primary objectives of artificial intelligence is to create fully autonomous agents that interact with their environment, learn optimal behaviors, and improve continually through trial and error. This challenge spans a wide spectrum, from robots that use sensory inputs to perform a task to software-based agents that engage in natural-language and multimedia interactions [1].