I. Introduction
Since the earliest research on robot learning, methods for learning by demonstration have attracted considerable attention. This form of robot learning can facilitate applications in industry, manufacturing, healthcare, and other domains, because it directly clones motor skills by extracting task-relevant information that can be transferred to the robot [1]–[3]. Traditional imitation methods generally use supervised learning to fit the regression parameters of models such as dynamic movement primitives (DMPs) [4] and Gaussian mixture models (GMMs) [5]. However, a major drawback of these methods is that they adapt poorly and depend heavily on large amounts of data [2]–[6]. As a result, their use in real-world robotic applications tends to be limited. In real-world scenarios, it is therefore important to design a more efficient learning policy that works from a finite amount of expert data.