I. Introduction
In recent years, machine learning methods have been developed and applied to speech enhancement. Over the past decade, techniques such as deep learning, reinforcement learning (RL), and transfer learning have been used extensively to address tasks in real-world applications, including hearing aids, machine translation, and robotics [1]. Deep learning refers to network models with multiple hidden layers of neural units, and it has been applied to both supervised and unsupervised problems. In speech enhancement, deep learning has significantly improved performance, largely through regression-based models [2]–[4]. The goal of an RL algorithm, by contrast, is to select actions in an environment so as to maximize the cumulative reward [5]. Because of this principle, RL has been applied to computer game design and dialogue management [6].
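The RL objective of maximizing cumulative reward can be written compactly. The following is one common formulation, not necessarily the one used later in this paper; the policy π, per-step reward r_t, episode length T, and discount factor γ ∈ [0, 1] are standard RL notation assumed here rather than symbols defined in this section:

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} r_{t} \right]
```

With γ = 1, the quantity inside the expectation reduces to the plain sum of rewards collected over one episode; choosing γ < 1 weights near-term rewards more heavily than distant ones.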