1. Introduction
Speech is a basic interface that humans use to communicate with each other, and it has the advantage of requiring no additional equipment. Because speech can travel a relatively long distance, it is particularly useful for indoor devices such as smart TVs, where users and devices are typically some distance apart. However, a speech signal recorded in an indoor environment contains reverberation: the direct sound wave arrives together with delayed copies of the reflected sound waves. Reverberation is generated when the propagated sound waves reflect off the surfaces of solid objects in the acoustic environment, and it decreases speech intelligibility and quality [1]. A reverberant environment is an interior space that can be regarded as a linear time-invariant system that takes a source speech signal as input and produces a reverberant speech signal as output. The impulse response of this system is called the Room Impulse Response (RIR) and can be modeled using the image method. De-reverberation can be performed via a direct inverse method that multiplies the reverberant signal by the inverse matrix of the RIR. In addition to de-reverberation, noise reduction is also necessary. De-noising can be performed using the Wiener filter, Two-Step Noise Reduction (TSNR) [2], Harmonic Regeneration Noise Reduction (HRNR), and other methods. However, these rule-based methods not only require the RIR characteristics but also repeat the noise-reduction computation for every frame.
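The linear time-invariant model and the direct inverse method described above can be illustrated with a minimal sketch. It assumes a known, noise-free RIR; the toy RIR (a unit direct path plus exponentially decaying random reflections), the signal lengths, and the helper names `convolution_matrix`, `reverberate`, and `direct_inverse` are hypothetical choices made for illustration. A least-squares solve stands in for an explicit matrix inverse, since the full convolution matrix is not square.

```python
import numpy as np


def convolution_matrix(rir, n_source):
    """Build H so that H @ s equals np.convolve(s, rir) (full convolution)."""
    n_out = n_source + len(rir) - 1
    H = np.zeros((n_out, n_source))
    for col in range(n_source):
        # Each column holds the RIR shifted down by one more sample.
        H[col:col + len(rir), col] = rir
    return H


def reverberate(source, rir):
    """LTI room model: reverberant signal = source convolved with the RIR."""
    return np.convolve(source, rir)


def direct_inverse(reverberant, rir, n_source):
    """Recover the source by solving H @ s = y in the least-squares sense."""
    H = convolution_matrix(rir, n_source)
    s_hat, *_ = np.linalg.lstsq(H, reverberant, rcond=None)
    return s_hat


rng = np.random.default_rng(0)
n_source, n_taps = 64, 16

# Stand-in for a speech frame (assumption: random samples, not real speech).
source = rng.standard_normal(n_source)

# Toy RIR (assumption): unit direct path followed by decaying reflections.
rir = 0.5 * rng.standard_normal(n_taps) * np.exp(-0.3 * np.arange(n_taps))
rir[0] = 1.0

reverberant = reverberate(source, rir)
recovered = direct_inverse(reverberant, rir, n_source)

print("max reconstruction error:", np.max(np.abs(recovered - source)))
```

The sketch recovers the source only because the RIR is known exactly and no noise is added. It also makes the dependence on the RIR explicit: the inverse cannot be formed without it.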