1. INTRODUCTION
Acoustic echo cancellation (AEC) plays an essential role in smart speakers and speech communication systems and has received significant attention for several decades [1], [2]. Conventionally, acoustic echo is removed by identifying the acoustic impulse response between the loudspeaker and the microphone with adaptive algorithms such as the normalized least mean square (NLMS) algorithm [3]. However, double-talk and background noise, especially non-stationary noise, disrupt the convergence of these algorithms. Moreover, most traditional AEC algorithms are fundamentally linear systems. They therefore cannot handle the nonlinear distortions that low-quality devices, such as amplifiers and loudspeakers, introduce into the recorded echo [4], [5].
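To make the conventional linear approach concrete, the following is a minimal sketch of a time-domain NLMS echo canceller. It assumes a single-channel setup with the far-end (loudspeaker) signal available as a reference. The function name `nlms_echo_canceller` and the parameters `filter_len`, `mu`, and `eps` are hypothetical illustration choices and are not taken from [3]. The FIR weights `w` estimate the loudspeaker-to-microphone impulse response. The predicted echo is subtracted from the microphone signal, and the residual drives a step normalized by the reference energy in the buffer.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, filter_len=64, mu=0.5, eps=1e-6):
    """Hypothetical sketch of a linear NLMS acoustic echo canceller.

    far_end: loudspeaker reference signal x[n]
    mic:     microphone signal d[n] (echo + near-end speech + noise)
    Returns the residual e[n] (echo-suppressed output) and the final weights.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(filter_len)      # estimate of the echo-path impulse response
    x_buf = np.zeros(filter_len)  # latest filter_len reference samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        # Shift the reference buffer so that x_buf[k] = x[n - k].
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y_hat = w @ x_buf         # estimated echo at time n
        e = mic[n] - y_hat        # residual after echo subtraction
        # NLMS update: the step size is normalized by the buffered reference energy.
        # With no double-talk detector, near-end speech in e would also perturb w.
        w += (mu / (eps + x_buf @ x_buf)) * e * x_buf
        err[n] = e
    return err, w
```

In the noise-free, single-talk case with a purely linear echo path, `w` converges toward the true impulse response. If a nonlinearity (for example, loudspeaker saturation) were applied to the echo before it reached the microphone, a linear FIR model like this one could not fully represent it, and some echo would remain in the residual.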