I. INTRODUCTION
Robot audition [1] has been studied for more than twenty years. Sound source localization and separation are essential functions in robot audition: to communicate with people and understand the surrounding auditory scene in the real world, a robot has to listen to several sound sources at the same time, even in a noisy environment. Many methods for sound source localization and separation have been reported. They fall mainly into four groups: fixed beamforming, adaptive beamforming, blind separation, and deep learning-based methods.