1. Introduction
Lack of robustness to degraded acoustic environments is a critical factor limiting the impact and adoption of speech technologies. Numerous sources of variation in the audio can degrade or mask the signal of interest and impair the performance of automatic speech processing systems. Whether for automatic speech recognition (ASR) [1, 2, 3], speaker identification and diarization [4, 5], or speaker localization [6], most systems lose performance when applied in noisy or reverberant conditions.