I. Introduction
Environmental Sound Classification (ESC) is a challenging task that requires correctly distinguishing between sound classes that occur in everyday life (e.g., “sneezing”, “airplane”, “jackhammer”, “cat”, “idling engine”, “brushing teeth”, “street music”). Widely used datasets, such as ESC-50 [1] and UrbanSound8K [2], provide a reliable basis for comparing a variety of approaches to the ESC task, and such comparisons have confirmed the advantage of using cross-domain techniques [3].