I. Introduction
Face recognition (FR) systems are integrated into everyday life and are used by a large number of people worldwide. These users are diverse in gender, ethnicity, and age, which poses a particular challenge for the technology: it should guarantee the same usability and security regardless of an individual user's demographic and non-demographic attributes. However, recent works show that FR systems are biased with respect to demographic attributes (e.g. gender, ethnicity, age) [1], [9], [27] and non-demographic attributes (e.g. facial hair style, illumination, headwear) [54], [61], [10], [23]. This leads to disparities in recognition performance depending on these attributes. These disparities have motivated a variety of works that investigate possible sources of bias [8], [62], [10], [3], [7], [6], propose approaches to measure it [1], [14], visualize it [20], and examine the bias problem further, e.g. by asking experts [44].