I. Introduction
Designing a benchmark from real-world software is a challenging task [1]. Existing approaches therefore either inject bugs artificially [2], [3] or draw on historical bugs from a project's software repository [4]. Artificially injected bugs are often difficult to verify (see [2, p. 2]), whilst historical vulnerabilities may represent only a subset of the ground truth.