I. Introduction
Static analysis of code is one of the most effective ways to avoid defects in software and, when security is a concern, is essential. Static analysis can find problems that are extremely hard to detect by testing when the inputs that trigger a bug are hard to find. Static analysis is also often more efficient than testing; a bug that takes a fuzzer days to find may be identified immediately. Users of static analysis tools often wonder which of the multiple tools available for a language are most effective, and how much the tools overlap in their results. Tools often find substantially different bugs, making it important to use multiple tools [32]. However, given the high cost of examining results, a tool that provides only marginal novelty may not be worth using, especially if it has a high false-positive rate. Developers of static analysis tools also want to compare their tools to other tools, in order to see what detection patterns or precision/soundness trade-offs they might want to imitate. Unfortunately, comparing static analysis tools in these ways is hard, and would seem to require vast manual effort to inspect findings and determine ground truth on a scale that would provide statistical confidence.