1 Introduction
Motivation. Mainstream servers employ multiple features to achieve high performance: multiple cores, multi-level caches, powerful vector units, and complicated microarchitectures. On the one hand, reaching peak performance requires exploiting all of these features [1]. On the other hand, it is hard for developers to use all of them efficiently. Thus, techniques that identify potential optimizations and guide their tuning are valuable to the software community.