I. Introduction
Today's largest High Performance Computing (HPC) systems consist of thousands of nodes that can concurrently execute up to millions of threads to solve complex problems in a reasonable amount of time. Exploiting the full performance of these systems requires significant effort. Extracting this performance is essential in research areas such as climate, environmental science, physics, and energy, which are characterized by complex scientific models.