
Tree structured analysis on GPU power study


Abstract:

Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computation power. While providing high computation performance, a GPU consumes high power and needs sufficient power supplies and cooling systems. It is essential to institute an efficient mechanism for evaluating and understanding the power consumption when running real applications on high-end GPUs. In this paper, we present a high-level GPU power consumption model using sophisticated tree-based random forest methods which correlate and predict the power consumption using a set of performance variables. We demonstrate that this statistical model not only predicts the GPU runtime power consumption more accurately than existing regression based approaches, but more importantly, it provides sufficient insights into understanding the correlation of the GPU power consumption with individual performance metrics. We use a GPU simulator that can collect more runtime performance metrics than hardware counters. We measure the power consumption of a wide-range of CUDA kernels on an experimental system with GTX 280 GPU to collect statistical samples for power analysis. The proposed method is applicable to other GPUs as well.
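The abstract describes correlating runtime performance metrics with measured power using tree-based random-forest regression. As a minimal from-scratch sketch of that style of model (not the paper's actual model, features, or measurements), one can grow bagged regression trees over bootstrap resamples and average their predictions; all data and hyperparameters below are illustrative assumptions.

```python
import random

def _sse(vals):
    """Sum of squared errors around the mean (the split cost)."""
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def _build_tree(xs, ys, depth, n_feats):
    """Grow one regression tree, choosing splits from a random feature subset."""
    if depth == 0 or len(set(ys)) <= 1:
        return sum(ys) / len(ys)            # leaf: mean power of its samples
    best = None                             # (cost, feature, threshold)
    for f in random.sample(range(len(xs[0])), n_feats):
        for thr in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= thr]
            right = [y for x, y in zip(xs, ys) if x[f] > thr]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    _, f, thr = best
    li = [i for i, x in enumerate(xs) if x[f] <= thr]
    ri = [i for i, x in enumerate(xs) if x[f] > thr]
    if not li or not ri:                    # degenerate split: fall back to a leaf
        return sum(ys) / len(ys)
    return (f, thr,
            _build_tree([xs[i] for i in li], [ys[i] for i in li], depth - 1, n_feats),
            _build_tree([xs[i] for i in ri], [ys[i] for i in ri], depth - 1, n_feats))

def _predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(xs, ys, n_trees=25, depth=3, n_feats=2):
    """Bagging: each tree trains on a bootstrap resample of the data."""
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(len(xs)) for _ in range(len(xs))]
        forest.append(_build_tree([xs[i] for i in idx],
                                  [ys[i] for i in idx], depth, n_feats))
    return forest

def predict_forest(forest, x):
    """Forest prediction: the average of the individual tree predictions."""
    return sum(_predict_tree(t, x) for t in forest) / len(forest)
```

Because each tree partitions the feature space with axis-aligned thresholds, such a model also exposes which performance metrics drive the splits, which is the kind of interpretability the abstract highlights over plain regression.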
Date of Conference: 09-12 October 2011
Date Added to IEEE Xplore: 17 November 2011
Conference Location: Amherst, MA, USA

1. Introduction

Due to excessive power consumption, limited instruction-level parallelism, and the escalating processor-memory wall, the computer industry has moved away from building expensive single-processor chips with limited performance improvement toward multi-core chips that deliver higher chip-level IPC (Instructions Per Cycle) within an acceptable power budget. Instead of replicating general-purpose CPU cores in a single chip, Nvidia's recently introduced GPUs [17] [26] take a different approach, building a many-core GPU chip as a co-processor connected through a PCI-Express bus to the host CPU. The host executes the source program and initiates computation kernels, each with multiple thread blocks to be executed on the GPU. In the GPU chip, multiple streaming processors (SPs) are grouped into a few streaming multiprocessors (SMs), each serving as a scheduling unit. Based on resource requirements, one or more thread blocks can be scheduled on an SM. Each thread block contains one or more 32-thread warps to be executed on multiple SPs in a Single-Instruction-Multiple-Threads (SIMT) fashion to achieve high floating-point throughput.
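The block/warp/SM hierarchy above can be made concrete with a back-of-the-envelope sketch (not from the paper) of how a CUDA-style launch decomposes into 32-thread warps and how many thread blocks fit on one SM. The per-SM limits used as defaults are assumptions loosely modeled on the GT200 (GTX 280) generation: 1024 resident threads, 16384 registers, 16 KB of shared memory, and at most 8 resident blocks per SM.

```python
WARP_SIZE = 32

def warps_per_block(threads_per_block):
    """Number of 32-thread warps a block is split into (ceiling division)."""
    return -(-threads_per_block // WARP_SIZE)

def blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block,
                  max_threads=1024, max_regs=16384,
                  max_smem=16384, max_blocks=8):
    """Resident blocks per SM: the tightest of the per-SM resource limits."""
    limits = [max_blocks, max_threads // threads_per_block]
    if regs_per_thread:
        limits.append(max_regs // (regs_per_thread * threads_per_block))
    if smem_per_block:
        limits.append(max_smem // smem_per_block)
    return min(limits)
```

For example, a 256-thread block is split into 8 warps; with a hypothetical 16 registers per thread and 4 KB of shared memory per block, the thread, register, and shared-memory limits each allow 4 resident blocks per SM, so 4 blocks are co-scheduled.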

