1. Introduction
Due to excessive power consumption, limited instruction-level parallelism, and the escalating processor-memory wall, the computer industry has moved away from building expensive single-processor chips with limited performance improvement toward multi-core chips that deliver higher chip-level IPC (instructions per cycle) within an acceptable power budget. Instead of replicating general-purpose CPUs (cores) on a single chip, Nvidia's recently introduced GPUs [17][26] take a different approach, building many-core GPU chips as co-processors connected to the host CPU through a PCI-Express bus. The host executes the source program and initiates computation kernels, each with multiple thread blocks to be executed on the GPU. In the GPU chip, multiple streaming processors (SPs) are grouped into a few streaming multiprocessors (SMs), each of which serves as a scheduling unit. Depending on its resource requirements, one or more thread blocks can be scheduled on an SM. Each thread block contains one or more 32-thread warps that execute on multiple SPs in a Single-Instruction-Multiple-Threads (SIMT) fashion to achieve a high rate of floating-point operations per second.
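As a concrete illustration of this host/device division of work, consider the following minimal CUDA sketch (the kernel, names, and launch parameters are illustrative, not taken from this paper): the host allocates device memory and launches a kernel as a grid of thread blocks, and each block's warps execute the kernel body on the SPs of the SM it is scheduled on.

```cuda
// Kernel executed on the GPU: each thread handles one vector element.
// Threads are identified through the block/thread hierarchy described above.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the final partial block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side setup: allocate device memory over PCI-Express-attached GPU.
    float *a, *b, *c;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);

    // 256 threads per block = 8 warps of 32 threads each;
    // enough blocks to cover all n elements.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    // Host initiates the computation kernel on the GPU.
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

At launch, the hardware distributes the `blocks` thread blocks across the available SMs (one or more per SM, subject to register and shared-memory limits), and within each block the 32-thread warps issue the same instruction across the SPs in SIMT fashion.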