Get Out of the Valley: Power-Efficient Address Mapping for GPUs | IEEE Conference Publication | IEEE Xplore

Get Out of the Valley: Power-Efficient Address Mapping for GPUs


Abstract:

GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that the memory address bits that exhibit high variability differ widely from application to application. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU-address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR-gates. PAE improves performance by 1.31X and power-efficiency by 1.25X compared to state-of-the-art permutation-based address mapping.
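To make the idea of a window-based, per-bit entropy more concrete, the following is a minimal Python sketch. It is one plausible reading of the abstract, not the paper's exact metric: the function names `bit_entropy` and `window_entropy`, the use of non-overlapping fixed-size windows, and the plain averaging across windows are all assumptions made for illustration.

```python
import math


def bit_entropy(p_one):
    """Shannon entropy (in bits) of a binary variable with P(1) = p_one."""
    if p_one in (0.0, 1.0):
        return 0.0
    return -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))


def window_entropy(addresses, num_bits, window):
    """Average per-bit entropy over non-overlapping windows of requests.

    Assumption: a window stands in for the set of requests that are
    likely to co-exist in the memory system at runtime.
    """
    totals = [0.0] * num_bits
    num_windows = 0
    for start in range(0, len(addresses) - window + 1, window):
        chunk = addresses[start:start + window]
        for b in range(num_bits):
            ones = sum((a >> b) & 1 for a in chunk)
            totals[b] += bit_entropy(ones / window)
        num_windows += 1
    if num_windows == 0:
        raise ValueError("need at least one full window of addresses")
    return [t / num_windows for t in totals]


trace = list(range(8))                      # toy request stream: 0, 1, ..., 7
per_window = window_entropy(trace, 4, 4)    # two windows of four requests
whole_trace = window_entropy(trace, 4, 8)   # one window covering everything
```

In this toy trace, bit 2 looks fully variable when the whole trace is treated as one window, but has zero entropy inside each window of four requests. This is the kind of effect a window-based view is meant to expose: a bit can vary over the whole run while being constant among the requests that are in flight together.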
Date of Conference: 01-06 June 2018
Date Added to IEEE Xplore: 23 July 2018
Electronic ISSN: 2575-713X
Conference Location: Los Angeles, CA, USA

I. Introduction

GPUs need high-bandwidth memory systems to support their massively parallel execution model. Current DRAM solutions such as GDDR5 [1] and 3D-stacked memory [2], [3] deliver high theoretical performance. Unfortunately, it is difficult to reach this potential with contemporary GPU-compute workloads, leading to suboptimal bandwidth utilization, performance and power-efficiency [4]. To maximize bandwidth, DRAM interfaces are organized in a four-dimensional structure of channels, banks, rows and columns. The way the application memory access streams are mapped onto this structure has a significant impact on performance and power consumption. For the row bits, the addresses should change as little as possible to ensure high row buffer locality. For the channel and bank bits, the addresses should be highly variable to ensure uniform distribution of memory requests across channels and banks [5].
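The effect of the mapping on channel and bank distribution can be illustrated with a small Python sketch. Everything specific here is an assumption for illustration: the field widths, the row | bank | channel | column bit layout, and the helper names are hypothetical, and the remapping shown is a simple XOR fold of the row bits into the channel and bank bits, not the paper's PAE scheme.

```python
# Hypothetical field widths; real GPUs and DRAM parts differ.
COL_BITS, CH_BITS, BANK_BITS, ROW_BITS = 8, 2, 3, 12
SEL_BITS = CH_BITS + BANK_BITS  # combined channel + bank selection bits


def split(addr):
    """Split an address into (row, channel/bank select, column) fields."""
    col = addr & ((1 << COL_BITS) - 1)
    sel = (addr >> COL_BITS) & ((1 << SEL_BITS) - 1)
    row = (addr >> (COL_BITS + SEL_BITS)) & ((1 << ROW_BITS) - 1)
    return row, sel, col


def xor_fold(value, width):
    """XOR together successive width-bit slices of value."""
    out = 0
    while value:
        out ^= value & ((1 << width) - 1)
        value >>= width
    return out


def remap(addr):
    """XOR-fold the row bits into the select bits; row and column are kept.

    Because the row is unchanged, XOR-ing again with the same fold
    recovers the original select bits, so the mapping is one-to-one.
    """
    row, sel, col = split(addr)
    new_sel = sel ^ xor_fold(row, SEL_BITS)
    return (row << (COL_BITS + SEL_BITS)) | (new_sel << COL_BITS) | col


# A strided stream in which only the row bits change.
stride = 1 << (COL_BITS + SEL_BITS)
addrs = [i * stride for i in range(32)]

before = {split(a)[1] for a in addrs}         # select values without remapping
after = {split(remap(a))[1] for a in addrs}   # select values with remapping
print(len(before), len(after))                # prints "1 32"
```

With the direct mapping, all 32 requests land on a single channel and bank combination; after the XOR fold they are spread over all 32 combinations, while each request keeps its original row and column bits.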

