I. Introduction
The rise of the power wall, caused by the end of Dennard scaling [1], [2] in the mid-2000s, has promoted the investigation of more energy-efficient technologies and, together with users' never-ending appetite for higher throughput, the integration of a larger number of cores in processor architectures. As a result, high performance computing (HPC) servers (for example, from AMD, Fujitsu, Huawei, IBM and Intel) currently integrate dozens of cores in a single socket (or chip), and comprise one or more of these sockets. While the memory wall [3], [4] has long posed a major performance bottleneck, the growing number of cores has added memory contention (i.e., conflicts due to simultaneous accesses to memory from two or more cores) to the problem. Hardware architects have responded to this new challenge with the design of NUMA (Non-Uniform Memory Access) systems [5], [6]. Unfortunately, the NUMA design principles place a supplemental burden on programmers, who now have to reduce remote memory accesses as well as control thread-to-core pinning [7]–[9].

For the particular domain of (dense) linear algebra, developing software for NUMA architectures presents many similarities with message-passing programming [10], which, to a certain extent, can be addressed via a domain-specific approach that raises the abstraction level, as done in [11]. Concretely, in that work we addressed the difficulties of attaining high performance in the execution of multi-threaded dense matrix factorization and inversion (DMFI) on NUMA architectures. For that purpose, we proposed a methodology that improves performance portability, combines a hybrid task/loop-level parallelization, and exploits locality at the expense of only minor code modifications.