Abstract:
The transformer architecture has revolutionized many applications, such as large language models. This progress has been largely enabled by distributed training, yet communication remains a significant bottleneck. This paper examines the communication behavior of transformer models, focusing on the data exchanged by different parallelism schemes in multi-node/multi-GPU training. We use GPT-based language models as a case study because of their prevalence, and we validate our empirical results against analytical models. Our analysis reveals practical insights and potential areas for further optimization in framework and HPC middleware design.
Published in: IEEE Micro (Early Access)