
Understanding and Characterizing Communication Characteristics for Distributed Transformer Models


Abstract:

The transformer architecture has revolutionized many applications such as large language models. This progress has been largely enabled by distributed training, yet communication remains a significant bottleneck. This paper examines the communication behavior of transformer models, focusing on how different parallelism schemes in multi-node/multi-GPU training communicate data. We use GPT-based language models as a case study due to their prevalence. We validate our empirical results using analytical models. Our analysis reveals practical insights and potential areas for further optimization in framework and HPC middleware design.
Published in: IEEE Micro ( Early Access )
Page(s): 1 - 7
Date of Publication: 22 January 2025
