Dhabaleswar K. (DK) Panda - IEEE Xplore Author Profile

Showing 1-25 of 399 results


Scaling up Large Language Model (LLM) training involves fitting a tremendous number of training parameters across a limited number of workers. However, methods like ZeRO-3 that drastically reduce GPU memory pressure often incur heavy communication to ensure global synchronization and consistency. Established efforts such as ZeRO++ use secondary partitions to avoid inter-node communications, given t...
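The key idea in ZeRO-3-style sharding can be illustrated with a minimal sketch: each worker retains only a slice of the flat parameter vector, and the full parameters are re-assembled (in practice via an allgather collective) when needed. The helper names below are hypothetical, not the ZeRO-3 API.

```python
# Minimal sketch of ZeRO-3-style parameter sharding (illustrative only;
# `shard_params` and `gather_params` are hypothetical names).
import numpy as np

def shard_params(params: np.ndarray, num_workers: int, rank: int) -> np.ndarray:
    """Each worker keeps only a 1/num_workers slice of the flat parameters."""
    shards = np.array_split(params, num_workers)
    return shards[rank]

def gather_params(shards: list) -> np.ndarray:
    """Before a forward/backward pass, shards are re-assembled into full
    parameters. Real ZeRO-3 uses an allgather; here we just concatenate."""
    return np.concatenate(shards)

params = np.arange(12, dtype=np.float32)              # stand-in model parameters
world = [shard_params(params, 4, r) for r in range(4)]
assert np.array_equal(gather_params(world), params)   # round-trips losslessly
```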
Hyperparameter Optimization (HPO) can unlock the full potential of Deep Learning (DL) models; however, it is considered one of the most compute-intensive tasks in the DL domain due to multi-dimensional search spaces and complex neural network architectures. A common method for accelerating HPO workloads is parallelizing training jobs on multiple computing devices, such as modern GPUs in High-Perf...
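As a rough illustration of that parallel-trials approach, the sketch below fans a hypothetical grid of hyperparameter configurations out over a process pool; the objective function is a stand-in for a real training job on a device.

```python
# Hedged sketch: parallelizing HPO trials across workers with a process pool.
from multiprocessing import Pool
from itertools import product

def objective(config):
    lr, batch_size = config
    # Placeholder "validation loss"; a real trial would train a model here.
    return (lr - 0.01) ** 2 + 1.0 / batch_size, config

if __name__ == "__main__":
    grid = list(product([0.001, 0.01, 0.1], [32, 64, 128]))
    with Pool(processes=4) as pool:            # typically one worker per GPU
        results = pool.map(objective, grid)    # trials run concurrently
    best_loss, best_config = min(results)
    print(f"best config: lr={best_config[0]}, batch={best_config[1]}")
```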
Modern SmartNICs are capable of performing both computation and communication operations. In this context, past works on accelerating HPC/DL applications have manually selected certain computational phases to offload to the SmartNICs. In this work, we identify Vector Multiply-Adds (VMA), Distributed Dot Products (DDOT), and Sparse Matrix-Vector Multiplication (Matvec) as three fundamental op...
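Of the three primitives, DDOT is the simplest to picture. Below is a minimal host-side sketch using mpi4py; the offloaded variant would perform the same partial-product-plus-reduction on the SmartNIC cores instead of the host.

```python
# Sketch of a Distributed Dot Product (DDOT): each rank computes a partial
# dot product over its slice, then the partials are summed with allreduce.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1 << 20
x = np.random.rand(n_local)
y = np.random.rand(n_local)

local = np.dot(x, y)                             # local partial dot product
global_dot = comm.allreduce(local, op=MPI.SUM)   # combine partials across ranks
if rank == 0:
    print(f"global dot product across {size} ranks: {global_dot:.4f}")
```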
The demand for computing power in high-performance computing and deep learning applications is steadily increasing, leading to a noticeable inclination toward equipping modern exascale clusters with accelerators. In particular, distributed Deep Learning training necessitates high-performance GPU-aware MPI operations, with reduction operations being widely employed. Unlike data movement-based M...
One-sided communication is one of several approaches to data transfer in High-Performance Computing (HPC) applications. One-sided operations place fewer demands on parallel programming libraries and do not require HPC hardware to issue acknowledgments of successful data transfer. Thanks to its inherently non-blocking nature, one-sided communication is also useful for improving overlap between...
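A minimal sketch of the one-sided model, assuming mpi4py's RMA bindings: rank 0 writes directly into a memory window exposed by rank 1, with no matching receive posted on the target.

```python
# One-sided MPI RMA with mpi4py: a Put into a remote window, synchronized
# with fences (an active-target epoch).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.zeros(4, dtype=np.float64)
win = MPI.Win.Create(buf, comm=comm)       # expose local buffer for RMA

win.Fence()                                # open access epoch
if rank == 0 and comm.Get_size() > 1:
    payload = np.arange(4, dtype=np.float64)
    win.Put(payload, target_rank=1)        # one-sided write into rank 1's buffer
win.Fence()                                # close epoch; data now visible

if rank == 1:
    print("rank 1 received:", buf)
win.Free()
```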
In Artificial Intelligence (AI) and high-performance computing (HPC), growing data and model sizes require distributed processing across multiple nodes due to single-node limitations, increasing inter-node communication. To address these challenges, we propose a novel MPI allgather method leveraging CXL technology, which supports composable architectures and dynamic resource allocation in data cen...
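For reference, the baseline allgather pattern the method targets looks like this with mpi4py; the proposed design would service the same collective through CXL-attached memory rather than network transfers.

```python
# MPI Allgather: every rank contributes one block and receives all blocks.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

block = np.full(4, rank, dtype=np.int32)     # this rank's contribution
out = np.empty(4 * size, dtype=np.int32)     # room for every rank's block
comm.Allgather(block, out)

if rank == 0:
    print("gathered:", out)                  # blocks from ranks 0..size-1
```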
Data Parallelism (DP), Tensor Parallelism (TP), and Pipeline Parallelism (PP) are the three strategies widely adopted to enable fast and efficient Large Language Model (LLM) training. However, these approaches rely on data-intensive communication routines to collect, aggregate, and re-distribute gradients, activations, and other important model information, which pose significant overhead. Co-desi...
Parameter-efficient Fine-tuning (PEFT) methods have emerged as powerful techniques for adapting pre-trained Large Language Models (LLMs) to specific tasks with reduced computational and memory overhead. However, despite their promising potential, there remains a gap in understanding how these methods perform in distributed computing settings. In this paper, we present a comprehensive characterizat...
Deep learning (DL) models based on the transformer architecture have revolutionized many DL applications such as large language models (LLMs), vision transformers, audio generation, and time series prediction. Much of this progress has been fueled by distributed training, yet distributed communication remains a substantial bottleneck to training progress. This paper examines the communication beha...
The arrival of exascale computers has pushed the boundary of computing capability, posing performance challenges for parallel programming models in exploiting such systems efficiently. A dominant programming model for running parallel programs is the Message Passing Interface (MPI). Among the primitives provided by MPI, Alltoall is a communication-intensive operation utilized by many ap...
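The Alltoall pattern itself is easy to state: every rank sends a distinct block to every other rank, a personalized exchange that stresses the network at scale. A small mpi4py sketch:

```python
# MPI Alltoall: rank r sends send[i] to rank i and receives one block from each.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# send[i] is the block destined for rank i
send = np.array([rank * 100 + i for i in range(size)], dtype=np.int32)
recv = np.empty(size, dtype=np.int32)
comm.Alltoall(send, recv)

# recv[i] now holds the block that rank i addressed to this rank
print(f"rank {rank} received {recv}")
```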
The Message Passing Interface (MPI) is a common parallel programming model in High-Performance Computing. Recently, it has also been widely used in Artificial Intelligence (AI) applications. However, the performance of those applications is limited by the memory wall problem, the performance gap between the processor and memory. To address this problem, we propose a novel computing archit...
The Message Passing Interface is the de facto standard in high-performance computing (HPC) for inter-process communication. MPI libraries employ numerous algorithms for each collective communication pattern, whose behavior is largely affected by the underlying hardware, communication pattern, message size, and number of processes involved. Choosing the “best” algorithm for every possible scenario i...
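The selection problem can be pictured as a rule table keyed on message size and process count. The thresholds and algorithm names below are invented for illustration; production MPI libraries tune such tables per system.

```python
# Hedged sketch of rule-based collective-algorithm selection.
def select_allreduce_algorithm(msg_size: int, num_procs: int) -> str:
    if msg_size <= 4096:
        return "recursive-doubling"      # latency-bound small messages
    if num_procs >= 64 and msg_size >= 1 << 20:
        return "ring"                    # bandwidth-bound large messages
    return "recursive-halving-doubling"  # reasonable middle ground

for size, procs in [(1024, 16), (1 << 22, 128), (65536, 32)]:
    print(size, procs, "->", select_allreduce_algorithm(size, procs))
```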
In the realm of large language models (LLMs) like the Generative Pre-trained Transformer (GPT), the Mixture of Experts (MoE) paradigm has emerged as a powerful technique for enhancing model expressiveness and accuracy. However, the deployment of GPT MoE models for parallel inference on distributed systems presents significant challenges, primarily due to the extensive Alltoall communication requir...Show More
Modern multi-/many-core processors in HPC systems have hundreds of cores with deep memory hierarchies. HPC applications running at high core counts often experience contention between processes/threads on shared resources such as caches, leading to degraded performance. This is especially true for dense collective patterns, such as MPI_Alltoall, that have many concurrent memory transactions. The orderi...
With the increasing scale of High-Performance Computing (HPC) and Deep Learning (DL) applications through GPU adoption, the seamless communication of data stored on GPUs has become a critical factor in enhancing overall application performance. AllReduce is a collective communication operation that is commonly used in HPC applications and distributed DL training, especially Data Parallelism. Dat...
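The gradient-averaging step that makes AllReduce central to Data Parallelism can be sketched as follows with mpi4py (host-staged here; a GPU-aware MPI would pass device buffers directly, avoiding the copies implied by this version):

```python
# AllReduce in data-parallel training: average local gradients across ranks.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()

grads = np.random.rand(1024).astype(np.float32)   # local gradient buffer
comm.Allreduce(MPI.IN_PLACE, grads, op=MPI.SUM)   # sum gradients across ranks
grads /= size                                     # turn the sum into an average
```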
Autoregressive models, despite their commendable performance in a myriad of generative tasks, face challenges stemming from their inherently sequential structure. Inference on these models is, by design, subject to a temporal dependency, where the current token's probability distribution is conditioned on the preceding tokens. This inherent characteristic severely impedes computational efficiency during i...
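That temporal dependency is visible in a bare-bones decode loop: each step must wait for the token produced by the previous one. The model below is a dummy next-token predictor, not any specific library API.

```python
# Why autoregressive inference is sequential: step t conditions on tokens 0..t-1.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 100

def model(tokens):
    """Dummy next-token distribution conditioned on the prefix length."""
    logits = rng.random(VOCAB) + 0.01 * len(tokens)
    return np.exp(logits) / np.exp(logits).sum()

tokens = [1]                                 # begin-of-sequence token
for _ in range(8):
    probs = model(tokens)                    # depends on the whole prefix
    tokens.append(int(np.argmax(probs)))     # greedy decode, one token per step
print(tokens)
```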
In modern multi-/many-core HPC systems, the increasing number of processor cores presents new challenges in managing parallel compute workloads across multiple nodes. One crucial aspect that significantly impacts the startup phase of parallel MPI jobs is the methodology used for connection establishment. In this paper, we investigate the limitations of existing all-to-all connection establishment ...
Quantization is a popular technique used in Deep Neural Network (DNN) inference to reduce the size of models and improve overall numerical performance by exploiting native hardware. This paper conducts a detailed performance characterization of the benefits of using quantization techniques, mainly FP16/INT8 variants with static and dynamic schemes, using the MLPerf Edge Inference b...
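As a concrete example of the dynamic INT8 scheme, the sketch below applies PyTorch's quantize_dynamic to a toy model; this mirrors the kind of setup such a characterization covers, not the paper's exact configuration.

```python
# Dynamic INT8 quantization of Linear layers with PyTorch (CPU-side sketch).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8    # quantize Linear weights to INT8
)

x = torch.randn(1, 128)
with torch.no_grad():
    print(qmodel(x).shape)                   # same interface, smaller weights
```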
The MPI4Spark effort reconciled disparities between High-Performance Computing (HPC) environments and Big Data stacks by adopting an MPI-based solution inside Apache Spark's Netty communication layer that was capable of better utilizing high-speed interconnects, such as InfiniBand (IB), Intel Omni-Path (OPA), and HPE Slingshot, across a variety of HPC systems. Apache...
Supervised Deep Learning (DL) thrives on Big Data; however, it inherits a major limitation: training and testing datasets must be fully annotated to train Deep Neural Networks (DNNs). To mitigate this bottleneck, we propose HARVEST, a distributed computer-vision framework that employs state-of-the-art semi-supervised learning (SSL) algorithms to train accurate DNNs using Distributed Data Parallelism...
Over the past several years, Smart Network Interface Cards (SmartNICs) have rapidly grown in popularity. In particular, NVIDIA's BlueField line of SmartNICs has proven effective in a wide variety of uses: offloading communication in High-Performance Computing (HPC) applications, accelerating various stages of the Deep Learning (DL) pipeline, and serving the Datacenter/virtualization workloads it was especially designed for. Th...
The Message-Passing Interface (MPI) provides convenient abstractions such as MPI_Allreduce for inter-process collective reduction operations. With the advent of deep learning and large-scale HPC systems, it is increasingly important to optimize the latency of the MPI_Allreduce operation for large messages. Due to the amount of compute and communication involved in MPI_Allreduce, it is beneficial to off...
Many High-Performance Computing (HPC) clusters around the world use some variation of InfiniBand interconnects, all of which are powered by the “Verbs” API. Verbs supplies a quick, efficient, and developer-friendly method of passing data buffers between nodes through their interconnect(s). In recent years, the MLX5-DV (Direct Verbs) API has emerged as a way of providing mechanism...
In this paper, we propose Scalable Meta-Parallelism for Deep Learning Search (ScaMP): a distributed Hyperparameter Optimization (HPO) and Neural Architecture Search (NAS) framework that supports out-of-core models with flexible parallelism schemes. ScaMP is integrated into the modern DL ecosystem and enables both efficient parallel training of concurrent candidate architectures and aggregate devi...
In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, necessitating distribution among multiple processors. Training such massive models requires advanced parallelism strategies [1], [2] to maintain efficiency. However, such distributed DL parallelism strategies require a varied mixt...