Janardhan Rao Doppa - IEEE Xplore Author Profile

Showing 1-25 of 65 results


Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common task to...
Operation Unit (OU)-based configurations enable the design of energy-efficient and reliable ReRAM crossbar-based Processing-In-Memory (PIM) architectures for Deep Neural Network (DNN) inferencing. To exploit sparsity and tackle crossbar non-idealities, matrix-vector-multiplication (MVM) operations are computed at a much smaller level of granularity than a full crossbar, referred to as OUs. However...
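The OU-granularity idea above can be illustrated with a blocked matrix-vector multiply that skips all-zero sub-blocks. This is a minimal sketch of the general concept only, not the paper's architecture; the OU dimensions and the `ou_mvm` helper are hypothetical.

```python
import numpy as np

def ou_mvm(weights, x, ou_rows=9, ou_cols=8):
    """Compute an MVM at Operation-Unit (OU) granularity.

    Sketch only: each (ou_rows x ou_cols) sub-block of the crossbar is
    computed separately, and all-zero OUs are skipped to exploit sparsity.
    The OU dimensions here are illustrative, not from the paper.
    """
    n_rows, n_cols = weights.shape
    y = np.zeros(n_cols)
    for r in range(0, n_rows, ou_rows):
        for c in range(0, n_cols, ou_cols):
            ou = weights[r:r + ou_rows, c:c + ou_cols]
            if not ou.any():  # sparse OU: no work to do
                continue
            y[c:c + ou_cols] += x[r:r + ou_rows] @ ou
    return y

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)) * (rng.random((128, 128)) > 0.7)
x = rng.standard_normal(128)
assert np.allclose(ou_mvm(w, x), x @ w)  # matches the dense MVM
```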
Processing-in-memory (PIM) architectures have emerged as an attractive computing paradigm for accelerating deep neural network (DNN) training and inferencing. However, a plethora of PIM devices, e.g., resistive random-access memory, ferroelectric field-effect transistor, phase change memory, MRAM, static random-access memory, exists and each of these devices offers advantages and drawbacks in term...
Resistive random access memory (ReRAM)-based processing-in-memory (PIM) architectures have demonstrated great potential to accelerate the deep neural network (DNN) training/inference. However, the computational accuracy of analog PIM is compromised due to the nonidealities, such as the conductance variation of ReRAM cells. The impact of these nonidealities worsens as the number of concurrently act...
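Conductance variation of the kind described above is often modeled in simulation as multiplicative noise on the stored weights. A minimal sketch under that assumption follows; the noise model and the `sigma` value are illustrative, not the paper's device model.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_mvm(weights, x, sigma=0.05):
    """Analog MVM with a simple conductance-variation model.

    Sketch only: each cell's effective conductance is perturbed by
    multiplicative Gaussian noise; real ReRAM models are more involved.
    """
    perturbed = weights * (1.0 + sigma * rng.standard_normal(weights.shape))
    return x @ perturbed

w = rng.standard_normal((64, 10))
x = rng.standard_normal(64)
print(np.linalg.norm(noisy_mvm(w, x) - x @ w))  # error grows with sigma
```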
Graph neural networks (GNNs) are becoming popular in various real-world applications. However, hardware-level security is a concern when GNN models are mapped to emerging neuromorphic computing architectures, such as memristor-based crossbars. We identify a vulnerability of memristor-mapped GNNs and propose an attack mechanism based on the identified vulnerability. The proposed attack tampers memr...
Large-scale manycore System-on-Chips (SoCs) need to satisfy the conflicting objectives of maximizing performance and minimizing energy consumption for dynamically changing applications. In this paper, we consider the problem of dynamic power management (DPM) in large manycore SoCs for unseen applications at runtime. We employ a machine learning (ML) based DPM policy, which selects the voltage/freq...
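One common way to realize such an ML-based DPM policy is a lightweight classifier that maps runtime counters to a voltage/frequency level. The sketch below uses a decision tree as a stand-in; the counters, thresholds, and V/F levels are all hypothetical, not the paper's policy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: hardware counters -> best V/F level index.
rng = np.random.default_rng(1)
counters = rng.random((500, 3))              # e.g., [IPC, miss rate, NoC load]
load = counters @ np.array([2.0, 1.0, 1.0])  # synthetic "demand" signal
vf_level = np.digitize(load, [1.2, 2.2])     # three discrete V/F levels: 0..2

policy = DecisionTreeClassifier(max_depth=4).fit(counters, vf_level)

# At runtime: read the counters for the next control epoch, pick a V/F level.
print(policy.predict(rng.random((1, 3))))
```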
Machine learning (ML) models have gained prominence in solving real-world tasks. However, implementing ML models is both compute- and memory-intensive. Domain-specific architectures such as Resistive Random Access Memory (ReRAM)-based Processing-in-Memory (PIM) platforms have been proposed to efficiently accelerate ML training and inference. However, existing ML workloads require a high amount of...
Wearable and internet of things (IoT) devices are transforming a number of high-impact applications. Machine learning (ML) algorithms on wearable devices assume that data from all sensors is available at runtime. However, one or more sensors may be unavailable at runtime due to malfunction, energy constraints or communication challenges. Loss of sensor data can potentially lead to severe degradati...
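The simplest fallback when a sensor drops out at runtime is to impute the missing channels, e.g., from training-set statistics. This is a hedged sketch of that baseline idea, not the paper's method; the data shapes and the `impute_missing` helper are hypothetical.

```python
import numpy as np

def impute_missing(window, available, train_mean):
    """Replace unavailable sensor channels with their training-set means.

    Sketch only. `window` is (time, channels); `available` is a boolean
    mask over channels indicating which sensors are currently working.
    """
    out = window.copy()
    out[:, ~available] = train_mean[~available]
    return out

rng = np.random.default_rng(2)
train = rng.standard_normal((1000, 50, 6))   # e.g., 6 accel/gyro channels
train_mean = train.mean(axis=(0, 1))

window = rng.standard_normal((50, 6))
available = np.array([True, True, True, False, False, False])  # gyro offline
x = impute_missing(window, available, train_mean)
```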
Unsupervised domain adaptation (UDA) provides a strategy for improving machine learning performance in data-rich (target) domains where ground truth labels are inaccessible but can be found in related (source) domains. In cases where meta-domain information such as label distributions is available, weak supervision can further boost performance. We propose a novel framework, CALDA, to tackle these...
A ReRAM crossbar-based computing system (RCS) can accelerate CNN training. However, hardware faults due to manufacturing defects and limited endurance impede the widespread adoption of RCS. We propose a dynamic task remapping-based technique for reliable CNN training on faulty RCS. Experimental results demonstrate that the proposed low-overhead method incurs only 0.85% accuracy loss on average whi...
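The remapping idea can be pictured as steering work away from crossbars marked faulty. The greedy sketch below is only an illustration of that idea; the task/crossbar abstractions are hypothetical and this is not the paper's algorithm.

```python
import numpy as np

def remap_tasks(task_loads, faulty, n_crossbars):
    """Greedily assign each task (e.g., a layer's MVM tile) to the
    least-loaded healthy crossbar, never scheduling onto faulty ones.
    """
    load = np.zeros(n_crossbars)
    load[list(faulty)] = np.inf          # faulty units are never chosen
    mapping = {}
    for task, cost in sorted(task_loads.items(), key=lambda kv: -kv[1]):
        target = int(np.argmin(load))
        mapping[task] = target
        load[target] += cost
    return mapping

print(remap_tasks({"tile0": 3.0, "tile1": 2.0, "tile2": 2.5},
                  faulty={1}, n_crossbars=4))
```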
Chiplet-based 2.5D systems that integrate multiple smaller chips on a single die are gaining popularity for executing both compute- and data-intensive applications. While smaller chips (chiplets) reduce fabrication costs, they also provide less functionality. Hence, manufacturing several smaller chiplets and combining them into a single system enables the functionality of a larger monolithic chip w...
Graph neural networks (GNNs) are used for predictive analytics on graph-structured data, and they have become very popular in diverse real-world applications. Resistive random-access memory (ReRAM)-based PIM architectures can accelerate GNN training. However, GNN training on ReRAM-based architectures is both compute- and data-intensive in nature. In this work, we propose a framework called SlimGNN...
Despite the rapid progress on research in adversarial robustness of deep neural networks (DNNs), there is little principled work for the time-series domain. Since time-series data arises in diverse applications including mobile health, finance, and smart grid, it is important to verify and improve the robustness of DNNs for the time-series domain. In this paper, we propose a novel framework for th...
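For flavor, a generic fast-gradient-sign perturbation of a time-series input is sketched below. This is the standard FGSM idea, not the framework the paper proposes; the toy linear model is hypothetical.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1):
    """Fast-gradient-sign perturbation: move each time step by eps in the
    direction that increases the model's score."""
    return x + eps * np.sign(grad)

# Toy linear "model": score = w . x, so the gradient w.r.t. x is just w.
rng = np.random.default_rng(3)
x = np.sin(np.linspace(0, 6, 100)) + 0.05 * rng.standard_normal(100)
w = rng.standard_normal(100)
x_adv = fgsm_perturb(x, grad=w, eps=0.1)
print(w @ x, w @ x_adv)  # the perturbed series pushes the score up
```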
Training machine learning (ML) models at the edge (on-chip training on end user devices) can address many pressing challenges including data privacy/security, increase the accessibility of ML applications to different parts of the world by reducing the dependence on the communication fabric and the cloud infrastructure, and meet the real-time requirements of AR/VR applications. However, existing e...
Processing-in-memory (PIM) enables energy-efficient deployment of convolutional neural networks (CNNs) from edge to cloud. Resistive random-access memory (ReRAM) is one of the most commonly used technologies for PIM architectures. One of the primary limitations of ReRAM-based PIM in neural network training arises from the limited write endurance due to the frequent weight updates. To make ReRAM-ba...
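A common endurance-aware trick, sketched below purely as an illustration (the paper's actual scheme may differ), is to skip weight updates whose magnitude falls under a threshold so that fewer ReRAM cells are written per training step.

```python
import numpy as np

def filtered_update(weights, grad, lr=0.01, threshold=1e-3):
    """SGD step that skips updates smaller than a threshold, reducing the
    number of ReRAM cell writes. Threshold and policy are hypothetical.

    Returns the updated weights and the fraction of cells written.
    """
    delta = -lr * grad
    write_mask = np.abs(delta) >= threshold
    return weights + delta * write_mask, write_mask.mean()

rng = np.random.default_rng(4)
w = rng.standard_normal((64, 64))
g = 0.1 * rng.standard_normal((64, 64))
w, write_frac = filtered_update(w, g)
print(f"cells written this step: {write_frac:.1%}")
```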
Resistive random-access memory has become one of the most popular hardware choices for implementing machine learning workloads. However, these devices exhibit non-ideal behavior, which presents a challenge to widespread adoption. Training/inferencing on these faulty devices can lead to poor prediction accuracy. However, existing fault-tolerant methods are associated with high...
Wearable devices are becoming popular for health and activity monitoring. The machine learning (ML) models for these applications are trained by collecting data in a laboratory with precise control of experimental settings. However, during real-world deployment/usage, the experimental settings (e.g., sensor position or sampling rate) may deviate from those used during training. This discrepancy ca...
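One concrete instance of such a discrepancy is a sampling-rate mismatch, which can be reconciled by resampling the deployed stream to the training rate. A minimal linear-interpolation sketch, with hypothetical rates:

```python
import numpy as np

def resample(signal, fs_in, fs_out):
    """Linearly resample a 1-D sensor stream from fs_in to fs_out Hz."""
    t_in = np.arange(len(signal)) / fs_in
    t_out = np.arange(int(len(signal) / fs_in * fs_out)) / fs_out
    return np.interp(t_out, t_in, signal)

accel_25hz = np.sin(2 * np.pi * np.arange(250) / 25.0)   # device at 25 Hz
accel_50hz = resample(accel_25hz, fs_in=25, fs_out=50)   # model wants 50 Hz
print(len(accel_25hz), len(accel_50hz))                  # 250 -> 500
```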
Processing-in-memory (PIM) is a promising technique to accelerate deep learning (DL) workloads. Emerging DL workloads (e.g., ResNet with 152 layers) consist of millions of parameters, which increase the area and fabrication cost of monolithic PIM accelerators. The fabrication cost challenge can be addressed by 2.5-D systems integrating multiple PIM chiplets connected through a network-on-package (...
Resistive random-access memory (ReRAM)-based manycore architectures enable acceleration of graph neural network (GNN) inference and training. GNNs exhibit characteristics of both DNNs and graph analytics. Hence, GNN training/inferencing on ReRAM-based manycore architectures gives rise to both computation and on-chip communication challenges. In this work, we leverage model pruning and efficient gra...
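Model pruning of the kind mentioned above can be sketched with simple magnitude pruning (a generic scheme, not necessarily the paper's): fewer nonzero weights mean fewer crossbar computations and less on-chip traffic.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude weights to hit a target sparsity."""
    k = int(weights.size * sparsity)
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(5)
w = rng.standard_normal((256, 256))
w_pruned = magnitude_prune(w, sparsity=0.8)
print((w_pruned != 0).mean())  # roughly 20% of weights remain
```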
Wearable devices are becoming increasingly popular for health and activity monitoring applications. These devices typically include small rechargeable batteries to improve user comfort. However, the small battery capacity leads to limited operating life, requiring frequent recharging. Recent research has proposed energy harvesting using light and user motion to improve the lifetime of wearable dev...
This article presents an automated design and optimization framework for electric transportation power systems (ETPS) enabled by machine learning (ML). The use of physical models, simulations, and optimization methods can greatly aid the engineering design process. However, when considering the optimal co-design of multiple interdependent subsystems that span multiple physical domains, such model-...
Resistive random-access memory (ReRAM)-based architectures can be used to accelerate convolutional neural network (CNN) training. However, existing architectures either do not support normalization at all or they support only a limited version of it. Moreover, it is common practice for CNNs to add normalization layers after every convolution layer. In this work, we show that while normalization la...
Resistive random-access memory (ReRAM) is a promising technology for designing hardware accelerators for deep neural network (DNN) inferencing. However, stochastic noise in ReRAM crossbars can degrade the DNN inferencing accuracy. We propose the design and optimization of a high-performance, area- and energy-efficient ReRAM-based hardware accelerator to achieve robust DNN inferencing in the presenc...
Graph Neural Networks (GNNs) are a variant of Deep Neural Networks (DNNs) operating on graphs. GNNs have attributes of both DNNs and graph computation. However, training GNNs on manycore architectures is a challenging task because it involves heavy communication that bottlenecks performance. DropEdge and Dropout, which we collectively refer to as DropLayer, are regularization techniques that can i...
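DropEdge itself (Rong et al.) is simple to state: at each training epoch, remove a random fraction of edges from the graph. A minimal dense-matrix sketch follows; real GNN pipelines use sparse formats.

```python
import numpy as np

def drop_edge(adj, p=0.2, rng=None):
    """DropEdge: randomly drop a fraction p of edges from an undirected
    adjacency matrix, preserving symmetry."""
    rng = rng or np.random.default_rng()
    rows, cols = np.triu(adj, k=1).nonzero()   # each undirected edge once
    keep = rng.random(len(rows)) >= p
    out = np.zeros_like(adj)
    out[rows[keep], cols[keep]] = adj[rows[keep], cols[keep]]
    return out + out.T

rng = np.random.default_rng(6)
adj = np.triu((rng.random((8, 8)) > 0.6).astype(float), k=1)
adj = adj + adj.T                              # symmetric, no self-loops
print(adj.sum() / 2, drop_edge(adj, p=0.3, rng=rng).sum() / 2)
```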
A huge number of edge applications including self-driving cars, mobile health, robotics, and augmented reality/virtual reality are enabled by deep neural networks (DNNs). Currently, much of the computation for these applications happens in the cloud, but there are several good reasons to perform the processing on local edge platforms such as smartphones: improved accessibility to different part...