
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation



Abstract:

Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) models of NN accelerators, in particular, fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, first, we characterize the vulnerability of various components of the RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values) and NN layers, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, outperforming state-of-the-art methods by 47.3%.
Date of Conference: 24-27 September 2018
Date Added to IEEE Xplore: 21 February 2019
Print on Demand(PoD) ISSN: 1550-6533
Conference Location: Lyon, France

I. Introduction

Machine learning models, and in particular Neural Networks (NNs), are increasingly being used in the context of nonlinear “cognitive” problems, such as natural language processing and computer vision. These models learn from a dataset in the training phase and make predictions on new, previously unseen data in the inference/prediction/classification phase with ever-increasing accuracy. However, the compute- and power-intensive nature of NNs prevents their effective deployment in resource-constrained environments, such as mobile scenarios. Hardware acceleration on Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) offers a roadmap for enabling NNs in these scenarios [1]–[4]. However, similar to general-purpose devices, hardware accelerators are also susceptible to faults (permanent/hard and transient/soft) as a consequence of Single Event Upsets (SEUs), manufacturing defects, and below-safe-voltage operations [5], [6]. The ever-increasing rate of these faults in nano-scale technology nodes can directly impact the accuracy of NNs.
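To make the bit-flip fault model concrete, the sketch below (an illustration, not code from the paper) injects a single-bit fault into a fixed-point NN weight. The `flip_bit` helper and the Q1.15 representation are assumptions chosen for the example; the point is that the error a fault introduces depends strongly on which bit position it hits, which is why severity varies with the data representation model.

```python
def flip_bit(value: int, bit: int, width: int = 16) -> int:
    """Flip one bit of a `width`-bit two's-complement word (e.g., a
    fixed-point NN weight stored in an accelerator buffer) and
    reinterpret the result as a signed integer."""
    u = value & ((1 << width) - 1)   # view as unsigned word
    flipped = u ^ (1 << bit)         # inject the single-bit fault
    if flipped >= 1 << (width - 1):  # restore two's-complement sign
        flipped -= 1 << width
    return flipped

# A weight of 0.75 in Q1.15 fixed point (scale = 2**15).
scale = 1 << 15
w = int(0.75 * scale)                 # 24576

lsb  = flip_bit(w, 0) / scale   # LSB flip:  ~0.75003, negligible error
high = flip_bit(w, 14) / scale  # bit 14:     0.25, large magnitude error
sign = flip_bit(w, 15) / scale  # sign bit:  -0.25, sign and magnitude change
```

A low-order flip perturbs the weight by less than 10^-4, while a flip in a high-order or sign bit changes it drastically; a mitigation scheme can therefore concentrate its protection on the most significant bits.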

