
On the Resilience of RTL NN Accelerators: Fault Characterization and Mitigation



Abstract:

Machine Learning (ML) is making a strong resurgence in tune with the massive generation of unstructured data, which in turn requires massive computational resources. Due to the inherently compute- and power-intensive structure of Neural Networks (NNs), hardware accelerators emerge as a promising solution. However, with technology node scaling below 10nm, hardware accelerators become more susceptible to faults, which in turn can impact NN accuracy. In this paper, we study the resilience aspects of Register-Transfer Level (RTL) models of NN accelerators, in particular, fault characterization and mitigation. By following a High-Level Synthesis (HLS) approach, first, we characterize the vulnerability of various components of the RTL NN. We observed that the severity of faults depends on both i) application-level specifications, i.e., NN data (inputs, weights, or intermediate values) and NN layers, and ii) architectural-level specifications, i.e., the data representation model and the parallelism degree of the underlying accelerator. Second, motivated by the characterization results, we present a low-overhead fault mitigation technique that can efficiently correct bit flips, outperforming state-of-the-art methods by 47.3%.
Date of Conference: 24-27 September 2018
Date Added to IEEE Xplore: 21 February 2019
Print on Demand(PoD) ISSN: 1550-6533
Conference Location: Lyon, France

I. Introduction

Machine learning models, and in particular Neural Networks (NNs), are increasingly being used in the context of nonlinear “cognitive” problems, such as natural language processing and computer vision. These models learn from a dataset in the training phase and make predictions on new, previously unseen data in the inference/prediction/classification phase with ever-increasing accuracy. However, the compute- and power-intensive nature of NNs prevents their effective deployment in resource-constrained environments, such as mobile scenarios. Hardware acceleration on Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) offers a roadmap for enabling NNs in these scenarios [1]–[4]. However, similar to general-purpose devices, hardware accelerators are also susceptible to faults (permanent/hard and transient/soft) as a consequence of Single Event Upsets (SEUs), manufacturing defects, and below-safe-voltage operations [5], [6]. The ever-increasing rate of these faults in nano-scale technology nodes can directly impact the accuracy of NNs.
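To make the bit-flip fault model concrete, the sketch below (an illustration, not code from the paper) injects a single-bit fault into a fixed-point NN weight. The `flip_bit` helper and the Q1.15 representation are assumptions chosen for the example; the point is that the error a fault introduces depends strongly on which bit position it hits, which is why severity varies with the data representation model.

```python
def flip_bit(value: int, bit: int, width: int = 16) -> int:
    """Flip one bit of a `width`-bit two's-complement word (e.g., a
    fixed-point NN weight stored in an accelerator buffer) and
    reinterpret the result as a signed integer."""
    u = value & ((1 << width) - 1)   # view as unsigned word
    flipped = u ^ (1 << bit)         # inject the single-bit fault
    if flipped >= 1 << (width - 1):  # restore two's-complement sign
        flipped -= 1 << width
    return flipped

# A weight of 0.75 in Q1.15 fixed point (scale = 2**15).
scale = 1 << 15
w = int(0.75 * scale)                 # 24576

lsb  = flip_bit(w, 0) / scale   # LSB flip:  ~0.75003, negligible error
high = flip_bit(w, 14) / scale  # bit 14:     0.25, large magnitude error
sign = flip_bit(w, 15) / scale  # sign bit:  -0.25, sign and magnitude change
```

A low-order flip perturbs the weight by less than 10^-4, while a flip in a high-order or sign bit changes it drastically; a mitigation scheme can therefore concentrate its protection on the most significant bits.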

