Reliability Evaluation and Analysis of FPGA-Based Neural Network Acceleration System

Publisher: IEEE

Abstract:

Prior work has typically analyzed faults in the computing arrays of neural network accelerators through simulation and focused on the prediction accuracy loss of the neural network models. A systematic fault analysis of the whole neural network acceleration system, one that considers both accuracy degradation and system exceptions such as system stall and running overtime, is still lacking. To that end, we implemented a representative neural network accelerator and the corresponding fault injection modules on a Xilinx ARM-FPGA platform and evaluated the reliability of the system under different fault injection rates with a series of typical neural network models deployed on it. The entire fault injection and reliability evaluation system is open-sourced on GitHub. Through comprehensive experiments, we identify the system exceptions from the various abnormal behaviors of the FPGA-based neural network acceleration system and analyze their underlying causes. In particular, we find that the probability of system exceptions dominates the reliability of the system. The faults also degrade the accuracy of the neural network models, but the impact depends on the application of each model and can vary greatly. We also evaluated conventional triple modular redundancy (TMR) and demonstrated its challenges with both experiments and analytical models, which may shed light on the reliability design of FPGA-based neural network acceleration systems.
Page(s): 472 - 484
Date of Publication: 08 January 2021

I. Introduction

Neural networks have proven successful in many domains, such as image processing and video processing, over the years [1]. Some applications in these domains, such as self-driving, robot-assisted surgery, and medical diagnosis, are mission-critical and place strict prediction accuracy requirements on the neural network models [2]. When the models used in these applications are deployed on accelerators for higher performance and energy efficiency, the reliability of the underlying acceleration system becomes critical: hardware faults may lead to a considerable number of wrong predictions [3] and to system exceptions that neural network model designers can hardly anticipate, and either may have catastrophic consequences.