
Hardware Acceleration in Large-Scale Tensor Decomposition for Neural Network Compression



Abstract:

A tensor is a multi-dimensional array that is widely embedded in neural networks. The multiply-accumulate (MAC) operations involved in a large-scale tensor introduce high computational complexity. Since such a tensor usually features a low rank, the computational complexity can be greatly reduced through canonical polyadic decomposition (CPD). This work presents an energy-efficient hardware accelerator that implements randomized CPD on large-scale tensors for neural network compression. A mixing method that combines the Walsh-Hadamard transform and the discrete cosine transform is proposed to replace the fast Fourier transform, yielding faster convergence. It reduces the computations for the transformation by 83% and the computations for solving the required least-squares problem by 75%. The proposed accelerator is flexible enough to support the decomposition of tensors up to 512×512×9×9 in size. Compared to a prior dedicated processor for tensor computation, this work supports larger tensors and achieves 112× lower latency under the same conditions.
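The abstract does not detail the proposed mixing method, but a structured random projection of this flavor can be sketched. The snippet below is a minimal, hypothetical illustration in which random sign flips, an orthonormal Walsh-Hadamard transform, and a DCT are composed before row subsampling, standing in for an FFT-based sketch; the function name mixed_wht_dct_sketch and every detail of the construction are assumptions, not the paper's design.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import hadamard

def mixed_wht_dct_sketch(A, k, seed=None):
    """Hypothetical structured sketch: random sign flips, an orthonormal
    Walsh-Hadamard transform mixed with a DCT, then uniform row sampling.
    A stand-in illustration, not the paper's exact mixing method."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    assert n & (n - 1) == 0, "the Walsh-Hadamard transform needs a power-of-two size"
    signs = rng.choice([-1.0, 1.0], size=n)            # random sign flips
    H = hadamard(n) / np.sqrt(n)                       # orthonormal WHT matrix
    mixed = dct(H @ (signs[:, None] * A), axis=0, norm='ortho')  # WHT, then DCT
    rows = rng.choice(n, size=k, replace=False)        # keep k mixed rows
    return np.sqrt(n / k) * mixed[rows]                # rescale the subsample

# Usage: sketch a tall matrix before solving a least-squares problem on it
A = np.random.randn(256, 32)
SA = mixed_wht_dct_sketch(A, k=64, seed=0)
print(SA.shape)  # (64, 32)
```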
Date of Conference: 07-10 August 2022
Date Added to IEEE Xplore: 22 August 2022
Conference Location: Fukuoka, Japan


I. Introduction

A tensor is an array with multiple dimensions. It appears in many applications, especially in deep learning, because the structure of a neural network is inherently multi-dimensional, with dimensions contributed by the feature maps and the filters (also known as kernels). Modern neural networks usually contain a large number of parameters, and the multiply-accumulate (MAC) operations involved in a large-scale tensor introduce high computational complexity, making it challenging to deploy neural networks on resource-constrained devices. The tensors involved in these neural networks usually feature a low-rank property [1]. This property can be leveraged to compress the networks, thereby reducing their computational complexity and memory usage, as the sketch below illustrates.
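As a hedged illustration of this low-rank compression, the sketch below applies a plain alternating-least-squares (ALS) canonical polyadic decomposition to a small four-way convolution kernel. It is a generic numpy reconstruction of CPD, not the paper's randomized accelerator algorithm, and the kernel shape and rank are arbitrary assumptions chosen only to show the parameter reduction.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Khatri-Rao product of a list of factor matrices."""
    out = mats[0]
    for m in mats[1:]:
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, out.shape[1])
    return out

def cp_als(T, rank, iters=50, seed=0):
    """Plain ALS for canonical polyadic decomposition (illustrative only;
    the paper accelerates a randomized variant of this computation)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in T.shape]
    for _ in range(iters):
        for n in range(T.ndim):
            others = [factors[m] for m in range(T.ndim) if m != n]
            kr = khatri_rao(others)                            # product of the rest
            Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)  # mode-n unfolding
            factors[n] = np.linalg.lstsq(kr, Tn.T, rcond=None)[0].T
    return factors

# Compress a hypothetical conv kernel of shape out_ch x in_ch x kH x kW
K = np.random.randn(32, 16, 3, 3)
factors = cp_als(K, rank=8)
print(K.size, sum(f.size for f in factors))  # 4608 dense weights vs. 432 factor entries
```

With rank 8, the four factor matrices hold (32+16+3+3)·8 = 432 values in place of the 4608 dense kernel weights, and the MAC count of the corresponding convolution shrinks accordingly.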

