I. Introduction
The deployment of deep neural networks (DNNs) in machine learning (ML) applications such as computer vision, speech recognition, and natural language processing has grown exponentially over the past few decades [1], [2]. Multilayer Perceptrons (MLPs), shown in Fig. 1(a), are a class of fully connected feed-forward DNNs that have gained great popularity in artificial neural network (ANN) hardware accelerators. In most ANN hardware accelerators, the neural network is trained offline and the resulting weights are then mapped onto the accelerator array to perform inference operations [3], [4]. However, as DNNs grow more complex with emerging AI tasks, the number of layers, the number of weights, and the computational and storage requirements have increased dramatically, making hardware realization very challenging [5].
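To make the offline-training-then-mapping workflow described above concrete, the following is a minimal NumPy sketch, not the accelerator's actual dataflow: the layer sizes, class name, and ReLU activation are illustrative assumptions. It shows a fully connected feed-forward MLP whose pretrained weights stand in for the values mapped onto the accelerator array, with inference reduced to layer-by-layer matrix-vector products.

```python
# Minimal sketch of the offline-train / map-weights / inference-only flow.
# Layer sizes, names, and the ReLU choice are illustrative assumptions,
# not taken from any particular accelerator design.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class MLPInference:
    """Fully connected feed-forward MLP that only performs inference
    using weights obtained from offline training."""

    def __init__(self, weights, biases):
        # 'weights' and 'biases' play the role of the values mapped
        # onto the accelerator array after offline training.
        self.weights = weights
        self.biases = biases

    def forward(self, x):
        # Each layer is a matrix-vector product plus bias, followed by
        # an activation; the final layer is left linear here.
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            x = W @ x + b
            if i < len(self.weights) - 1:
                x = relu(x)
        return x

# Example: a 784-256-10 MLP with stand-in (random) pretrained weights.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((256, 784)) * 0.01,
           rng.standard_normal((10, 256)) * 0.01]
biases = [np.zeros(256), np.zeros(10)]
model = MLPInference(weights, biases)
logits = model.forward(rng.standard_normal(784))
print(logits.shape)  # (10,)
```

Even in this toy setting, the per-layer weight matrices illustrate why deeper and wider networks quickly inflate the storage and compute budget of a hardware realization.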