I. Introduction
Deep neural networks (DNNs) have shown unprecedented performance on various intelligent tasks such as image recognition, speech recognition, and natural language processing. Despite substantial accuracy improvements, their high demands on memory storage and computational resources constrain the on-chip implementation of DNNs. For example, AlexNet [1] has 61M parameters and requires 1.5B high-precision operations to classify one image, making it prohibitive to implement the entire architecture directly on chip. When off-chip memory such as DRAM is involved, the intensive data movement between the on-chip processor and off-chip DRAM leads to high energy consumption.

Recently, Binary Neural Networks (BNNs) [2], [3] have been proposed that provide classification accuracy comparable to conventional high-precision neural networks on various datasets (e.g., MNIST, CIFAR-10, and ImageNet). In these BNNs, the weights and neuron values are constrained to binary values (i.e., +1 and −1), so the memory storage size is drastically reduced. Moreover, high-precision multiply operations can be replaced by bit-wise operations, reducing the computational workload as well. Thus, BNNs provide a promising solution for the on-chip implementation of DNNs.
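To make the bit-wise replacement concrete, consider the standard XNOR-popcount formulation commonly used for BNN inference: encoding +1 as bit 1 and −1 as bit 0, the dot product of two n-element {+1, −1} vectors equals 2·popcount(XNOR(a, w)) − n. The following minimal Python sketch illustrates this identity; the function name binary_dot and the bit-packing helper are our own illustrative choices, not the implementation of [2] or [3].

```python
import numpy as np

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {+1, -1} vectors packed as bits (1 -> +1, 0 -> -1).

    XNOR marks positions where the two signs agree; each agreement
    contributes +1 and each disagreement -1, so the dot product is
    2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_bits ^ w_bits) & mask    # XNOR: 1 where the signs match
    return 2 * bin(agree).count("1") - n

def pack(v):
    """Encode a {+1, -1} vector as an integer bitmask (+1 -> 1, -1 -> 0)."""
    return sum(1 << i for i, x in enumerate(v) if x == 1)

# Check against a conventional +/-1 dot product on random vectors.
rng = np.random.default_rng(0)
n = 16
a = rng.choice([-1, 1], size=n)
w = rng.choice([-1, 1], size=n)
assert binary_dot(pack(a), pack(w), n) == int(a @ w)
```

In hardware, the XNOR and popcount map to simple logic gates and an adder tree, which is why this substitution removes the need for high-precision multipliers.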