I. Introduction
The binary neural network (BNN) [1], [2] has become a compelling choice for achieving the energy efficiency required by artificial intelligence of things (AIoT) [3], [4], [5], [6] applications. Both its pre-trained weights and its input activations are aggressively quantized to ±1. The computations of its binary convolution layers thus reduce to XNOR operations between the weights and activations, followed by popcount accumulation, and its activation function reduces to a simple binary (sign) function. As a result, substantial savings in hardware resources and energy are achieved while acceptable accuracy is maintained for AIoT inference tasks. To further improve the energy efficiency of BNNs, promising computing architectures such as in-memory computing (IMC) [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] are being actively explored to minimize the excessive data movement between arithmetic/logic units and on-chip memories.
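As a minimal sketch of the computation described above (not taken from any cited design; the function names `to_bits`, `binary_dot`, and `sign_activation` are illustrative), the following Python snippet shows how a dot product between ±1-valued weights and activations can be evaluated with XNOR and popcount instead of multiply-accumulate operations:

```python
import numpy as np

def to_bits(x):
    """Encode +1 as bit 1 and -1 as bit 0."""
    return (x > 0).astype(np.uint8)

def binary_dot(w, a):
    """Dot product of ±1 vectors w and a via XNOR + popcount.

    For ±1 values, w[i]*a[i] == +1 exactly when the bits agree, so the
    dot product equals (#matches) - (#mismatches)
    = 2 * popcount(XNOR(w_bits, a_bits)) - len(w).
    """
    xnor = np.logical_not(np.bitwise_xor(to_bits(w), to_bits(a)))
    popcount = int(np.count_nonzero(xnor))
    return 2 * popcount - len(w)

def sign_activation(x):
    """Binary activation: map the accumulated sum back to ±1."""
    return 1 if x >= 0 else -1

# Usage: the XNOR/popcount result matches the ordinary dot product.
rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=16)
a = rng.choice([-1, 1], size=16)
assert binary_dot(w, a) == int(np.dot(w, a))
print(binary_dot(w, a), sign_activation(binary_dot(w, a)))
```

In hardware, each lane of this computation is a single XNOR gate feeding a popcount tree, which is what makes BNN convolution layers so much cheaper than their full-precision counterparts.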