I. Introduction
In recent years, neural networks have been widely explored and adopted because of their high generalization capability, enabling applications such as speech recognition, self-driving cars, and smart home devices. The major problem with these applications is that they require extensive computation, which in turn demands power and memory that are far from negligible, especially in mobile and IoT devices. For example, VGG16 [1], a convolutional neural network (CNN) model used for image recognition, employs 16 layers and requires about 15 billion multiply-accumulate (MAC) operations per input image and 277 MB of weight memory at 16-bit precision.
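The VGG16 figures quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below is illustrative only. It assumes the commonly published VGG16 layout (thirteen 3x3 convolution layers with 2x2 max pooling, followed by three fully connected layers), a 224x224 RGB input, and 1000 output classes, none of which are stated in this section. Biases are ignored, and the names `CONV_CFG`, `FC_CFG`, and `vgg16_cost` are hypothetical.

```python
# Assumed VGG16 layout: numbers are conv output channels, "M" is a 2x2 max pool.
CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
            512, 512, 512, "M", 512, 512, 512, "M"]
FC_CFG = [4096, 4096, 1000]  # fully connected layer widths (assumed)


def vgg16_cost(input_size=224, in_channels=3, kernel=3, bits=16):
    """Return (weights, MACs per image, weight bytes), ignoring biases."""
    size, channels = input_size, in_channels
    weights = 0
    macs = 0
    for item in CONV_CFG:
        if item == "M":
            size //= 2  # pooling halves the feature-map width and height
            continue
        layer_weights = kernel * kernel * channels * item
        weights += layer_weights
        # Same-padded convolution: every output pixel reuses each weight once.
        macs += layer_weights * size * size
        channels = item
    features = size * size * channels  # flattened input to the first FC layer
    for out_features in FC_CFG:
        # A fully connected layer uses each weight exactly once per image.
        weights += features * out_features
        macs += features * out_features
        features = out_features
    weight_bytes = weights * bits // 8
    return weights, macs, weight_bytes


weights, macs, weight_bytes = vgg16_cost()
print(f"weights: {weights / 1e6:.1f} M")
print(f"MACs per image: {macs / 1e9:.2f} G")
print(f"weight memory at 16 bit: {weight_bytes / 1e6:.0f} MB")
```

Under these assumptions the count comes to roughly 138 million weights and 15.5 billion MACs per image, and 138 million weights at 2 bytes each is about 277 MB, in line with the figures above. The convolution layers account for almost all of the MACs, while the fully connected layers hold most of the weights.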