I. Introduction
Deep convolutional neural network (DCNN) processors for edge platforms [1]–[3], such as the IoT, require low run-time energy consumption, a minimal footprint, short computing latency, and low cost. Today, DCNN processors typically use either GPUs [4] or custom-built ASICs [3], [5] for the energy-intensive computing operations. However, the plateauing of transistor count and the speed bottlenecks posed by the von Neumann architecture have made further power and speed improvements in these systems difficult [6]. This stagnation motivates the investigation of more efficient but specialized devices and architectures [6] for faster systems with lower power and area consumption.