I. Introduction
Computer vision tasks such as image classification [1], object localization [2] and semantic segmentation [3] are fundamental to many applications, including autonomous driving, human-robot interaction and smart factories. With the abundance of training data and compute resources, deep learning algorithms such as convolutional neural networks (CNNs) have come to dominate most computer vision tasks, albeit at the cost of increased memory requirements and computational complexity [4], [5], [6].