I. Introduction
Deep Neural Networks (DNNs) have been successfully applied to a wide range of computer vision and natural language processing problems. Recently, many DNN-based applications have been developed to provide more intelligent video analytics. For example, some drones, such as the DJI Mavic Pro, can recognize and follow a target based on video analytics, and law enforcement officers can use smart glasses to identify suspects [1]. In these applications, only lightweight DNNs can be run locally, and their accuracy is much lower than that of advanced DNNs. Although advanced DNNs provide better results, they incur high computational overhead, which leads to long delays and higher energy consumption when they run on mobile devices.