I. Introduction
As computational power has increased and training data has become more diverse, deep learning has been applied to a wide range of everyday applications, including computer vision, speech recognition, and natural language processing. In computer vision in particular, object detection networks based on convolutional neural networks (CNNs) have been actively studied and have achieved high performance. However, as these networks become deeper and wider, their parameter counts grow, and they demand large amounts of memory and computation. Such over-parameterized networks are ill-suited to practical embedded systems such as drones, phones, and cars, where memory, computing resources, and power are limited. A significant challenge, therefore, is to make CNNs lightweight in a way that accounts for the hardware specifications of the target device, so that they can be used more efficiently.