I. Introduction
Convolutional neural networks (CNNs) have achieved remarkable success and demonstrated state-of-the-art performance in various domains. However, as the performance of CNNs has rapidly improved, their demand for computational and memory resources has also grown, limiting deployment on resource-constrained embedded devices. Consequently, model compression and acceleration techniques have been proposed, including knowledge distillation [1], quantization [2], network architecture search [3], and channel pruning [4]–[6]. Among them, channel pruning has gained widespread attention because it achieves hardware acceleration without requiring specialized acceleration libraries.