I. Introduction
In recent years, deep neural networks (DNNs) have achieved major breakthroughs and found applications in various domains, including natural language processing [1], [2], [3], semantic segmentation [4], and image and video recognition [5]. These networks typically contain millions of parameters, require hundreds of megabytes of storage, and take considerable time to process. Moreover, each evolution in network architecture brings a further increase in the number of parameters, exacerbating the storage and processing requirements. As a result, deploying DNNs on resource-limited platforms, such as conventional desktops and portable devices, is challenging. Reducing the number of parameters and the processing complexity is therefore critical for the practical use of DNNs.