I. Introduction
There are two chief categories of weight pruning: structured and non-structured. General non-structured pruning [1]–[4] can prune arbitrary weights in a DNN. Although it provides high pruning rates (weight reduction), the storage of sparse weight matrices and the associated indices limits its actual hardware implementation [3], [5], [6]. In contrast, structured pruning [5]–[8] directly reduces the size of the weight matrix while maintaining the form of a full matrix. It is therefore compatible with hardware acceleration and has become a research focus in recent years. Structured pruning has various schemes, e.g., filter pruning, channel pruning, and column pruning for the convolutional layers of a DNN, as summarized in [1], [5], [6], [8]. A systematic solution framework has recently been developed based on a powerful optimization tool, the alternating direction method of multipliers (ADMM) [9], [10]. It applies to both structured and non-structured pruning schemes and has achieved state-of-the-art results [4], [8].
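To make the distinction concrete, the following is a minimal NumPy sketch (not from the cited works) contrasting the two categories on a hypothetical convolutional weight tensor: non-structured magnitude pruning zeroes individual weights and leaves an irregular sparse tensor, whereas structured filter pruning removes whole filters and yields a smaller dense tensor. The tensor shape, the 50% pruning rate, and the L2-norm filter ranking are illustrative assumptions.

```python
import numpy as np

# Hypothetical 4-D convolutional weight tensor: (num_filters, channels, kH, kW).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4, 3, 3))

# Non-structured (magnitude) pruning: zero out individual weights anywhere.
# The result stays the same shape but is irregularly sparse, so hardware
# must also store the indices of the surviving weights.
threshold = np.quantile(np.abs(W), 0.5)        # prune 50% of weights (assumed rate)
W_unstructured = np.where(np.abs(W) < threshold, 0.0, W)

# Structured (filter) pruning: drop whole filters, here ranked by L2 norm.
# The surviving weights form a smaller dense tensor with no index overhead.
filter_norms = np.linalg.norm(W.reshape(8, -1), axis=1)
keep = np.sort(np.argsort(filter_norms)[4:])   # keep the 4 largest-norm filters
W_structured = W[keep]

print(W_unstructured.shape)  # (8, 4, 3, 3): same shape, half the entries zero
print(W_structured.shape)    # (4, 4, 3, 3): a genuinely smaller dense tensor
```

This illustrates why structured pruning maps more directly onto hardware: the pruned model is just a smaller dense matrix multiplication, while the non-structured result requires sparse-matrix storage formats and index bookkeeping.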