Abstract:
Dynamic pruning is an effective model compression method for reducing the computational cost of neural networks. However, existing dynamic pruning methods are limited to pruning along a single dimension (channel, spatial, or depth) and therefore cannot fully exploit the redundancy of the network. Moreover, most state-of-the-art approaches implement dynamic pruning by masking out a subset of channels and pixels during training, which fails to accelerate inference. To address these limitations, we propose a novel fuzzy-based multidimensional dynamic pruning paradigm that dynamically compresses neural networks along both the channel and spatial dimensions. Specifically, we design a multidimensional fuzzy-mask block that simultaneously learns which spatial positions and channels are redundant and should be pruned. The Gumbel-Softmax trick, combined with a sparsity loss, is then used to train these mask modules in an end-to-end manner. During the testing stage, we convert the features and convolution kernels into two matrices and implement sparse convolution through matrix multiplication to accelerate network inference. Extensive experiments demonstrate that our method outperforms existing methods in terms of both accuracy and computational cost. For instance, on the CIFAR-10 dataset, our method prunes 68% of the FLOPs of ResNet-56 with only a 0.07% drop in Top-1 accuracy.
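The two mechanisms sketched in the abstract can be illustrated concretely. Below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of both ideas: a mask block that samples hard channel and spatial keep/prune decisions with the straight-through Gumbel-Softmax while staying differentiable, together with a sparsity term, and an im2col-style inference routine that gathers only the kept spatial positions into a feature matrix before multiplying it with the flattened kernel matrix. All module, function, and variable names here are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDimMask(nn.Module):
    # Hypothetical sketch of a multidimensional (channel + spatial) mask block.
    # Each decision is a two-class keep/prune choice sampled with the
    # straight-through Gumbel-Softmax: hard binary masks in the forward pass,
    # gradients through the soft relaxation in the backward pass.
    def __init__(self, channels, tau=1.0):
        super().__init__()
        self.tau = tau
        self.channel_logits = nn.Sequential(      # per-channel keep/prune logits
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 2 * channels))
        self.spatial_logits = nn.Conv2d(channels, 2, kernel_size=1)  # per-pixel logits

    def forward(self, x):
        n, c, h, w = x.shape
        ch = self.channel_logits(x).view(n, c, 2)        # (N, C, 2)
        sp = self.spatial_logits(x).permute(0, 2, 3, 1)  # (N, H, W, 2)
        ch_mask = F.gumbel_softmax(ch, tau=self.tau, hard=True)[..., 0]  # (N, C)
        sp_mask = F.gumbel_softmax(sp, tau=self.tau, hard=True)[..., 0]  # (N, H, W)
        mask = ch_mask.view(n, c, 1, 1) * sp_mask.view(n, 1, h, w)
        # Sparsity loss: fraction of kept positions; add it, weighted, to the task loss.
        sparsity_loss = mask.mean()
        return x * mask, sparsity_loss

def sparse_conv_inference(x, weight, sp_keep):
    # Hypothetical im2col-style sparse convolution (3x3, stride 1, padding 1):
    # unfold the features into a matrix, keep only the unpruned spatial columns,
    # and multiply with the flattened kernel matrix. Pruned outputs stay zero.
    n, c, h, w = x.shape
    cols = F.unfold(x, kernel_size=3, padding=1)   # (N, C*9, H*W)
    wmat = weight.view(weight.shape[0], -1)        # (C_out, C*9)
    keep = sp_keep.view(n, -1).bool()              # (N, H*W) kept positions
    out = x.new_zeros(n, weight.shape[0], h * w)
    for i in range(n):                             # kept set differs per sample
        out[i][:, keep[i]] = wmat @ cols[i][:, keep[i]]
    return out.view(n, weight.shape[0], h, w)

Channel pruning would analogously drop the corresponding rows of cols and columns of wmat before the multiplication; the loop is per-sample because dynamic pruning selects a different kept set for each input.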
Published in: IEEE Transactions on Fuzzy Systems (Volume 32, Issue 9, September 2024)