
NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration


Abstract:

With the increasing demand to deploy DNNs efficiently on mobile edge devices, it becomes ever more important to reduce unnecessary computation and increase execution speed. Prior methods toward this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations, which are a must for mobile acceleration. In this work, we first propose (i) a general category of fine-grained structured pruning applicable to various DNN layers, and (ii) a comprehensive compiler framework for automatic code generation that supports different DNNs and different pruning schemes, bridging the gap between model compression and NAS. We further propose NPAS, a compiler-aware unified network pruning and architecture search. To deal with the large search space, we propose a meta-modeling procedure based on reinforcement learning with fast evaluation and Bayesian optimization, keeping the total number of training epochs comparable with representative NAS frameworks. Our framework achieves 6.7 ms, 5.9 ms, and 3.9 ms ImageNet inference times with 78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy, respectively, on an off-the-shelf mobile phone, consistently outperforming prior work.
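The fine-grained structured pruning the abstract refers to removes weights in small, regularly shaped blocks so that the resulting sparsity pattern stays compiler-friendly. As a rough illustration only (the function name, block scoring by L1 magnitude, and block shape are illustrative assumptions, not the paper's exact scheme), a minimal numpy sketch of magnitude-based block pruning on a 2D weight matrix might look like:

```python
import numpy as np

def block_prune(weight, block=(4, 1), sparsity=0.5):
    """Zero out the fraction `sparsity` of blocks with the smallest
    L1 norm. `weight` is a 2D matrix (e.g. a flattened conv kernel);
    blocks tile the matrix in non-overlapping `block`-shaped groups."""
    rows, cols = weight.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "blocks must tile the matrix"
    # Rearrange into (num_blocks, block_size); .copy() keeps `weight` untouched.
    blocks = (weight.reshape(rows // br, br, cols // bc, bc)
                    .transpose(0, 2, 1, 3)
                    .copy()
                    .reshape(-1, br * bc))
    scores = np.abs(blocks).sum(axis=1)          # L1 magnitude per block
    k = int(len(scores) * sparsity)
    prune_idx = np.argsort(scores)[:k]           # weakest k blocks
    blocks[prune_idx] = 0.0
    # Reassemble the matrix with the pruned blocks zeroed.
    return (blocks.reshape(rows // br, cols // bc, br, bc)
                  .transpose(0, 2, 1, 3)
                  .reshape(rows, cols))
```

Because every zeroed region has the same shape, a compiler's code generator can skip whole blocks of multiply-accumulates rather than testing individual weights, which is what makes this regularized sparsity exploitable on mobile hardware.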
Date of Conference: 20-25 June 2021
Date Added to IEEE Xplore: 02 November 2021
Conference Location: Nashville, TN, USA


1. Introduction

The growing popularity of mobile AI applications and the demand for real-time Deep Neural Network (DNN) execution pose significant challenges for DNN acceleration. However, the ever-growing size of DNN models incurs intensive computation and memory costs, impeding deployment on resource-limited mobile devices.


Cites in Papers - IEEE (9)

1. Ming Ma, Yue Wang, Taoli Du, Qinxu Gao, Ying Wang, Wenhui Li, "Global Static Pruning via Adaptive Sample Complexity Awareness", ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, 2025.
2. Chaoxiong Yi, Songlei Jian, Yusong Tan, Yusen Zhang, "HMO: Host Memory Optimization for Model Inference Acceleration on Edge Devices", 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2813-2819, 2024.
3. Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, "A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10558-10578, 2024.
4. Chaoxiong Yi, Songlei Jian, Yusong Tan, Yusen Zhang, "MACA: Memory-aware convolution accelerating for CNN inference on edge devices", 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 1250-1255, 2024.
5. Yang He, Lingao Xiao, "Structured Pruning for Deep Convolutional Neural Networks: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2900-2919, 2024.
6. Artur Jordao, George de Araújo, Helena de Almeida Maia, Helio Pedrini, "When Layers Play the Lottery, all Tickets Win at Initialization", 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1196-1205, 2023.
7. Souvik Kundu, Sairam Sundaresan, Massoud Pedram, Peter A. Beerel, "FLOAT: Fast Learnable Once-for-All Adversarial Training for Tunable Trade-off between Accuracy and Robustness", 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2348-2357, 2023.
8. Paul Wimmer, Jens Mehnert, Alexandru Condurache, "Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12517-12527, 2022.
9. Yongzhe Jia, Bowen Liu, Wanchun Dou, Xiaolong Xu, Xiaokang Zhou, Lianyong Qi, Zheng Yan, "CroApp: A CNN-Based Resource Optimization Approach in Edge Computing Environment", IEEE Transactions on Industrial Informatics, vol. 18, no. 9, pp. 6300-6307, 2022.

Cites in Papers - Other Publishers (6)

1. Kun Lu, Xuejuan Pan, Chunfeng Mi, Wenyan Wang, Jun Zhang, Peng Chen, Bing Wang, "RDDPA: Real-time Defect Detection via Pruning Algorithm on Steel Surface", ISIJ International, vol. 64, no. 6, pp. 1019, 2024.
2. Ming-Yang Zhang, Xin-Yi Yu, Lin-Lin Ou, "Effective Model Compression via Stage-wise Pruning", Machine Intelligence Research, vol. 20, no. 6, pp. 937, 2023.
3. Yubin Duan, Jie Wu, "Accelerating distributed machine learning with model compression and graph partition", Journal of Parallel and Distributed Computing, vol. 179, pp. 104705, 2023.
4. Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Lin, "SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning", Computer Vision - ECCV 2022, vol. 13671, pp. 674, 2022.
5. Taeho Kim, Yongin Kwon, Jemin Lee, Taeho Kim, Sangtae Ha, "CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution", Computer Vision - ECCV 2022, vol. 13680, pp. 651, 2022.
6. Junjie He, Yinzhang Ding, Ming Zhang, Dongxiao Li, "Towards efficient network compression via Few-Shot Slimming", Neural Networks, vol. 147, pp. 113, 2022.
