
NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration


Abstract:

With the increasing demand to deploy DNNs efficiently on mobile edge devices, it becomes ever more important to reduce unnecessary computation and increase execution speed. Prior methods toward this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations, which are a must for mobile acceleration. In this work, we first propose (i) a general category of fine-grained structured pruning applicable to various DNN layers, and (ii) a comprehensive compiler framework for automatic code generation that supports different DNNs and different pruning schemes, bridging the gap between model compression and NAS. We further propose NPAS, a compiler-aware unified network pruning and architecture search. To deal with the large search space, we propose a meta-modeling procedure based on reinforcement learning with fast evaluation and Bayesian optimization, keeping the total number of training epochs comparable with representative NAS frameworks. Our framework achieves 6.7 ms, 5.9 ms, and 3.9 ms ImageNet inference times with 78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy, respectively, on an off-the-shelf mobile phone, consistently outperforming prior work.
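The fine-grained structured pruning the abstract refers to removes weights in small, regularly shaped blocks so that the resulting sparsity pattern stays compiler-friendly. As a rough illustration only (the function name, block scoring by L1 magnitude, and block shape are illustrative assumptions, not the paper's exact scheme), a minimal numpy sketch of magnitude-based block pruning on a 2D weight matrix might look like:

```python
import numpy as np

def block_prune(weight, block=(4, 1), sparsity=0.5):
    """Zero out the fraction `sparsity` of blocks with the smallest
    L1 norm. `weight` is a 2D matrix (e.g. a flattened conv kernel);
    blocks tile the matrix in non-overlapping `block`-shaped groups."""
    rows, cols = weight.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "blocks must tile the matrix"
    # Rearrange into (num_blocks, block_size); .copy() keeps `weight` untouched.
    blocks = (weight.reshape(rows // br, br, cols // bc, bc)
                    .transpose(0, 2, 1, 3)
                    .copy()
                    .reshape(-1, br * bc))
    scores = np.abs(blocks).sum(axis=1)          # L1 magnitude per block
    k = int(len(scores) * sparsity)
    prune_idx = np.argsort(scores)[:k]           # weakest k blocks
    blocks[prune_idx] = 0.0
    # Reassemble the matrix with the pruned blocks zeroed.
    return (blocks.reshape(rows // br, cols // bc, br, bc)
                  .transpose(0, 2, 1, 3)
                  .reshape(rows, cols))
```

Because every zeroed region has the same shape, a compiler's code generator can skip whole blocks of multiply-accumulates rather than testing individual weights, which is what makes this regularized sparsity exploitable on mobile hardware.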
Date of Conference: 20-25 June 2021
Date Added to IEEE Xplore: 02 November 2021
Conference Location: Nashville, TN, USA


1. Introduction

The growing popularity of mobile AI applications and the demand for real-time Deep Neural Network (DNN) execution pose significant challenges for DNN acceleration. However, the ever-growing size of DNN models incurs intensive computation and memory costs, impeding deployment on resource-limited mobile devices.


Cites in Papers - IEEE (9)

1. Ming Ma, Yue Wang, Taoli Du, Qinxu Gao, Ying Wang, Wenhui Li, "Global Static Pruning via Adaptive Sample Complexity Awareness", ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, 2025.
2. Chaoxiong Yi, Songlei Jian, Yusong Tan, Yusen Zhang, "HMO: Host Memory Optimization for Model Inference Acceleration on Edge Devices", 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2813-2819, 2024.
3. Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, "A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10558-10578, 2024.
4. Chaoxiong Yi, Songlei Jian, Yusong Tan, Yusen Zhang, "MACA: Memory-aware convolution accelerating for CNN inference on edge devices", 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 1250-1255, 2024.
5. Yang He, Lingao Xiao, "Structured Pruning for Deep Convolutional Neural Networks: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 2900-2919, 2024.
6. Artur Jordao, George de Araújo, Helena de Almeida Maia, Helio Pedrini, "When Layers Play the Lottery, all Tickets Win at Initialization", 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1196-1205, 2023.
7. Souvik Kundu, Sairam Sundaresan, Massoud Pedram, Peter A. Beerel, "FLOAT: Fast Learnable Once-for-All Adversarial Training for Tunable Trade-off between Accuracy and Robustness", 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2348-2357, 2023.
8. Paul Wimmer, Jens Mehnert, Alexandru Condurache, "Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs", 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12517-12527, 2022.
9. Yongzhe Jia, Bowen Liu, Wanchun Dou, Xiaolong Xu, Xiaokang Zhou, Lianyong Qi, Zheng Yan, "CroApp: A CNN-Based Resource Optimization Approach in Edge Computing Environment", IEEE Transactions on Industrial Informatics, vol. 18, no. 9, pp. 6300-6307, 2022.

Cites in Papers - Other Publishers (6)

1. Kun Lu, Xuejuan Pan, Chunfeng Mi, Wenyan Wang, Jun Zhang, Peng Chen, Bing Wang, "RDDPA: Real-time Defect Detection via Pruning Algorithm on Steel Surface", ISIJ International, vol. 64, no. 6, pp. 1019, 2024.
2. Ming-Yang Zhang, Xin-Yi Yu, Lin-Lin Ou, "Effective Model Compression via Stage-wise Pruning", Machine Intelligence Research, vol. 20, no. 6, pp. 937, 2023.
3. Yubin Duan, Jie Wu, "Accelerating distributed machine learning with model compression and graph partition", Journal of Parallel and Distributed Computing, vol. 179, pp. 104705, 2023.
4. Haoran You, Baopu Li, Zhanyi Sun, Xu Ouyang, Yingyan Lin, "SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning", Computer Vision - ECCV 2022, vol. 13671, pp. 674, 2022.
5. Taeho Kim, Yongin Kwon, Jemin Lee, Taeho Kim, Sangtae Ha, "CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution", Computer Vision - ECCV 2022, vol. 13680, pp. 651, 2022.
6. Junjie He, Yinzhang Ding, Ming Zhang, Dongxiao Li, "Towards efficient network compression via Few-Shot Slimming", Neural Networks, vol. 147, pp. 113, 2022.
