1. Introduction
Machine learning as a service (MLaaS) lets many users leverage the benefits of artificial intelligence (AI) augmented applications on their private data. However, due to growing concerns over model intellectual property (IP) protection [14], service providers often prefer to retain the model on their end rather than share it, even as a black box, with the user. Users, in turn, are often reluctant to share their personal data due to data privacy concerns. To address both concerns, various private inference (PI) methods [7], [18], [20], [21] have been proposed that leverage techniques such as homomorphic encryption (HE) [1] and secure multi-party computation (MPC) protocols to preserve the privacy of the client's data as well as the model's IP. Popular PI frameworks, including Gazelle [9], DELPHI [18], CryptoNAS [5], and Cheetah [19], build on these privacy-preserving mechanisms. However, unlike traditional inference, the latency of the non-linear ReLU operation in PI can be up to two orders of magnitude higher. In particular, PI methods generally evaluate ReLUs with Yao's Garbled Circuits (GC) [22], which demand orders of magnitude more latency and communication than linear multiply-accumulate (MAC) operations.