Privacy-preserving Training Algorithm for Naive Bayes Classifiers


Abstract:

The growing popularity of machine learning (ML), which benefits from high-quality training datasets collected from multiple organizations, raises natural questions about the privacy guarantees that can be provided in such settings. Our work tackles this problem in the context of multi-party secure ML, wherein multiple organizations provide their sensitive datasets to a data user and train a Naive Bayes (NB) model with the data user. We propose PPNB, a privacy-preserving scheme for training NB models based on Homomorphic Cryptosystem (HC) and Differential Privacy (DP). PPNB achieves a balanced tradeoff between efficiency and accuracy in multi-party secure ML, enabling flexible switching among different tradeoffs via parameter tuning. Extensive experimental results validate the effectiveness of PPNB.
Date of Conference: 16-20 May 2022
Date Added to IEEE Xplore: 11 August 2022
Conference Location: Seoul, Korea, Republic of


I. Introduction

Machine learning (ML) models are widely used in many fields, such as spam detection, image classification, and natural language processing [1], [2]. The accuracy of a model depends not only on a well-designed ML algorithm but also on the quality of its training dataset. An experimental study at Google with a dataset of 300 million images [3] demonstrates that model performance increases as the order of magnitude of training data grows. However, training datasets are usually held by multiple organizations and contain sensitive information. For example, consider a company that wants to build a model to discern the most appropriate time for advertising. The training datasets for the model are extracted from consumer purchase records held by several online shopping sites, and those records contain sensitive information about consumers. It is therefore important to protect data privacy when training ML models.
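To make the DP component of such a scheme concrete, the sketch below trains a categorical Naive Bayes model from class and feature counts and perturbs each count with Laplace noise before the counts are released. This is an illustrative, simplified sketch only: it shows the count-perturbation idea behind DP-based NB training, not the actual PPNB protocol, and it omits the homomorphic-encryption aggregation step entirely. All function names (`dp_nb_train`, `dp_nb_predict`) and the per-count sensitivity assumption are ours, not from the paper.

```python
import math
import random

def dp_nb_train(samples, labels, epsilon, seed=0):
    """Train a categorical Naive Bayes model from noisy counts.

    Each released count is perturbed with Laplace(1/epsilon) noise;
    adding or removing one record changes any single count by at most 1,
    so sensitivity 1 is assumed per released count. Illustrative only.
    """
    rng = random.Random(seed)

    def laplace(scale):
        # Inverse-CDF sampling of a Laplace(0, scale) random variable.
        u = rng.random() - 0.5
        return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

    classes = sorted(set(labels))
    n_features = len(samples[0])
    scale = 1.0 / epsilon

    class_counts = {}   # noisy count of records per class
    feat_counts = {}    # noisy count of (class, feature index, value)
    for c in classes:
        idx = [i for i, y in enumerate(labels) if y == c]
        # Clamp to a small positive floor so logs stay defined.
        class_counts[c] = max(len(idx) + laplace(scale), 1e-9)
        for j in range(n_features):
            values = {samples[i][j] for i in range(len(samples))}
            for v in values:
                cnt = sum(1 for i in idx if samples[i][j] == v)
                feat_counts[(c, j, v)] = max(cnt + laplace(scale), 1e-9)

    total = sum(class_counts.values())
    priors = {c: class_counts[c] / total for c in classes}
    return priors, class_counts, feat_counts

def dp_nb_predict(x, priors, class_counts, feat_counts):
    """Standard NB prediction over the noisy model: argmax of log-posterior."""
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for j, v in enumerate(x):
            num = feat_counts.get((c, j, v), 1e-9)
            score += math.log(num / class_counts[c])
        if score > best_score:
            best_score, best = score, c
    return best
```

With a large epsilon the noise is negligible and predictions match plain Naive Bayes; shrinking epsilon strengthens the privacy guarantee at the cost of noisier counts and lower accuracy, which mirrors the efficiency/accuracy tuning discussed in the abstract.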

