Together or Alone: The Price of Privacy in Collaborative Learning

Open access

Abstract

Machine learning algorithms have reached mainstream status and are widely deployed in many applications. The accuracy of such algorithms depends significantly on the size of the underlying training dataset; in practice, a small or medium-sized organization often does not have enough data to train a reasonably accurate model. For such organizations, a realistic solution is to train their machine learning models on their joint dataset (the union of the individual ones). Unfortunately, privacy concerns prevent them from doing so straightforwardly. While a number of privacy-preserving solutions exist that let collaborating organizations securely aggregate parameters while training the models, we are not aware of any work that provides a rational framework for the participants to precisely balance the privacy loss and accuracy gain in their collaboration.

In this paper, we focus on a two-player setting and model the collaborative training process as a game in which each player aims to achieve higher accuracy while preserving the privacy of its own dataset. We introduce the notion of Price of Privacy, a novel approach for measuring the impact of privacy protection on accuracy in the proposed framework. Furthermore, we develop a game-theoretical model for different player types, and then either find a Nash Equilibrium or prove that one exists with regard to the strength of privacy protection for each player. Using recommendation systems as our main use case, we demonstrate how two players can make practical use of the proposed theoretical framework, including setting up the parameters and approximating the non-trivial Nash Equilibrium.
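To make the idea of an equilibrium over privacy strengths concrete, the following is a minimal sketch of a brute-force search for pure Nash Equilibria on a discretized grid. The abstract does not state the players' utility functions, so the payoff here is a made-up stand-in: the `utility` function, the weights `BENEFIT` and `COST`, and the grid resolution are all assumptions chosen for illustration, not the paper's model.

```python
import itertools
import math

# Hypothetical toy payoff, NOT taken from the paper. Each player picks a
# privacy strength p in [0, 1]: 0 = share raw data, 1 = share nothing.
BENEFIT = 4.0  # assumed weight on accuracy gain
COST = 1.0     # assumed weight on privacy loss


def utility(p_own: float, p_other: float) -> float:
    """Assumed accuracy gain minus assumed privacy loss for one player."""
    shared_own, shared_other = 1.0 - p_own, 1.0 - p_other
    # Assumption: gain grows with what the partner shares, with
    # diminishing returns in what the player shares itself.
    accuracy_gain = shared_other * math.sqrt(shared_own)
    privacy_loss = shared_own
    return BENEFIT * accuracy_gain - COST * privacy_loss


def pure_nash_equilibria(grid, tol=1e-12):
    """Return all grid profiles where neither player gains by deviating alone."""
    equilibria = []
    for p1, p2 in itertools.product(grid, repeat=2):
        best1 = max(utility(q, p2) for q in grid)  # player 1's best payoff vs p2
        best2 = max(utility(q, p1) for q in grid)  # player 2's best payoff vs p1
        if utility(p1, p2) >= best1 - tol and utility(p2, p1) >= best2 - tol:
            equilibria.append((p1, p2))
    return equilibria


grid = [i / 4 for i in range(5)]  # privacy levels 0, 0.25, 0.5, 0.75, 1
equilibria = pure_nash_equilibria(grid)
print(equilibria)
```

Under these assumed weights the search returns three profiles: both players sharing fully (0, 0), both fully private (1, 1), and an interior profile at (0.75, 0.75). The fully private profile is an equilibrium for a simple reason: if the partner shares nothing, sharing only incurs privacy loss. A finer grid or a different payoff would change the result, so the output illustrates the search procedure only.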
