Investigating Statistical Privacy Frameworks from the Perspective of Hypothesis Testing

Open access

Abstract

Over the last decade, differential privacy (DP) has emerged as the gold standard for rigorous, provable privacy. However, there are very few practical guidelines on how to apply it, and a key challenge is how to set an appropriate value for the privacy parameter ɛ. In this work, we use hypothesis testing, a standard statistical tool, to derive useful and interpretable guidelines for state-of-the-art privacy-preserving frameworks. We formalize and implement hypothesis testing in terms of an adversary’s capability to infer mutually exclusive sensitive information about the input data (such as whether or not an individual has participated) from the output of the privacy-preserving mechanism. We quantify the success of the hypothesis test using the precision-recall relation, which provides an interpretable and natural guideline for practitioners and researchers on selecting ɛ. Our key results include a quantitative analysis of how hypothesis testing can guide the choice of ɛ in an interpretable manner for a differentially private mechanism and its variants. Importantly, our findings show that an adversary’s auxiliary information, in the form of the prior distribution of the database and correlation across records and time, indeed influences the proper choice of ɛ. Finally, we also show how the perspective of hypothesis testing can provide useful insights into the relationships among a broad range of privacy frameworks, including differential privacy, Pufferfish privacy, Blowfish privacy, dependent differential privacy, inferential privacy, membership privacy, and mutual-information-based differential privacy.
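
As a rough illustration of the idea, the following sketch computes the precision and recall of a simple membership test against a noisy count. It is not the paper's implementation: the Laplace mechanism, the sensitivity-1 counting query, the fixed threshold rule, and the names `laplace_sf` and `precision_recall` are all assumptions chosen to keep the example small. The adversary sees the count plus Laplace noise of scale 1/ɛ and guesses "present" when the output exceeds a threshold.

```python
import math


def laplace_sf(x, loc, scale):
    """Return P(Y > x) for Y ~ Laplace(loc, scale)."""
    z = (x - loc) / scale
    if z < 0:
        return 1.0 - 0.5 * math.exp(z)
    return 0.5 * math.exp(-z)


def precision_recall(epsilon, threshold=0.5, prior_present=0.5):
    """Precision and recall of the rule 'guess present if output > threshold'.

    Assumed setting (hypothetical, for illustration only):
      - absent  -> true count 0, present -> true count 1 (sensitivity 1)
      - mechanism adds Laplace noise with scale 1 / epsilon
      - prior_present is the adversary's prior that the individual is present
    """
    scale = 1.0 / epsilon
    false_positive_rate = laplace_sf(threshold, 0.0, scale)  # absent, guessed present
    true_positive_rate = laplace_sf(threshold, 1.0, scale)   # present, guessed present
    p_guess_present = (prior_present * true_positive_rate
                       + (1.0 - prior_present) * false_positive_rate)
    precision = prior_present * true_positive_rate / p_guess_present
    recall = true_positive_rate
    return precision, recall


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 2.0, 5.0):
        precision, recall = precision_recall(eps)
        print(f"epsilon={eps:<4} precision={precision:.3f} recall={recall:.3f}")
```

Under these assumptions, a small ɛ keeps both precision and recall close to what the prior alone would give, while a large ɛ lets the adversary approach perfect inference. Changing `prior_present` shows how the adversary's prior shifts the precision reached at a fixed ɛ.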
