The Power of the Hybrid Model for Mean Estimation

Brendan Avent 1, Yatharth Dubey 2, and Aleksandra Korolova 3
  • 1 University of Southern California
  • 2 Georgia Institute of Technology
  • 3 University of Southern California

Abstract

We explore the power of the hybrid model of differential privacy (DP), in which some users desire the guarantees of the local model of DP and others are content with receiving the trusted-curator model guarantees. In particular, we study the utility of hybrid-model estimators that compute the mean of arbitrary real-valued distributions with bounded support. When the curator knows the distribution’s variance, we design a hybrid estimator that, for realistic datasets and parameter settings, achieves a constant-factor improvement over natural baselines. We then analytically characterize how the estimator’s utility depends on the problem setting and parameter choices. When the distribution’s variance is unknown, we design a heuristic hybrid estimator and analyze how it compares to the baselines. We find that it often performs better than the baselines, and sometimes almost as well as the known-variance estimator. We then answer the question of how our estimator’s utility is affected when users’ data are not drawn from the same distribution, but rather from distributions dependent on their trust model preference. Concretely, we examine the implications of the two groups’ distributions diverging and show that in some cases, our estimators maintain fairly high utility. We then demonstrate how our hybrid estimator can be incorporated as a sub-component in more complex, higher-dimensional applications. Finally, we propose a new privacy amplification notion for the hybrid model that emerges due to interaction between the groups, and derive corresponding amplification results for our hybrid estimators.
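To make the known-variance setting concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: a small trusted-curator (opt-in) group contributes a Laplace-mechanism mean, a large local group contributes Laplace-randomized reports, and the two private estimates are combined with inverse-variance weights. All function names and the choice of Laplace noise on both sides are our assumptions; data are assumed to lie in [0, 1] and the distribution's variance sigma2 is assumed known, mirroring the paper's setting.

```python
import numpy as np

def curator_mean(data, epsilon):
    """epsilon-DP mean in the trusted-curator model (Laplace mechanism).

    The sample mean over n values in [0, 1] has sensitivity 1/n.
    """
    data = np.asarray(data)
    n = len(data)
    return np.mean(data) + np.random.laplace(scale=1.0 / (n * epsilon))

def local_mean(data, epsilon):
    """epsilon-LDP mean: each user perturbs their own value.

    Per-user sensitivity is 1 (the width of [0, 1]), so each report gets
    Laplace noise of scale 1/epsilon; the server averages the noisy reports.
    """
    data = np.asarray(data)
    noisy = data + np.random.laplace(scale=1.0 / epsilon, size=len(data))
    return np.mean(noisy)

def hybrid_mean(curator_data, local_data, epsilon, sigma2):
    """Combine the two private estimates with inverse-variance weights.

    Each estimate's error variance is a sampling term (sigma2 / n) plus its
    privacy-noise term; Laplace(scale=b) noise has variance 2 * b**2.
    """
    n_c, n_l = len(curator_data), len(local_data)
    var_c = sigma2 / n_c + 2.0 / (n_c * epsilon) ** 2   # curator-side error variance
    var_l = sigma2 / n_l + 2.0 / (n_l * epsilon ** 2)   # average of n_l noisy reports
    w = (1.0 / var_c) / (1.0 / var_c + 1.0 / var_l)
    return w * curator_mean(curator_data, epsilon) + (1.0 - w) * local_mean(local_data, epsilon)
```

The weighting captures the intuition behind the hybrid model's advantage: per-user local noise shrinks only as 1/(n_l * epsilon^2) after averaging, while curator noise shrinks as 1/(n_c * epsilon)^2, so even a small opt-in group can receive substantial weight and pull the combined estimate's error below either group's estimate alone.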


