Averaging Attacks on Bounded Noise-based Disclosure Control Algorithms

Hassan Jameel Asghar 1  and Dali Kaafar 2
  • 1 Macquarie University, Australia
  • 2 Macquarie University, Australia


We describe and evaluate an attack that reconstructs the histogram of any target attribute of a sensitive dataset that can only be queried through a specific class of real-world privacy-preserving algorithms, which we call bounded perturbation algorithms. A defining property of such an algorithm is that it perturbs answers to queries by adding zero-mean noise distributed within a bounded (possibly undisclosed) range. Other key properties of the algorithm include only allowing restricted queries (enforced via an online interface), suppressing answers to queries which are satisfied by only a small group of individuals (e.g., by returning a zero as an answer), and adding the same perturbation to two queries which are satisfied by the same set of individuals (to thwart differencing or averaging attacks). A real-world example of such an algorithm is the one deployed by the Australian Bureau of Statistics’ (ABS) online tool called TableBuilder, which allows users to create tables, graphs and maps of Australian census data [30]. We assume an attacker (say, a curious analyst) who is given oracle access to the algorithm via an interface. We describe two attacks on the algorithm. Both attacks are based on carefully constructing (different) queries that evaluate to the same answer. The first attack finds the hidden perturbation parameter r (if it is assumed not to be public knowledge). The second attack removes the noise to obtain the original answer of some (counting) query of choice. We also show how to use this attack to find the number of individuals in the dataset with a target attribute value a of any attribute A, and then for all attribute values a_i of A. None of the attacks presented here depend on any background information. Our attacks are a practical illustration of the (informal) fundamental law of information recovery, which states that “overly accurate estimates of too many statistics completely destroys privacy” [9, 15].
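The averaging attack described above can be sketched in a few lines. This is a minimal simulation, not the paper's actual construction: the oracle below, the bound R, the true count, and the per-query seeding are all illustrative assumptions. The key point it demonstrates is that sticky noise only defeats averaging when the defense recognizes that two queries are semantically identical; an attacker who can phrase the same counting question in many syntactically distinct ways obtains independent bounded noise samples, and the sample mean converges to the true answer.

```python
import random

TRUE_COUNT = 1234   # hypothetical true answer of the target counting query
R = 2               # hypothetical perturbation bound: noise uniform in [-R, R]

def perturbed_answer(query_id):
    """Toy bounded-perturbation oracle.

    Seeding on query_id models "sticky" noise: repeating the exact same
    query returns the exact same perturbed answer, so naive repetition
    gains nothing. Syntactically distinct queries (different query_id)
    that happen to select the same individuals get independent noise.
    """
    random.seed(query_id)
    return TRUE_COUNT + random.randint(-R, R)

# Averaging attack: issue m distinct-looking queries that all have the
# same true answer, then average away the zero-mean bounded noise.
m = 10_000
estimate = sum(perturbed_answer(i) for i in range(m)) / m
print(estimate)  # close to TRUE_COUNT; error shrinks as m grows
```

Because the noise is zero-mean and bounded, the standard error of the mean falls as 1/sqrt(m), so even a modest number of rephrased queries pins down the true count to within rounding.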


  • [1] Aircloak. Fix for MIT/Georgetown Univ attack on Diffix. https://aircloak.com/fix-for-the-mit-georgetown-univattack-on-diffix, 2018.

  • [2] K Andersson, I Jansson, and K Kraft. Protection of frequency tables: current work at Statistics Sweden. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Helsinki, Finland, 5, 2015.

  • [3] Hassan Jameel Asghar, Paul Tyler, and Mohamed Ali Kaafar. Differentially private release of public transport data: The Opal use case. arXiv preprint arXiv:1705.05957, 2017.

  • [4] Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282. ACM, 2007.

  • [5] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.

  • [6] James Chipperfield, Daniel Gow, and Bronwyn Loong. The Australian Bureau of Statistics and releasing frequency tables via a remote server. Statistical Journal of the IAOS, 32(1):53–64, 2016.

  • [7] Aloni Cohen and Kobbi Nissim. Linear program reconstruction in practice. arXiv preprint arXiv:1810.05692, 2018.

  • [8] Chris Culnane, Benjamin IP Rubinstein, and Vanessa Teague. Privacy assessment of de-identified opal data: A report for transport for nsw. arXiv preprint arXiv:1704.08547, 2017.

  • [9] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 202–210. ACM, 2003.

  • [10] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.

  • [11] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006.

  • [12] Cynthia Dwork, Frank McSherry, and Kunal Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 85–94. ACM, 2007.

  • [13] Cynthia Dwork, Moni Naor, and Salil Vadhan. The privacy of the analyst and the power of the state. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 400–409. IEEE, 2012.

  • [14] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

  • [15] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.

  • [16] Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. Exposed! a survey of attacks on private data. Annual Review of Statistics and Its Application, 4:61–84, 2017.

  • [17] Paul Francis, Sebastian Probst Eide, and Reinhard Munz. Diffix: High-utility database anonymization. In Annual Privacy Forum, pages 141–158. Springer, 2017.

  • [18] Paul Francis, Sebastian Probst-Eide, Pawel Obrok, Cristian Berneanu, Sasa Juric, and Reinhard Munz. Diffix-Birch: Extending Diffix-Aspen. arXiv preprint arXiv:1806.02075, 2018.

  • [19] Bruce Fraser and Janice Wooton. A proposed method for confidentialising tabular output to protect against differencing. Monographs of Official Statistics. Work session on Statistical Data Confidentiality, pages 299–302, 2005.

  • [21] Andrea Gadotti, Florimond Houssiau, Luc Rocher, Ben Livshits, and Yves-Alexandre de Montjoye. When the signal is in the noise: Exploiting Diffix’s sticky noise. In USENIX Security Conference Proceedings, 2019.

  • [22] Simson Garfinkel, John M. Abowd, and Christian Martindale. Understanding database reconstruction attacks on public data. Queue, 16(5):50:28–50:53, October 2018.

  • [23] Michael Hay, Vibhor Rastogi, Gerome Miklau, and Dan Suciu. Boosting the accuracy of differentially private histograms through consistency. Proceedings of the VLDB Endowment, 3(1-2):1021–1032, 2010.

  • [24] Shiva Prasad Kasiviswanathan, Mark Rudelson, and Adam Smith. The power of linear reconstruction attacks. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, pages 1415–1433. Society for Industrial and Applied Mathematics, 2013.

  • [25] Jon Kleinberg, Christos Papadimitriou, and Prabhakar Raghavan. Auditing boolean attributes. Journal of Computer and System Sciences, 66(1):244–253, 2003.

  • [26] Jennifer Marley and Victoria Leaver. A method for confidentialising user-defined tables: statistical properties and a risk-utility analysis. In Proceedings of the 58th Congress of the International Statistical Institute, ISI, pages 21–26, 2011.

  • [27] Yosef Rinott, Christine M O’Keefe, Natalie Shlomo, Chris Skinner, et al. Confidentiality and differential privacy in the dissemination of frequency tables. Statistical Science, 33(3):358–385, 2018.

  • [28] Thomas Steinke and Jonathan Ullman. Between pure and approximate differential privacy. Journal of Privacy and Confidentiality, 7(2):3–22, 2016.

  • [29] Latanya Sweeney. Simple demographics often identify people uniquely. Health (San Francisco), 671:1–34, 2000.

  • [30] G Thompson, S Broadfoot, and D Elazar. Methodology for automatic confidentialisation of statistical outputs from remote servers at the Australian Bureau of Statistics. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, pages 28–30, 2013.

  • [31] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer, 2017.

