Quantifying the privacy loss of a privacy-preserving mechanism on potentially sensitive data is a complex and well-researched topic; the de-facto standard for privacy measures are ε-differential privacy (DP) and its versatile relaxation (ε, δ)-approximate differential privacy (ADP). Recently, novel variants of (A)DP focused on giving tighter privacy bounds under continual observation. In this paper we unify many previous works via the privacy loss distribution (PLD) of a mechanism. We show that for non-adaptive mechanisms, the privacy loss under sequential composition undergoes a convolution and will converge to a Gauss distribution (the central limit theorem for DP). We derive several relevant insights: we can now characterize mechanisms by their privacy loss class, i.e., by the Gauss distribution to which their PLD converges, which allows us to give novel ADP bounds for mechanisms based on their privacy loss class; we derive exact analytical guarantees for the approximate randomized response mechanism and an exact analytical and closed formula for the Gauss mechanism, that, given ε, calculates δ, s.t., the mechanism is (ε, δ)-ADP (not an over-approximating bound).
Human mobility is often represented as a mobility network, or graph, with nodes representing places of significance which an individual visits, such as their home, work, places of social amenity, etc., and edge weights corresponding to probability estimates of movements between these places. Previous research has shown that individuals can be identified by a small number of geolocated nodes in their mobility network, rendering mobility trace anonymization a hard task. In this paper we build on prior work and demonstrate that even when all location and timestamp information is removed from nodes, the graph topology of an individual mobility network itself is often uniquely identifying. Further, we observe that a mobility network is often unique, even when only a small number of the most popular nodes and edges are considered. We evaluate our approach using a large dataset of cell-tower location traces from 1 500 smartphone handsets with a mean duration of 430 days. We process the data to derive the top−N places visited by the device in the trace, and find that 93% of traces have a unique top−10 mobility network, and all traces are unique when considering top−15 mobility networks. Since mobility patterns, and therefore mobility networks for an individual, vary over time, we use graph kernel distance functions, to determine whether two mobility networks, taken at different points in time, represent the same individual. We then show that our distance metrics, while imperfect predictors, perform significantly better than a random strategy and therefore our approach represents a significant loss in privacy.
Recent work has shown that Tor is vulnerable to attacks that manipulate inter-domain routing to compromise user privacy. Proposed solutions such as Counter-RAPTOR  attempt to ameliorate this issue by favoring Tor entry relays that have high resilience to these attacks. However, because these defenses bias Tor path selection on the identity of the client, they invariably leak probabilistic information about client identities. In this work, we make the following contributions. First, we identify a novel means to quantify privacy leakage in guard selection algorithms using the metric of Max-Divergence. Max-Divergence ensures that probabilistic privacy loss is within strict bounds while also providing composability over time. Second, we utilize Max-Divergence and multiple notions of entropy to understand privacy loss in the worst-case for Counter-RAPTOR. Our worst-case analysis provides a fresh perspective to the field, as prior work such as Counter-RAPTOR only analyzed average case-privacy loss. Third, we propose modifications to Counter-RAPTOR that incorporate worst-case Max-Divergence in its design. Specifically, we utilize the exponential mechanism (a mechanism for differential privacy) to guarantee a worst-case bound on Max-Divergence/privacy loss. For the quality function used in the exponential mechanism, we show that a Monte-Carlo sampling-based method for stochastic optimization can be used to improve multi-dimensional trade-offs between security, privacy, and performance. Finally, we demonstrate that compared to Counter-RAPTOR, our approach achieves an 83% decrease in Max-Divergence after one guard selection and a 245% increase in worst-case Shannon entropy after 5 guard selections. Notably, experimental evaluations using the Shadow emulator shows that our approach provides these privacy benefits with minimal impact on system performance.
Information about people’s movements and the locations they visit enables an increasing number of mobility analytics applications, e.g., in the context of urban and transportation planning, In this setting, rather than collecting or sharing raw data, entities often use aggregation as a privacy protection mechanism, aiming to hide individual users’ location traces. Furthermore, to bound information leakage from the aggregates, they can perturb the input of the aggregation or its output to ensure that these are differentially private.
In this paper, we set to evaluate the impact of releasing aggregate location time-series on the privacy of individuals contributing to the aggregation. We introduce a framework allowing us to reason about privacy against an adversary attempting to predict users’ locations or recover their mobility patterns. We formalize these attacks as inference problems, and discuss a few strategies to model the adversary’s prior knowledge based on the information she may have access to. We then use the framework to quantify the privacy loss stemming from aggregate location data, with and without the protection of differential privacy, using two real-world mobility datasets. We find that aggregates do leak information about individuals’ punctual locations and mobility profiles. The density of the observations, as well as timing, play important roles, e.g., regular patterns during peak hours are better protected than sporadic movements. Finally, our evaluation shows that both output and input perturbation offer little additional protection, unless they introduce large amounts of noise ultimately destroying the utility of the data.
Cardinality estimators like HyperLogLog are sketching algorithms that estimate the number of distinct elements in a large multiset. Their use in privacy-sensitive contexts raises the question of whether they leak private information. In particular, can they provide any privacy guarantees while preserving their strong aggregation properties?
We formulate an abstract notion of cardinality estimators, that captures this aggregation requirement: one can merge sketches without losing precision. We propose an attacker model and a corresponding privacy definition, strictly weaker than differential privacy: we assume that the attacker has no prior knowledge of the data. We then show that if a cardinality estimator satisfies this definition, then it cannot have a reasonable level of accuracy. We prove similar results for weaker versions of our definition, a nd a nalyze t he p rivacy o f existing algorithms, showing that their average privacy loss is significant, e ven f or m ultisets w ith l arge cardinalities. We conclude that strong aggregation requirements are incompatible with any reasonable definition o f privacy, and that cardinality estimators should be considered as sensitive as raw data. We also propose risk mitigation strategies for their real-world applications.
Machine learning algorithms have reached mainstream status and are widely deployed in many applications. The accuracy of such algorithms depends significantly on the size of the underlying training dataset; in reality a small or medium sized organization often does not have the necessary data to train a reasonably accurate model. For such organizations, a realistic solution is to train their machine learning models based on their joint dataset (which is a union of the individual ones). Unfortunately, privacy concerns prevent them from straightforwardly doing so. While a number of privacy-preserving solutions exist for collaborating organizations to securely aggregate the parameters in the process of training the models, we are not aware of any work that provides a rational framework for the participants to precisely balance the privacy loss and accuracy gain in their collaboration.
In this paper, by focusing on a two-player setting, we model the collaborative training process as a two-player game where each player aims to achieve higher accuracy while preserving the privacy of its own dataset. We introduce the notion of Price of Privacy, a novel approach for measuring the impact of privacy protection on the accuracy in the proposed framework. Furthermore, we develop a game-theoretical model for different player types, and then either find or prove the existence of a Nash Equilibrium with regard to the strength of privacy protection for each player. Using recommendation systems as our main use case, we demonstrate how two players can make practical use of the proposed theoretical framework, including setting up the parameters and approximating the non-trivial Nash Equilibrium.
likelihood reconstruction for emission tomography. IEEE transactions on medical imaging , 1(2):113–122, 1982.  David Sommer, Sebastian Meiser, and Esfandiar Mohammadi. Privacyloss classes: The central limit theorem in differential privacy. Proceedings on privacy enhancing technologies , 2019.  Shuang Song, Yizhen Wang, and Kamalika Chaudhuri. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data , pages 1291–1306. ACM, 2017.  Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang
-70.  B. Krishnamurthy, D. Malandrino, and C. E. Wills, “Measuring PrivacyLoss and the Impact of Privacy Protection in Web Browsing,” in Proceedings of the 3rd Symposium on Usable Privacy and Security. ACM, 2007, pp. 52-63.  B. Krishnamurthy and C. Wills, “Privacy diffusion on the Web: a longitudinal perspective,” in Proceedings of the 18th International Conference on World Wide Web. ACM, 2009, pp. 541-550.  “Alexa Top Sites,” http://www.alexa.com/.  “Selenium WebDriver,” http://www.seleniumhq.org.  “Google Play Unofficial Python API,” https
you will do next summer. SIGCOMM Comput. Commun. Rev. , 40(5):65–70, 2010.  B. Krishnamurthy, D. Malandrino, and C. E. Wills. Measuring privacyloss and the impact of privacy protection in web browsing. In Proc. 3rd Symp. on Usable Privacy and Security (SOUPS ’07) , pages 52–63, 2007.  B. Krishnamurthy, K. Naryshkin, and C. E. Wills. Privacy leakage vs. protection measures: the growing disconnect. In Proc. Web 2.0 Security and Privacy Workshop , 2011.  T. Libert. Privacy implications of health information seeking on the web. Commun. ACM , 58