Damien Desfontaines, Andreas Lochbihler and David Basin
Cardinality estimators like HyperLogLog are sketching algorithms that estimate the number of distinct elements in a large multiset. Their use in privacy-sensitive contexts raises the question of whether they leak private information. In particular, can they provide any privacy guarantees while preserving their strong aggregation properties?
We formulate an abstract notion of cardinality estimators that captures this aggregation requirement: one can merge sketches without losing precision. We propose an attacker model and a corresponding privacy definition, strictly weaker than differential privacy: we assume that the attacker has no prior knowledge of the data. We then show that if a cardinality estimator satisfies this definition, it cannot have a reasonable level of accuracy. We prove similar results for weaker versions of our definition, and analyze the privacy of existing algorithms, showing that their average privacy loss is significant, even for multisets with large cardinalities. We conclude that strong aggregation requirements are incompatible with any reasonable definition of privacy, and that cardinality estimators should be considered as sensitive as raw data. We also propose risk mitigation strategies for their real-world applications.
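A minimal HyperLogLog-style sketch in Python illustrates the aggregation property the abstract refers to: merging two sketches by a register-wise maximum is lossless, so the merge of the sketches of A and B equals the sketch of A ∪ B. The hash function, register count, and bias constant are standard illustrative choices, not taken from the paper, and the small- and large-range corrections of the full algorithm are omitted.

```python
import hashlib

M = 1 << 12                        # 4096 registers (precision p = 12)
ALPHA = 0.7213 / (1 + 1.079 / M)   # standard bias-correction constant

def _h(item):
    # 64-bit hash derived from SHA-256 (illustrative choice)
    return int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")

def sketch(items):
    regs = [0] * M
    for item in items:
        h = _h(item)
        idx = h & (M - 1)          # low 12 bits select a register
        rest = h >> 12
        # rank = 1-based position of the lowest set bit (geometric w.p. 2^-k)
        rank = (rest & -rest).bit_length() if rest else 52
        regs[idx] = max(regs[idx], rank)
    return regs

def merge(r1, r2):
    # register-wise max: merge(sketch(A), sketch(B)) == sketch(A | B)
    return [max(a, b) for a, b in zip(r1, r2)]

def estimate(regs):
    return ALPHA * M * M / sum(2.0 ** -r for r in regs)

a = sketch(range(60_000))
b = sketch(range(40_000, 100_000))
print(round(estimate(merge(a, b))))   # close to 100000, the union cardinality
```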
Alexandra-Mihaela Olteanu, Mathias Humbert, Kévin Huguenin and Jean-Pierre Hubaux
Most popular location-based social networks, such as Facebook and Foursquare, let their (mobile) users post location and co-location (involving other users) information. Such posts bring social benefits to the users who post them, but also to their friends who view them. Yet they also represent a severe threat to the users’ privacy, as co-location information introduces interdependences between users. We propose the first game-theoretic framework for analyzing the strategic information-sharing behaviors of users of online social networks (OSNs). To design parametric utility functions that are representative of the users’ actual preferences, we also conduct a survey of 250 Facebook users and use conjoint analysis to quantify the users’ benefits of sharing vs. viewing (co-)location information and their preference for privacy vs. benefits. Our survey findings reveal large variation among users in terms of these preferences. We extensively evaluate our framework through data-driven numerical simulations. We study how users’ individual preferences influence each other’s decisions, identify several factors that significantly affect these decisions (among which the mobility data of the users), and determine situations where dangerous patterns can emerge (e.g., a vicious circle of sharing or an incentive to over-share), even when the users share similar preferences.
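The following toy utility function sketches the kind of parametric model such a framework builds on: a user gains a benefit from sharing and from viewing friends’ posts, and pays a privacy cost that grows with what is revealed about her, including co-locations posted by friends. The linear form and the weights here are illustrative assumptions standing in for the per-user preferences that the paper elicits via conjoint analysis.

```python
def utility(i, shares, friends, b_share=1.0, b_view=0.5, c_priv=0.8):
    """shares[j] = 1 if user j posts (co-)location info, else 0."""
    benefit = b_share * shares[i] + b_view * sum(shares[j] for j in friends[i])
    # co-location posts by friends also expose user i (interdependent privacy)
    exposure = shares[i] + sum(shares[j] for j in friends[i])
    return benefit - c_priv * exposure

# Two friends: if both share, each gets 1.0 + 0.5 - 0.8 * 2 = -0.1, showing
# how interdependence can make mutual sharing harmful despite its benefits.
friends = {0: [1], 1: [0]}
print(utility(0, {0: 1, 1: 1}, friends))
```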
Hans Hanley, Yixin Sun, Sameer Wagh and Prateek Mittal
Recent work has shown that Tor is vulnerable to attacks that manipulate inter-domain routing to compromise user privacy. Proposed solutions such as Counter-RAPTOR attempt to ameliorate this issue by favoring Tor entry relays that have high resilience to these attacks. However, because these defenses bias Tor path selection based on the identity of the client, they invariably leak probabilistic information about client identities. In this work, we make the following contributions. First, we identify a novel means to quantify privacy leakage in guard selection algorithms using the metric of Max-Divergence. Max-Divergence ensures that probabilistic privacy loss is within strict bounds while also providing composability over time. Second, we utilize Max-Divergence and multiple notions of entropy to understand worst-case privacy loss for Counter-RAPTOR. Our worst-case analysis provides a fresh perspective for the field, as prior work such as Counter-RAPTOR analyzed only average-case privacy loss. Third, we propose modifications to Counter-RAPTOR that incorporate worst-case Max-Divergence into its design. Specifically, we utilize the exponential mechanism (a mechanism for differential privacy) to guarantee a worst-case bound on Max-Divergence and thus on privacy loss. For the quality function used in the exponential mechanism, we show that a Monte-Carlo sampling-based method for stochastic optimization can be used to improve multi-dimensional trade-offs between security, privacy, and performance. Finally, we demonstrate that, compared to Counter-RAPTOR, our approach achieves an 83% decrease in Max-Divergence after one guard selection and a 245% increase in worst-case Shannon entropy after five guard selections. Notably, experimental evaluations using the Shadow emulator show that our approach provides these privacy benefits with minimal impact on system performance.
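As a hedged sketch of the exponential-mechanism approach described above, the snippet below samples a guard with probability proportional to exp(ε·q(g)/2Δ), which bounds the worst-case log-ratio (Max-Divergence) between the selection distributions of any two clients. The resilience scores and ε are illustrative; the paper’s Monte-Carlo optimization of the quality function is not reproduced here.

```python
import math
import random

def exponential_mechanism(candidates, quality, epsilon, sensitivity=1.0):
    # Pr[c] proportional to exp(eps * q(c) / (2 * sensitivity)); this bounds
    # the worst-case log-ratio between any two clients' output distributions.
    weights = [math.exp(epsilon * quality[c] / (2 * sensitivity)) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

guards = ["g1", "g2", "g3"]
resilience = {"g1": 0.9, "g2": 0.5, "g3": 0.2}  # this client's scores in [0, 1]
print(exponential_mechanism(guards, resilience, epsilon=2.0))
```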
Traffic analysis is the process of extracting useful or sensitive information from observed network traffic. Typical use cases include malware detection and website fingerprinting attacks. High-accuracy traffic analysis techniques use machine learning algorithms (e.g., SVM, kNN) and require the traffic to be split into correctly separated blocks. Inspired by digital forensics techniques, we propose a new network traffic analysis approach based on similarity digests. The approach has several advantages over existing techniques, namely fast signature generation, compact signature representation using Bloom filters, efficient similarity detection between packet traces of arbitrary sizes, and, in particular, dropping the traffic-splitting requirement altogether. Experiments show very promising results on VPN and malware traffic, but poor results on Tor traffic, mainly due to Tor’s fixed-size cells.
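A minimal illustration of the similarity-digest idea: features extracted from a packet trace (fixed-size byte n-grams here, as an illustrative assumption) are inserted into a Bloom filter, and two traces are compared by the overlap of their filters, with no prior traffic splitting. The feature extraction and filter parameters are not the paper’s exact construction.

```python
import hashlib

FILTER_BITS = 2048
NUM_HASHES = 4

def digest(trace: bytes, ngram: int = 8) -> int:
    """Bloom filter (as an int bitmask) over the trace's byte n-grams."""
    bits = 0
    for i in range(len(trace) - ngram + 1):
        h = hashlib.sha256(trace[i:i + ngram]).digest()
        for k in range(NUM_HASHES):
            idx = int.from_bytes(h[2 * k:2 * k + 2], "big") % FILTER_BITS
            bits |= 1 << idx
    return bits

def similarity(d1: int, d2: int) -> float:
    # Jaccard-style score over the set bits of the two filters
    inter = bin(d1 & d2).count("1")
    union = bin(d1 | d2).count("1")
    return inter / union if union else 0.0

a = digest(b"GET /index.html HTTP/1.1 Host: example.com")
b = digest(b"GET /index.htm HTTP/1.1 Host: example.com")
print(round(similarity(a, b), 2))   # high score for near-identical traces
```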
We present the design, implementation, and evaluation of a system, called MATRIX, developed to protect the privacy of mobile device users from location inference and sensor side-channel attacks. MATRIX gives users control and visibility over location and sensor (e.g., accelerometer and gyroscope) accesses by mobile apps. It implements a PrivoScope service, which audits all location and sensor accesses by apps on the device and generates real-time notifications and graphs for visualizing these accesses, and a Synthetic Location service, which enables users to provide obfuscated or synthetic location trajectories or sensor traces to apps they find useful but do not trust with their private information. The services are designed to be extensible and easy to use, hiding all of the underlying complexity from users. MATRIX also implements a Location Provider component that generates realistic, privacy-preserving synthetic identities and trajectories for users by incorporating traffic information from historical Google Maps Directions API data and accelerations from statistical information gathered in user driving experiments. These mobility patterns are generated by modeling and solving user schedules with a randomized linear program, and user driving behavior with a quadratic program. We extensively evaluate MATRIX using user studies, popular location-driven apps, and machine learning techniques, and demonstrate that it is portable to most Android devices globally, is reliable, has low overhead, and generates synthetic trajectories that are difficult for an adversary to distinguish from real mobility trajectories.
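A toy version of the randomized linear program mentioned above, assuming SciPy is available: activity durations are chosen to maximize randomly drawn preference weights subject to a time budget, so each synthetic identity receives a different but feasible day plan. The activities, bounds, and weights are illustrative assumptions, and the quadratic program for driving behavior is omitted.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng()
activities = ["home", "work", "errands"]
w = rng.uniform(0.5, 1.5, size=3)          # randomized per-identity preferences

res = linprog(
    c=-w,                                  # maximize w @ x == minimize -w @ x
    A_eq=[[1, 1, 1]], b_eq=[16],           # 16 waking hours to allocate
    bounds=[(6, 12), (4, 10), (0, 3)],     # plausible per-activity limits
)
print(dict(zip(activities, res.x.round(1))))
```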
Georgia Fragkouli, Katerina Argyraki and Bryan Ford
Can we improve Internet transparency without worsening user anonymity? For a long time, researchers have been proposing transparency systems, where traffic reports produced at strategic network points help assess network behavior and verify service-level agreements or neutrality compliance. However, such reports necessarily reveal when certain traffic appeared at a certain network point, and this information could, in principle, be used to compromise low-latency anonymity networks like Tor. In this paper, we examine whether more Internet transparency necessarily means less anonymity. We start from the information that a basic transparency solution would publish about a network and study how that would impact the anonymity of the network’s users. Then we study how to change, in real time, the time granularity of traffic reports in order to preserve both user anonymity and report utility. We evaluate with real and synthetic data and show that our algorithm can offer a good anonymity/utility balance, even in adversarial scenarios where aggregates consist of very few flows.
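A sketch of the granularity-adaptation idea: timestamps in a traffic report are coarsened just enough that every reported time bucket aggregates at least k flows, trading report precision for anonymity. The doubling search and the choice of k are illustrative assumptions; the paper’s algorithm adapts granularity in real time rather than per batch.

```python
from collections import Counter

def coarsen(timestamps, k, base_ms=1):
    """Smallest power-of-two granularity at which every bucket has >= k flows."""
    assert len(timestamps) >= k, "cannot aggregate fewer than k flows"
    g = base_ms
    while True:
        buckets = Counter(t // g for t in timestamps)
        if all(c >= k for c in buckets.values()):
            return g, dict(buckets)
        g *= 2   # coarsen the report and retry

ts = [3, 5, 6, 21, 22, 23, 24, 40, 40, 40, 41, 47]   # flow times in ms
print(coarsen(ts, k=3))   # -> (16, {0: 3, 1: 4, 2: 5})
```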
Paul Schmitt, Anne Edmundson, Allison Mankin and Nick Feamster
Virtually every Internet communication involves a Domain Name System (DNS) lookup for the destination server that the client wants to communicate with. Operators of DNS recursive resolvers (the machines that receive a client’s query for a domain name and resolve it to a corresponding IP address) can learn significant information about client activity. Past work, for example, indicates that DNS queries reveal information ranging from web browsing activity to the types of devices that a user has in their home. Recognizing the privacy vulnerabilities associated with DNS queries, various third parties have created alternative DNS services that obscure a user’s DNS queries from his or her Internet service provider. Yet these systems merely transfer trust to a different third party. We argue that no single party ought to be able to associate DNS queries with the client IP address that issued them. To this end, we present Oblivious DNS (ODNS), which introduces an additional layer of obfuscation between clients and their queries. To do so, ODNS uses its own authoritative namespace; the authoritative servers for the ODNS namespace act as recursive resolvers for the DNS queries that they receive, but they never see the IP addresses of the clients that initiated these queries. We present an initial deployment of ODNS; our experiments show that ODNS introduces minimal performance overhead, both for individual queries and for web page loads. We design ODNS to be compatible with existing DNS protocols and infrastructure, and we are actively working on an open standard with the IETF.
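The snippet below sketches the client-side step of an ODNS-style lookup: the requested name is encrypted, encoded as a DNS label, and queried under the ODNS authoritative zone, so the recursive resolver sees the client IP but not the name, while the ODNS server sees the name but not the client IP. The XOR keystream is a placeholder for the real session-key encryption (in ODNS the session key is encrypted under the authoritative server’s public key), and .odns stands in for the system’s namespace.

```python
import base64
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # placeholder stream cipher; real ODNS encrypts under a session key
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def odns_query(domain: str, session_key: bytes) -> str:
    pt = domain.encode()
    ct = bytes(a ^ b for a, b in zip(pt, _keystream(session_key, len(pt))))
    label = base64.b32encode(ct).decode().rstrip("=").lower()
    return label + ".odns"    # the recursive resolver only ever sees this name

print(odns_query("example.com", session_key=b"\x01" * 16))
```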
Christiane Kuhn, Martin Beck, Stefan Schiffner, Eduard Jorswieck and Thorsten Strufe
Many anonymous communication networks (ACNs) with different privacy goals have been developed. Still, there are no accepted formal definitions of privacy goals, and ACNs often define their goals ad hoc. However, formally defining privacy goals benefits the understanding and comparison of different flavors of privacy and, as a result, the improvement of ACNs. In this paper, we work towards defining and comparing privacy goals by formalizing them as privacy notions and identifying their building blocks. For each pair of notions, we prove whether one is strictly stronger than the other and, if so, which. Hence, we are able to present a complete hierarchy. Using this rigorous comparison between notions, we resolve inconsistencies among existing works and improve the understanding of privacy goals.
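A toy rendering of the comparison the abstract describes: model each privacy notion as the set of scenario pairs an adversary must be unable to distinguish; notion A is at least as strong as notion B iff A’s set contains B’s, and strictly stronger iff the containment is proper. The scenario encoding is purely illustrative and far simpler than the paper’s formal model.

```python
# scenario pairs the adversary must not be able to distinguish (toy encoding)
SO = frozenset({("alice->x", "bob->x")})       # hides who the sender is
RO = frozenset({("alice->x", "alice->y")})     # hides who the receiver is
BOTH = SO | RO                                 # a notion demanding both

def strictly_stronger(a, b):
    # a demands indistinguishability for everything b does, and more
    return b < a   # proper subset test

print(strictly_stronger(BOTH, SO))   # True: BOTH sits above SO in the hierarchy
print(strictly_stronger(SO, RO))     # False: SO and RO are incomparable
```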
David M. Sommer, Sebastian Meiser and Esfandiar Mohammadi
Quantifying the privacy loss of a privacy-preserving mechanism on potentially sensitive data is a complex and well-researched topic; the de facto standard privacy measures are ε-differential privacy (DP) and its versatile relaxation, (ε, δ)-approximate differential privacy (ADP). Recently, novel variants of (A)DP have focused on giving tighter privacy bounds under continual observation. In this paper, we unify many previous works via the privacy loss distribution (PLD) of a mechanism. We show that for non-adaptive mechanisms, the privacy loss under sequential composition undergoes a convolution and converges to a Gauss distribution (the central limit theorem for DP). We derive several relevant insights: we can now characterize mechanisms by their privacy loss class, i.e., by the Gauss distribution to which their PLD converges, which allows us to give novel ADP bounds for mechanisms based on their privacy loss class; we derive exact analytical guarantees for the approximate randomized response mechanism, and an exact analytical closed formula for the Gauss mechanism that, given ε, calculates δ such that the mechanism is (ε, δ)-ADP (not an over-approximating bound).
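As an example of the kind of exact ADP guarantee described, the closed form below computes, for the Gauss mechanism with sensitivity Δ and noise scale σ, the exact δ for a given ε: δ(ε) = Φ(Δ/2σ − εσ/Δ) − e^ε · Φ(−Δ/2σ − εσ/Δ), where Φ is the standard normal CDF. This matches the known tight analysis of the Gaussian mechanism; treating it as notationally identical to the paper’s formula is an assumption.

```python
from math import erf, exp, sqrt

def Phi(x: float) -> float:
    # standard normal CDF
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def gauss_delta(eps: float, sigma: float, sensitivity: float = 1.0) -> float:
    # exact delta such that the Gauss mechanism is (eps, delta)-ADP
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return Phi(a - b) - exp(eps) * Phi(-a - b)

print(gauss_delta(eps=1.0, sigma=1.0))   # about 0.127 for these parameters
```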