Browse

You are looking at 1 - 10 of 296 items for :

  • Databases and Data Mining x
Clear All
Open access

Irwin Reyes, Primal Wijesekera, Joel Reardon, Amit Elazari Bar On, Abbas Razaghpanah, Narseo Vallina-Rodriguez and Serge Egelman

Abstract

We present a scalable dynamic analysis framework that allows for the automatic evaluation of the privacy behaviors of Android apps. We use our system to analyze mobile apps’ compliance with the Children’s Online Privacy Protection Act (COPPA), one of the few stringent privacy laws in the U.S. Based on our automated analysis of 5,855 of the most popular free children’s apps, we found that a majority are potentially in violation of COPPA, mainly due to their use of thirdparty SDKs. While many of these SDKs offer configuration options to respect COPPA by disabling tracking and behavioral advertising, our data suggest that a majority of apps either do not make use of these options or incorrectly propagate them across mediation SDKs. Worse, we observed that 19% of children’s apps collect identifiers or other personally identifiable information (PII) via SDKs whose terms of service outright prohibit their use in child-directed apps. Finally, we show that efforts by Google to limit tracking through the use of a resettable advertising ID have had little success: of the 3,454 apps that share the resettable ID with advertisers, 66% transmit other, non-resettable, persistent identifiers as well, negating any intended privacy-preserving properties of the advertising ID.

Open access

Matthew Smith, Daniel Moser, Martin Strohmeier, Vincent Lenders and Ivan Martinovic

Abstract

Despite the Aircraft Communications, Addressing and Reporting System (ACARS) being widely deployed for over twenty years, little scrutiny has been applied to it outside of the aviation community. Whilst originally utilized by commercial airlines to track their flights and provide automated timekeeping on crew, today it serves as a multi-purpose air-ground data link for many aviation stakeholders including private jet owners, state actors and military. Such a change has caused ACARS to be used far beyond its original mandate; to date no work has been undertaken to assess the extent of this especially with regard to privacy and the various stakeholder groups which use it. In this paper, we present an analysis of ACARS usage by privacy sensitive actors-military, government and business. We conduct this using data from the VHF (both traditional ACARS, and VDL mode 2) and satellite communications subnetworks. Based on more than two million ACARS messages collected over the course of 16 months, we demonstrate that current ACARS usage systematically breaches location privacy for all examined aviation stakeholder groups, explaining the types of messages used to cause this problem.We illustrate the challenges with three case studies-one for each stakeholder group-to show how much privacy sensitive information can be constructed with a handful of ACARS messages. We contextualize our findings with opinions on the issue of privacy in ACARS from 40 aviation industry professionals. From this, we explore recommendations for how to address these issues, including use of encryption and policy measures.

Open access

Takao Murakami, Hideitsu Hino and Jun Sakuma

Abstract

A number of studies have recently been made on discrete distribution estimation in the local model, in which users obfuscate their personal data (e.g., location, response in a survey) by themselves and a data collector estimates a distribution of the original personal data from the obfuscated data. Unlike the centralized model, in which a trusted database administrator can access all users’ personal data, the local model does not suffer from the risk of data leakage. A representative privacy metric in this model is LDP (Local Differential Privacy), which controls the amount of information leakage by a parameter ∈ called privacy budget. When ∈ is small, a large amount of noise is added to the personal data, and therefore users’ privacy is strongly protected. However, when the number of users ℕ is small (e.g., a small-scale enterprise may not be able to collect large samples) or when most users adopt a small value of ∈, the estimation of the distribution becomes a very challenging task. The goal of this paper is to accurately estimate the distribution in the cases explained above. To achieve this goal, we focus on the EM (Expectation-Maximization) reconstruction method, which is a state-of-the-art statistical inference method, and propose a method to correct its estimation error (i.e., difference between the estimate and the true value) using the theory of Rilstone et al. We prove that the proposed method reduces the MSE (Mean Square Error) under some assumptions.We also evaluate the proposed method using three largescale datasets, two of which contain location data while the other contains census data. The results show that the proposed method significantly outperforms the EM reconstruction method in all of the datasets when ℕ or ∈ is small.

Open access

Ryan Wails, Yixin Sun, Aaron Johnson, Mung Chiang and Prateek Mittal

Abstract

Many recent proposals for anonymous communication omit from their security analyses a consideration of the effects of time on important system components. In practice, many components of anonymity systems, such as the client location and network structure, exhibit changes and patterns over time. In this paper, we focus on the effect of such temporal dynamics on the security of anonymity networks. We present Tempest, a suite of novel attacks based on (1) client mobility, (2) usage patterns, and (3) changes in the underlying network routing. Using experimental analysis on real-world datasets, we demonstrate that these temporal attacks degrade user privacy across a wide range of anonymity networks, including deployed systems such as Tor; pathselection protocols for Tor such as DeNASA, TAPS, and Counter-RAPTOR; and network-layer anonymity protocols for Internet routing such as Dovetail and HORNET. The degradation is in some cases surprisingly severe. For example, a single host failure or network route change could quickly and with high certainty identify the client’s ISP to a malicious host or ISP. The adversary behind each attack is relatively weak – generally passive and in control of one network location or a small number of hosts. Our findings suggest that designers of anonymity systems should rigorously consider the impact of temporal dynamics when analyzing anonymity.

Open access

Cecylia Bocovich and Ian Goldberg

Abstract

Censorship circumvention is often characterized as a cat-and-mouse game between a nation-state censor and the developers of censorship resistance systems. Decoy routing systems offer a solution to censor- ship resistance that has the potential to tilt this race in the favour of the censorship resistor by using real connections to unblocked, overt sites to deliver censored content to users. This is achieved by employing the help of Internet Service Providers (ISPs) or Autonomous Systems (ASes) that own routers in the middle of the net- work. However, the deployment of decoy routers has yet to reach fruition. Obstacles to deployment such as the heavy requirements on routers that deploy decoy router relay stations, and the impact on the quality of service for customers that pass through these routers have deterred potential participants from deploying existing systems. Furthermore, connections from clients to overt sites often follow different paths in the upstream and downstream direction, making some existing designs impractical. Although decoy routing systems that lessen the burden on participating routers and accommodate asymmetric flows have been proposed, these arguably more deployable systems suffer from security vulnerabilities that put their users at risk of discovery or make them prone to censorship or denial of service attacks. In this paper, we propose a technique for supporting route asymmetry in previously symmetric decoy routing systems. The resulting asymmetric solution is more secure than previous asymmetric proposals and provides an option for tiered deployment, allowing more cautious ASes to deploy a lightweight, non-blocking relay station that aids in defending against routing-capable adversaries. We also provide an experimental evaluation of relay station performance on off-the-shelf hardware and additional security improvements to recently proposed systems.

Open access

Dionysis Manousakas, Cecilia Mascolo, Alastair R. Beresford, Dennis Chan and Nikhil Sharma

Abstract

Human mobility is often represented as a mobility network, or graph, with nodes representing places of significance which an individual visits, such as their home, work, places of social amenity, etc., and edge weights corresponding to probability estimates of movements between these places. Previous research has shown that individuals can be identified by a small number of geolocated nodes in their mobility network, rendering mobility trace anonymization a hard task. In this paper we build on prior work and demonstrate that even when all location and timestamp information is removed from nodes, the graph topology of an individual mobility network itself is often uniquely identifying. Further, we observe that a mobility network is often unique, even when only a small number of the most popular nodes and edges are considered. We evaluate our approach using a large dataset of cell-tower location traces from 1 500 smartphone handsets with a mean duration of 430 days. We process the data to derive the top−N places visited by the device in the trace, and find that 93% of traces have a unique top−10 mobility network, and all traces are unique when considering top−15 mobility networks. Since mobility patterns, and therefore mobility networks for an individual, vary over time, we use graph kernel distance functions, to determine whether two mobility networks, taken at different points in time, represent the same individual. We then show that our distance metrics, while imperfect predictors, perform significantly better than a random strategy and therefore our approach represents a significant loss in privacy.

Open access

Ehsan Hesamifard, Hassan Takabi, Mehdi Ghasemi and Rebecca N. Wright

Abstract

Machine learning algorithms based on deep Neural Networks (NN) have achieved remarkable results and are being extensively used in different domains. On the other hand, with increasing growth of cloud services, several Machine Learning as a Service (MLaaS) are offered where training and deploying machine learning models are performed on cloud providers’ infrastructure. However, machine learning algorithms require access to the raw data which is often privacy sensitive and can create potential security and privacy risks. To address this issue, we present CryptoDL, a framework that develops new techniques to provide solutions for applying deep neural network algorithms to encrypted data. In this paper, we provide the theoretical foundation for implementing deep neural network algorithms in encrypted domain and develop techniques to adopt neural networks within practical limitations of current homomorphic encryption schemes. We show that it is feasible and practical to train neural networks using encrypted data and to make encrypted predictions, and also return the predictions in an encrypted form. We demonstrate applicability of the proposed CryptoDL using a large number of datasets and evaluate its performance. The empirical results show that it provides accurate privacy-preserving training and classification.

Open access

Alex Davidson, Ian Goldberg, Nick Sullivan, George Tankersley and Filippo Valsorda

Abstract

The growth of content delivery networks (CDNs) has engendered centralized control over the serving of internet content. An unwanted by-product of this growth is that CDNs are fast becoming global arbiters for which content requests are allowed and which are blocked in an attempt to stanch malicious traffic. In particular, in some cases honest users-especially those behind shared IP addresses, including users of privacy tools such as Tor, VPNs, and I2P - can be unfairly targeted by attempted ‘catch-all solutions’ that assume these users are acting maliciously. In this work, we provide a solution to prevent users from being exposed to a disproportionate amount of internet challenges such as CAPTCHAs. These challenges are at the very least annoying and at their worst - when coupled with bad implementations - can completely block access from web resources. We detail a 1-RTT cryptographic protocol (based on an implementation of an oblivious pseudorandom function) that allows users to receive a significant amount of anonymous tokens for each challenge solution that they provide. These tokens can be exchanged in the future for access without having to interact with a challenge. We have implemented our initial solution in a browser extension named “Privacy Pass”, and have worked with the Cloudflare CDN to deploy compatible server-side components in their infrastructure. However, we envisage that our solution could be used more generally for many applications where anonymous and honest access can be granted (e.g., anonymous wiki editing). The anonymity guarantee of our solution makes it immediately appropriate for use by users of Tor/VPNs/ I2P. We also publish figures from Cloudflare indicating the potential impact from the global release of Privacy Pass.

Open access

Malte Möser, Kyle Soska, Ethan Heilman, Kevin Lee, Henry Heffan, Shashvat Srivastava, Kyle Hogan, Jason Hennessey, Andrew Miller, Arvind Narayanan and Nicolas Christin

Abstract

Monero is a privacy-centric cryptocurrency that allows users to obscure their transactions by including chaff coins, called “mixins,” along with the actual coins they spend. In this paper, we empirically evaluate two weaknesses in Monero’s mixin sampling strategy. First, about 62% of transaction inputs with one or more mixins are vulnerable to “chain-reaction” analysis - that is, the real input can be deduced by elimination. Second, Monero mixins are sampled in such a way that they can be easily distinguished from the real coins by their age distribution; in short, the real input is usually the “newest” input. We estimate that this heuristic can be used to guess the real input with 80% accuracy over all transactions with 1 or more mixins. Next, we turn to the Monero ecosystem and study the importance of mining pools and the former anonymous marketplace AlphaBay on the transaction volume. We find that after removing mining pool activity, there remains a large amount of potentially privacy-sensitive transactions that are affected by these weaknesses. We propose and evaluate two countermeasures that can improve the privacy of future transactions.

Open access

Rachel Greenstadt, Damon McCoy and Carmela Troncoso