Ruben Recabarren and Bogdan Carbunar
Providing reliable and surreptitious communications is difficult in the presence of adaptive and resourceful state level censors. In this paper we introduce Tithonus, a framework that builds on the Bitcoin blockchain and network to provide censorship-resistant communication mechanisms. In contrast to previous approaches, we do not rely solely on the slow and expensive blockchain consensus mechanism but instead fully exploit Bitcoin’s peer-to-peer gossip protocol. We develop adaptive, fast and cost effective data communication solutions that camouflage client requests into inconspicuous Bitcoin transactions. We propose solutions to securely request and transfer content, with unobservability and censorship resistance, and free, pay-per-access and subscription based payment options. When compared to state-of-the-art Bitcoin writing solutions, Tithonus reduces the cost of transferring data to censored clients by 2 orders of magnitude and increases the goodput by 3 to 5 orders of magnitude. We show that Tithonus client initiated transactions are hard to detect, while server initiated transactions cannot be censored without creating split world problems to the Bit-coin blockchain.
Alexandros Mittos, Bradley Malin and Emiliano De Cristofaro
Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts.
Thee Chanyaswad, Changchang Liu and Prateek Mittal
A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
Anselme Tueno, Florian Kerschbaum and Stefan Katzenbeisser
Decision trees are widespread machine learning models used for data classification and have many applications in areas such as healthcare, remote diagnostics, spam filtering, etc. In this paper, we address the problem of privately evaluating a decision tree on private data. In this scenario, the server holds a private decision tree model and the client wants to classify its private attribute vector using the server’s private model. The goal is to obtain the classification while preserving the privacy of both – the decision tree and the client input. After the computation, only the classification result is revealed to the client, while nothing is revealed to the server. Many existing protocols require a constant number of rounds. However, some of these protocols perform as many comparisons as there are decision nodes in the entire tree and others transform the whole plaintext decision tree into an oblivious program, resulting in higher communication costs. The main idea of our novel solution is to represent the tree as an array. Then we execute only d – the depth of the tree – comparisons. Each comparison is performed using a small garbled circuit, which output secret-shares of the index of the next node. We get the inputs to the comparison by obliviously indexing the tree and the attribute vector. We implement oblivious array indexing using either garbled circuits, Oblivious Transfer or Oblivious RAM (ORAM). Using ORAM, this results in the first protocol with sub-linear cost in the size of the tree. We implemented and evaluated our solution using the different array indexing procedures mentioned above. As a result, we are not only able to provide the first protocol with sublinear cost for large trees, but also reduce the communication cost for the large real-world data set “Spambase” from 18 MB to 1[triangleright]2 MB and the computation time from 17 seconds to less than 1 second in a LAN setting, compared to the best related work.
Ahmed Salem, Pascal Berrang, Mathias Humbert and Michael Backes
The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, enabling a breakthrough towards more precise and personalized medicine. Naturally, the increasing availability of data also enables physicians to compare patients’ data and treatments easily and to find similar patients in order to propose the optimal therapy. Such similar patient queries (SPQs) are of utmost importance to medical practice and will be relied upon in future health information exchange systems. While privacy-preserving solutions have been previously studied, those are limited to genomic data, ignoring the different newly available types of biomedical data.
In this paper, we propose new cryptographic techniques for finding similar patients in a privacy-preserving manner with various types of biomedical data, including genomic, epigenomic and transcriptomic data as well as their combination. We design protocols for two of the most common similarity metrics in biomedicine: the Euclidean distance and Pearson correlation coefficient. Moreover, unlike previous approaches, we account for the fact that certain locations contribute differently to a given disease or phenotype by allowing to limit the query to the relevant locations and to assign them different weights. Our protocols are specifically designed to be highly efficient in terms of communication and bandwidth, requiring only one or two rounds of communication and thus enabling scalable parallel queries. We rigorously prove our protocols to be secure based on cryptographic games and instantiate our technique with three of the most important types of biomedical data – namely DNA, microRNA expression, and DNA methylation. Our experimental results show that our protocols can compute a similarity query over a typical number of positions against a database of 1,000 patients in a few seconds. Finally, we propose and formalize strategies to mitigate the threat of malicious users or hospitals.
Chen Chen, Anrin Chakraborti and Radu Sion
Encryption protects sensitive data from unauthorized access, yet is not sufficient when users are forced to surrender keys under duress. In contrast, plausible deniability enables users to not only encrypt data but also deny its existence when challenged. Most existing plausible deniability work (e.g. the successful and unfortunately now-defunct TrueCrypt) tackles “single snapshot” adversaries, and cannot handle the more realistic scenario of adversaries gaining access to a device at multiple time points. Such “multi-snapshot” adversaries can simply observe modifications between snapshots and detect the existence of hidden data. Existing ideas handling “multi-snapshot” scenarios feature prohibitive overheads when deployed on practically-sized disks. This is mostly due to a lack of data locality inherent in certain standard access-randomization mechanisms, one of the building blocks used to ensure plausible deniability.
In this work, we show that such randomization is not necessary for strong plausible deniability. Instead, it can be replaced by a canonical form that permits most of writes to be done sequentially. This has two key advantages: 1) it reduces the impact of seek due to random accesses; 2) it reduces the overall number of physical blocks that need to be written for each logical write. As a result, PD-DM increases I/O throughput by orders of magnitude (10–100× in typical setups) over existing work while maintaining strong plausible deniability against multi-snapshot adversaries.
Notably, PD-DM is the first plausible-deniable system getting within reach of the performance of standard encrypted volumes (dm-crypt) for random I/O.
Nisarg Raval, Ashwin Machanavajjhala and Jerry Pan
Personal data garnered from various sensors are often offloaded by applications to the cloud for analytics. This leads to a potential risk of disclosing private user information. We observe that the analytics run on the cloud are often limited to a machine learning model such as predicting a user’s activity using an activity classifier. We present Olympus, a privacy framework that limits the risk of disclosing private user information by obfuscating sensor data while minimally affecting the functionality the data are intended for. Olympus achieves privacy by designing a utility aware obfuscation mechanism, where privacy and utility requirements are modeled as adversarial networks. By rigorous and comprehensive evaluation on a real world app and on benchmark datasets, we show that Olympus successfully limits the disclosure of private information without significantly affecting functionality of the application.
Dominic Deuber, Christoph Egger, Katharina Fech, Giulio Malavolta, Dominique Schröder, Sri Aravinda Krishnan Thyagarajan, Florian Battke and Claudia Durand
An individual’s genetic information is possibly the most valuable personal information. While knowledge of a person’s DNA sequence can facilitate the diagnosis of several heritable diseases and allow personalized treatment, its exposure comes with significant threats to the patient’s privacy. Currently known solutions for privacy-respecting computation require the owner of the DNA to either be heavily involved in the execution of a cryptographic protocol or to completely outsource the access control to a third party. This motivates the demand for cryptographic protocols which enable computation over encrypted genomic data while keeping the owner of the genome in full control. We envision a scenario where data owners can exercise arbitrary and dynamic access policies, depending on the intended use of the analysis results and on the credentials of who is conducting the analysis. At the same time, data owners are not required to maintain a local copy of their entire genetic data and do not need to exhaust their computational resources in an expensive cryptographic protocol.
In this work, we present METIS, a system that assists the computation over encrypted data stored in the cloud while leaving the decision on admissible computations to the data owner. It is based on garbled circuits and supports any polynomially-computable function. A critical feature of our system is that the data owner is free from computational overload and her communication complexity is independent of the size of the input data and only linear in the size of the circuit’s output. We demonstrate the practicality of our approach with an implementation and an evaluation of several functions over real datasets.
Jamie Hayes, Luca Melis, George Danezis and Emiliano De Cristofaro
Generative models estimate the underlying distribution of a dataset to generate realistic samples according to that distribution. In this paper, we present the first membership inference attacks against generative models: given a data point, the adversary determines whether or not it was used to train the model. Our attacks leverage Generative Adversarial Networks (GANs), which combine a discriminative and a generative model, to detect overfitting and recognize inputs that were part of training datasets, using the discriminator’s capacity to learn statistical differences in distributions. We present attacks based on both white-box and black-box access to the target model, against several state-of-the-art generative models, over datasets of complex representations of faces (LFW), objects (CIFAR-10), and medical images (Diabetic Retinopathy). We also discuss the sensitivity of the attacks to different training parameters, and their robustness against mitigation strategies, finding that defenses are either ineffective or lead to significantly worse performances of the generative models in terms of training stability and/or sample quality.