The decreasing cost of molecular profiling has supplied the biomedical research community with a plethora of new types of biomedical data, enabling a breakthrough towards more precise and personalized medicine. The increasing availability of data also enables physicians to compare patients’ data and treatments easily and to find similar patients in order to propose the optimal therapy. Such similar patient queries (SPQs) are of utmost importance to medical practice and will be relied upon in future health information exchange systems. While privacy-preserving solutions have been studied previously, they are limited to genomic data and ignore the other newly available types of biomedical data.
In this paper, we propose new cryptographic techniques for finding similar patients in a privacy-preserving manner with various types of biomedical data, including genomic, epigenomic and transcriptomic data as well as their combination. We design protocols for two of the most common similarity metrics in biomedicine: the Euclidean distance and the Pearson correlation coefficient. Moreover, unlike previous approaches, we account for the fact that certain locations contribute differently to a given disease or phenotype: a query can be limited to the relevant locations, and each of these locations can be assigned a different weight. Our protocols are specifically designed to be highly efficient in terms of communication and bandwidth, requiring only one or two rounds of communication and thus enabling scalable parallel queries. We rigorously prove our protocols to be secure based on cryptographic games and instantiate our technique with three of the most important types of biomedical data, namely DNA, microRNA expression, and DNA methylation. Our experimental results show that our protocols can compute a similarity query over a typical number of positions against a database of 1,000 patients in a few seconds. Finally, we propose and formalize strategies to mitigate the threat of malicious users or hospitals.
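To make the two similarity metrics concrete, the following is a minimal plaintext sketch of a similar patient query restricted to selected positions with per-position weights. It involves no cryptography and is not the protocol proposed in the paper. The function names, the use of the squared Euclidean distance, the weighted form of the Pearson coefficient, and the toy profiles are all assumptions made for illustration.

```python
import math


def weighted_sq_euclidean(x, y, positions, weights):
    """Weighted squared Euclidean distance over the selected positions only."""
    return sum(w * (x[i] - y[i]) ** 2 for i, w in zip(positions, weights))


def weighted_pearson(x, y, positions, weights):
    """Weighted Pearson correlation over the selected positions only.

    Uses weighted means and weighted (co)variances; this weighting scheme
    is an assumption and may differ from the one used in the paper.
    """
    total = sum(weights)
    mean_x = sum(w * x[i] for i, w in zip(positions, weights)) / total
    mean_y = sum(w * y[i] for i, w in zip(positions, weights)) / total
    cov = sum(w * (x[i] - mean_x) * (y[i] - mean_y)
              for i, w in zip(positions, weights))
    var_x = sum(w * (x[i] - mean_x) ** 2 for i, w in zip(positions, weights))
    var_y = sum(w * (y[i] - mean_y) ** 2 for i, w in zip(positions, weights))
    return cov / math.sqrt(var_x * var_y)


def most_similar(query, database, positions, weights, k=1):
    """Return the ids of the k patients closest to the query profile."""
    ranked = sorted(
        database,
        key=lambda pid: weighted_sq_euclidean(
            query, database[pid], positions, weights),
    )
    return ranked[:k]


# Hypothetical toy profiles: one numeric value per position.
database = {
    "a": [1, 2, 3, 4],
    "b": [1, 2, 9, 5],
    "c": [5, 5, 5, 5],
}
query = [1, 2, 0, 4]
positions = [0, 1, 3]   # position 2 is excluded from the query
weights = [1, 1, 1]

print(most_similar(query, database, positions, weights, k=2))
```

In this sketch, position 2 is left out of the query, so the large differences there do not affect the ranking. A privacy-preserving protocol would have to compute the same quantities without revealing the query or the database profiles in the clear.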