Linking Health Records for Federated Query Processing

Rinku Dewri 1, Toan Ong 2 and Ramakrishna Thurimella 1
  • 1 University of Denver
  • 2 University of Colorado, Denver


A federated query portal in an electronic health record infrastructure enables large epidemiology studies by combining data from geographically dispersed medical institutions. However, an individual's health record has been found to be distributed across multiple carrier databases in local settings. Privacy regulations may prohibit a data source from revealing clear text identifiers, making it non-trivial for a query aggregator to determine which records correspond to the same underlying individual. In this paper, we explore the problem of privately detecting and tracking the health records of an individual in a distributed infrastructure. We begin with a secure set intersection protocol based on commutative encryption, and show how to make it practical on comparison spaces as large as 10^10 pairs. Using bigram matching, precomputed tables, and data parallelism, we reduce the execution time to a matter of minutes, while retaining a high degree of accuracy even in records with data entry errors. We also propose techniques to prevent the inference of identifier information when an adversary knows the underlying data distributions. Finally, we discuss how records can be tracked during query processing using the detection results.
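The core idea of the abstract, matching bigram sets under commutative encryption, can be illustrated with a minimal sketch. It is not the authors' protocol: the choice of modular exponentiation as the commutative cipher, the toy-sized prime `P`, the Dice coefficient as the similarity score, and every function name (`keygen`, `bigrams`, `to_group`, `encrypt`, `private_dice`) are assumptions made for illustration. The precomputed tables, data parallelism, and inference countermeasures mentioned above are not shown.

```python
import hashlib
import math
import secrets

# Toy-sized Mersenne prime (assumption); far too small for real deployments.
P = 2**127 - 1


def keygen():
    """Pick a secret exponent coprime to P - 1 so encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # range 2 .. P-2
        if math.gcd(k, P - 1) == 1:
            return k


def bigrams(text):
    """Return the set of two-character substrings of a lowercased string."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def to_group(token):
    """Hash a token to an integer in 1 .. P-1."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(values, key):
    """Commutative step: (x^a)^b == (x^b)^a (mod P)."""
    return {pow(v, key, P) for v in values}


def private_dice(name_a, name_b, key_a, key_b):
    """Dice similarity of two bigram sets, compared only in doubly encrypted form."""
    # Each site encrypts its own hashed bigrams with its own key.
    a_once = encrypt({to_group(t) for t in bigrams(name_a)}, key_a)
    b_once = encrypt({to_group(t) for t in bigrams(name_b)}, key_b)
    # The sets are exchanged and encrypted again with the other site's key.
    a_twice = encrypt(a_once, key_b)
    b_twice = encrypt(b_once, key_a)
    total = len(a_twice) + len(b_twice)
    if total == 0:
        return 0.0
    return 2 * len(a_twice & b_twice) / total


if __name__ == "__main__":
    key_a, key_b = keygen(), keygen()
    # A single-character data entry error still leaves most bigrams shared.
    print(private_dice("jonathan", "jonathon", key_a, key_b))
```

Because both encryption orders yield the same value, the two sites can count shared bigrams without exchanging clear text identifiers. The encryption in this sketch is deterministic, so repeated bigrams remain recognizable by frequency, which is the kind of distribution-based inference the abstract says must be defended against.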


