Survey on privacy preserving data mining techniques in health care databases

Open access


In health care databases, data mining research and privacy preservation have persistent and conflicting interests: the more sensitive private information is hidden, the less valuable the data become for analysis. In this paper, we survey data anonymization problems through case studies. We summarize the state of the art in health care data anonymization, including the legal environment and expectations, the most common attack strategies against privacy, and the metrics proposed for evaluating the usefulness and privacy preservation of anonymized data. Finally, based on these evaluations, we summarize the strengths and shortcomings of the different approaches and techniques in the literature.

Acta Universitatis Sapientiae, Informatica

The Journal of "Sapientia" Hungarian University of Transylvania
