Students and Taxes: a Privacy-Preserving Study Using Secure Computation

Open access

Abstract

We describe the use of secure multi-party computation for performing a large-scale privacy-preserving statistical study on real government data. In 2015, statisticians from the Estonian Center of Applied Research (CentAR) conducted a big data study to look for correlations between working during university studies and failing to graduate on time. The study was conducted by linking the database of individual tax payments from the Estonian Tax and Customs Board and the database of higher education events from the Ministry of Education and Research. Data collection, preparation and analysis were conducted using the Sharemind secure multi-party computation system, which provided end-to-end cryptographic protection for the analysis. Using ten million tax records and half a million education records in the analysis, this is the largest cryptographically private statistical study ever conducted on real data.
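The abstract does not spell out the underlying protocol. As a rough illustration of how a statistic can be computed without any single party seeing the inputs, the following minimal sketch uses additive secret sharing among three computing parties. The ring size, the party count, the function names and the sample values are all assumptions made for this example; it is not the Sharemind implementation or the study's actual code.

```python
import secrets

MODULUS = 2 ** 32   # assumed ring size, chosen only for illustration
NUM_PARTIES = 3     # assumed number of computing parties


def share(value):
    """Split value into additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(NUM_PARTIES - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares of one value into the original value."""
    return sum(shares) % MODULUS


def secure_sum(shared_values):
    """Each party adds the shares it holds; returns one share of the total per party."""
    totals = [0] * NUM_PARTIES
    for shares in shared_values:
        for party, s in enumerate(shares):
            totals[party] = (totals[party] + s) % MODULUS
    return totals


# Hypothetical per-person payment amounts; not data from the study.
payments = [1200, 850, 0, 2300]
shared = [share(v) for v in payments]   # each data owner shares its own value
result_shares = secure_sum(shared)      # parties compute on shares only
total = reconstruct(result_shares)      # only the aggregate is opened
```

Each individual share is a uniformly random ring element, so a single computing party learns nothing about any one payment; only the final aggregate is reconstructed.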
