We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.

[1] G. Asharov, Y. Lindell, T. Schneider, and M. Zohner. More efficient oblivious transfer and extensions for faster secure computation. In ACM Conference on Computer and Communications Security, pages 535–548. ACM, 2013.

[2] D. Beaver. Efficient multiparty protocols using circuit randomization. In CRYPTO, volume 576 of Lecture Notes in Computer Science, pages 420–432. Springer, 1991.

[3] M. Bellare, V. T. Hoang, S. Keelveedhi, and P. Rogaway. Efficient garbling from a fixed-key blockcipher. In IEEE Symposium on Security and Privacy, pages 478–492. IEEE Computer Society, 2013.

[4] T. Bertin-Mahieux. YearPredictionMSD data set. https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD, 2011.

[5] M. Blanton, A. Steele, and M. Aliasgari. Data-oblivious graph algorithms for secure computation and outsourcing. In ASIACCS, pages 207–218. ACM, 2013.

[6] D. Bogdanov, L. Kamm, S. Laur, and V. Sokk. Rmind: a tool for cryptographically secure statistical analysis. IEEE Transactions on Dependable and Secure Computing, 2016.

[7] D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for fast privacy-preserving computations. In ESORICS, pages 192–206. Springer, 2008.

[8] K. Buza. Feedback prediction for blogs. In GfKl, Studies in Classification, Data Analysis, and Knowledge Organization, pages 145–152. Springer, 2012.

[9] K. Buza. BlogFeedback data set. https://archive.ics.uci.edu/ml/datasets/BlogFeedback, 2014.

[10] M. D. Cock, R. Dowsley, A. C. A. Nascimento, and S. C. Newman. Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In AISec@CCS, pages 3–14. ACM, 2015.

[11] P. Cortez. Student performance data set. https://archive.ics.uci.edu/ml/datasets/Student+Performance, 2014.

[12] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547–553, 2009.

[13] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Wine quality data set. https://archive.ics.uci.edu/ml/datasets/Wine+Quality, 2009.

[14] P. Cortez and A. M. G. Silva. Using data mining to predict secondary school student performance. In Future Business Technology Conference, pages 5–12. EUROSIS, 2008.

[15] I. Damgård, V. Pastro, N. P. Smart, and S. Zakarias. Multiparty computation from somewhat homomorphic encryption. In CRYPTO, pages 643–662. Springer, 2012.

[16] S. de Hoogh, B. Schoenmakers, and M. Veeningen. Certificate validation in secure computation and its use in verifiable linear programming. In AFRICACRYPT, volume 9646 of Lecture Notes in Computer Science, pages 265–284. Springer, 2016.

[17] D. Demmler, G. Dessouky, F. Koushanfar, A. Sadeghi, T. Schneider, and S. Zeitouni. Automated synthesis of optimized circuits for secure computation. In ACM Conference on Computer and Communications Security, pages 1504–1517. ACM, 2015.

[18] D. Demmler, T. Schneider, and M. Zohner. ABY - A framework for efficient mixed-protocol secure two-party computation. In NDSS. The Internet Society, 2015.

[19] W. Du and M. J. Atallah. Privacy-preserving cooperative scientific computations. In CSFW, pages 273–294. IEEE Computer Society, 2001.

[20] W. Du and M. J. Atallah. Protocols for secure remote database access with approximate matching. In ECommerce Security and Privacy, volume 2 of Advances in Information Security, pages 87–111. Springer, 2001.

[21] W. Du, Y. S. Han, and S. Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In SDM, pages 222–233. SIAM, 2004.

[22] C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528–544. Springer, 2004.

[23] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.

[24] M. D. Ercegovac and T. Lang. Digital arithmetic. Elsevier, 2004.

[25] H. Fanaee-T and J. Gama. Bike sharing dataset data set. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset, 2013.

[26] H. Fanaee-T and J. Gama. Event labeling combining ensemble detectors and background knowledge. Progress in AI, 2(2-3):113–127, 2014.

[27] J. Fonollosa and R. Huerta. Gas sensor array under dynamic gas mixtures data set. https://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+dynamic+gas+mixtures, 2015.

[28] J. Fonollosa, S. Sheik, R. Huerta, and S. Marco. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical, 215:618–629, 2015.

[29] N. Gilboa. Two party RSA key generation. In CRYPTO, volume 1666 of Lecture Notes in Computer Science, pages 116–129. Springer, 1999.

[30] O. Goldreich. The Foundations of Cryptography - Volume 2, Basic Applications. Cambridge University Press, 2004.

[31] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game or A completeness theorem for protocols with honest majority. In STOC, pages 218–229. ACM, 1987.

[32] S. D. Gordon, J. Katz, V. Kolesnikov, F. Krell, T. Malkin, M. Raykova, and Y. Vahlis. Secure two-party computation in sublinear (amortized) time. In ACM Conference on Computer and Communications Security, pages 513–524. ACM, 2012.

[33] T. Graepel, K. E. Lauter, and M. Naehrig. ML confidential: Machine learning on encrypted data. In ICISC, pages 1–21. Springer, 2012.

[34] F. Graf, H.-P. Kriegel, M. Schubert, S. Poelsterl, and A. Cavallaro. Relative location of ct slices on axial axis data set. https://archive.ics.uci.edu/ml/datasets/Relative+location+of+CT+slices+on+axial+axis, 2011.

[35] R. Hall, S. E. Fienberg, and Y. Nardi. Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics, 27(4):669, 2011.

[36] W. Henecka, S. Kögl, A. Sadeghi, T. Schneider, and I. Wehrenberg. TASTY: tool for automating secure two-party computations. In ACM Conference on Computer and Communications Security, pages 451–462. ACM, 2010.

[37] Y. Huang, D. Evans, J. Katz, and L. Malka. Faster secure two-party computation using garbled circuits. In USENIX Security Symposium. USENIX Association, 2011.

[38] Y. Huang, C. Shen, D. Evans, J. Katz, and A. Shelat. Efficient secure computation with garbled circuits. In ICISS, volume 7093 of Lecture Notes in Computer Science, pages 28–48. Springer, 2011.

[39] A. Karatsuba and Y. Ofman. Multiplication of many-digital numbers by automatic computers. In Proceedings of the USSR Academy of Sciences 145, pages 293–294, 1962.

[40] A. F. Karr, X. Lin, A. P. Sanil, and J. P. Reiter. Regression on distributed databases via secure multi-party computation. In DG.O, ACM International Conference Proceeding Series. Digital Government Research Center, 2004.

[41] M. Keller, E. Orsini, and P. Scholl. MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In ACM Conference on Computer and Communications Security, pages 830–842. ACM, 2016.

[42] M. Keller and P. Scholl. Efficient, oblivious data structures for MPC. In ASIACRYPT (2), pages 506–525. Springer, 2014.

[43] D. E. Knuth. The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms. Addison-Wesley, 1997.

[44] V. Kolesnikov and T. Schneider. Improved garbled circuit: Free XOR gates and applications. In ICALP (2), pages 486–498. Springer, 2008.

[45] P. Laud and M. Pettai. Secure multiparty sorting protocols with covert privacy. In NordSec, volume 10014 of Lecture Notes in Computer Science, pages 216–231, 2016.

[46] M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013.

[47] Y. Lindell. How to simulate it - A tutorial on the simulation proof technique. IACR Cryptology ePrint Archive, 2016:46, 2016.

[48] Y. Lindell and B. Pinkas. Privacy preserving data mining. J. Cryptology, 15(3):177–206, 2002.

[49] Y. Lindell and B. Pinkas. A proof of security of Yao’s protocol for two-party computation. J. Cryptology, 22(2):161–188, 2009.

[50] Y. Lindell, B. Pinkas, N. P. Smart, and A. Yanai. Efficient constant round multi-party computation combining BMR and SPDZ. In CRYPTO (2), pages 319–338. Springer, 2015.

[51] C. Liu, X. S. Wang, K. Nayak, Y. Huang, and E. Shi. ObliVM: A programming framework for secure computation. In IEEE Symposium on Security and Privacy, pages 359–376. IEEE Computer Society, 2015.

[52] G. Meurant. The Lanczos and conjugate gradient algorithms: from theory to finite precision computations, volume 19. SIAM, 2006.

[53] P. Mohassel and M. K. Franklin. Efficiency tradeoffs for malicious two-party computation. In Public Key Cryptography, volume 3958 of Lecture Notes in Computer Science, pages 458–473. Springer, 2006.

[54] K. P. Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.

[55] M. Naor and B. Pinkas. Efficient oblivious transfer protocols. In SODA, pages 448–457. ACM/SIAM, 2001.

[56] M. Naor, B. Pinkas, and R. Sumner. Privacy preserving auctions and mechanism design. In EC, pages 129–139, 1999.

[57] K. Nayak, X. S. Wang, S. Ioannidis, U. Weinsberg, N. Taft, and E. Shi. Graphsc: Parallel secure computation made easy. In IEEE Symposium on Security and Privacy, pages 377–394. IEEE Computer Society, 2015.

[58] J. B. Nielsen, P. S. Nordholt, C. Orlandi, and S. S. Burra. A new approach to practical active-secure two-party computation. In CRYPTO, volume 7417 of Lecture Notes in Computer Science, pages 681–700. Springer, 2012.

[59] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft. Privacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on Security and Privacy, pages 334–348. IEEE Computer Society, 2013.

[60] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.

[61] P. Pullonen and S. Siim. Combining secret sharing and garbled circuits for efficient private IEEE 754 floating-point computations. In Financial Cryptography Workshops, pages 172–183. Springer, 2015.

[62] M. Rabin. How to Exchange Secrets by Oblivious Transfer. Technical Report TR-81, Harvard Aiken Computation Laboratory, 1981.

[63] M. Redmond. Communities and crime data set. https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime, 2009.

[64] M. Redmond and A. Baveja. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660–678, 2002.

[65] A. P. Sanil, A. F. Karr, X. Lin, and J. P. Reiter. Privacy preserving regression modelling via distributed computation. In KDD, pages 677–682. ACM, 2004.

[66] E. M. Songhori, S. U. Hussain, A. Sadeghi, T. Schneider, and F. Koushanfar. Tinygarble: Highly compressed and scalable sequential garbled circuits. In IEEE Symposium on Security and Privacy, pages 411–428. IEEE Computer Society, 2015.

[67] X. Wang, A. J. Malozemoff, and J. Katz. Faster secure two-party computation in the single-execution setting. In EUROCRYPT (3), volume 10212 of Lecture Notes in Computer Science, pages 399–424, 2017.

[68] M. H. Weik. A third survey of domestic electronic digital computing systems. Technical report, DTIC Document, 1961.

[69] J. H. Wilkinson. The algebraic eigenvalue problem. Clarendon Press Oxford, 1988.

[70] A. C. Yao. How to generate and exchange secrets (extended abstract). In FOCS, pages 162–167. IEEE Computer Society, 1986.

[71] H. Yu, J. Vaidya, and X. Jiang. Privacy-preserving SVM classification on vertically partitioned data. In PAKDD, pages 647–656. Springer, 2006.

[72] S. Zahur and D. Evans. Obliv-C: A language for extensible data-oblivious computation. IACR Cryptology ePrint Archive, 2015:1153, 2015.

[73] S. Zahur, M. Rosulek, and D. Evans. Two halves make a whole - reducing data transfer in garbled circuits using half gates. In EUROCRYPT (2), pages 220–250. Springer, 2015.

[75] Secure distributed linear regression. https://github.com/schoppmp/linreg-mpc.

[76] Auto MPG data set. https://archive.ics.uci.edu/ml/datasets/Auto+MPG, 1993.

Journal Information

Cited By


All Time Past Year Past 30 Days
Abstract Views 0 0 0
Full Text Views 646 429 29
PDF Downloads 368 260 9