We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.
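As a rough illustration of the linear-system step mentioned in the abstract, the sketch below solves the ridge regression normal equations (XᵀX + λI)θ = Xᵀy with the textbook conjugate gradient method. It is a plaintext, floating-point sketch only: it does not implement the paper's fixed-point CGD variant, the garbled-circuit phase, or the secure inner-product protocols. The function names (`ridge_normal_equations`, `conjugate_gradient`) and the fixed iteration count are assumptions made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, v) for row in A]


def ridge_normal_equations(X, y, lam):
    """Build A = X^T X + lam * I and b = X^T y from plaintext data."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)]
         for i in range(d)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return A, b


def conjugate_gradient(A, b, iterations):
    """Textbook CG for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, which equals b when x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(iterations):
        if rs_old == 0.0:
            break        # already solved exactly; avoids dividing by zero
        Ap = mat_vec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: three records, two features, no regularization.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
A, b = ridge_normal_equations(X, y, lam=0.0)
theta = conjugate_gradient(A, b, iterations=2)
print(theta)
```

In exact arithmetic, conjugate gradient reaches the solution of a d-dimensional system in at most d iterations, which is why two iterations suffice for this two-feature toy example. In a vertically partitioned setting, the entries of XᵀX that mix columns held by different parties are the ones that would need a secure inner-product protocol; here everything is computed in the clear.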
G. Asharov, Y. Lindell, T. Schneider, and M. Zohner. More efficient oblivious transfer and extensions for faster secure computation. In ACM Conference on Computer and Communications Security, pages 535–548. ACM, 2013.
D. Beaver. Efficient multiparty protocols using circuit randomization. In CRYPTO, volume 576 of Lecture Notes in Computer Science, pages 420–432. Springer, 1991.
M. Bellare, V. T. Hoang, S. Keelveedhi, and P. Rogaway. Efficient garbling from a fixed-key blockcipher. In IEEE Symposium on Security and Privacy, pages 478–492. IEEE Computer Society, 2013.
T. Bertin-Mahieux. YearPredictionMSD data set. https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD, 2011.
M. Blanton, A. Steele, and M. Aliasgari. Data-oblivious graph algorithms for secure computation and outsourcing. In ASIACCS, pages 207–218. ACM, 2013.
D. Bogdanov, L. Kamm, S. Laur, and V. Sokk. Rmind: a tool for cryptographically secure statistical analysis. IEEE Transactions on Dependable and Secure Computing, 2016.
D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for fast privacy-preserving computations. In ESORICS, pages 192–206. Springer, 2008.
K. Buza. Feedback prediction for blogs. In GfKl, Studies in Classification, Data Analysis, and Knowledge Organization, pages 145–152. Springer, 2012.
K. Buza. BlogFeedback data set. https://archive.ics.uci.edu/ml/datasets/BlogFeedback, 2014.
M. D. Cock, R. Dowsley, A. C. A. Nascimento, and S. C. Newman. Fast privacy preserving linear regression over distributed datasets based on pre-distributed data. In AISec@CCS, pages 3–14. ACM, 2015.
P. Cortez. Student performance data set. https://archive.ics.uci.edu/ml/datasets/Student+Performance, 2014.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547–553, 2009.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Wine quality data set. https://archive.ics.uci.edu/ml/datasets/Wine+Quality, 2009.
P. Cortez and A. M. G. Silva. Using data mining to predict secondary school student performance. In Future Business Technology Conference, pages 5–12. EUROSIS, 2008.
I. Damgård, V. Pastro, N. P. Smart, and S. Zakarias. Multiparty computation from somewhat homomorphic encryption. In CRYPTO, pages 643–662. Springer, 2012.
S. de Hoogh, B. Schoenmakers, and M. Veeningen. Certificate validation in secure computation and its use in verifiable linear programming. In AFRICACRYPT, volume 9646 of Lecture Notes in Computer Science, pages 265–284. Springer, 2016.
D. Demmler, G. Dessouky, F. Koushanfar, A. Sadeghi, T. Schneider, and S. Zeitouni. Automated synthesis of optimized circuits for secure computation. In ACM Conference on Computer and Communications Security, pages 1504–1517. ACM, 2015.
D. Demmler, T. Schneider, and M. Zohner. ABY - A framework for efficient mixed-protocol secure two-party computation. In NDSS. The Internet Society, 2015.
W. Du and M. J. Atallah. Privacy-preserving cooperative scientific computations. In CSFW, pages 273–294. IEEE Computer Society, 2001.
W. Du and M. J. Atallah. Protocols for secure remote database access with approximate matching. In ECommerce Security and Privacy, volume 2 of Advances in Information Security, pages 87–111. Springer, 2001.
W. Du, Y. S. Han, and S. Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In SDM, pages 222–233. SIAM, 2004.
C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528–544. Springer, 2004.
C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
M. D. Ercegovac and T. Lang. Digital arithmetic. Elsevier, 2004.
H. Fanaee-T and J. Gama. Bike sharing dataset data set. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset, 2013.
H. Fanaee-T and J. Gama. Event labeling combining ensemble detectors and background knowledge. Progress in AI, 2(2-3):113–127, 2014.
J. Fonollosa and R. Huerta. Gas sensor array under dynamic gas mixtures data set. https://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+dynamic+gas+mixtures, 2015.
J. Fonollosa, S. Sheik, R. Huerta, and S. Marco. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical, 215:618–629, 2015.
N. Gilboa. Two party RSA key generation. In CRYPTO, volume 1666 of Lecture Notes in Computer Science, pages 116–129. Springer, 1999.
O. Goldreich. The Foundations of Cryptography - Volume 2, Basic Applications. Cambridge University Press, 2004.
O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game or A completeness theorem for protocols with honest majority. In STOC, pages 218–229. ACM, 1987.
S. D. Gordon, J. Katz, V. Kolesnikov, F. Krell, T. Malkin, M. Raykova, and Y. Vahlis. Secure two-party computation in sublinear (amortized) time. In ACM Conference on Computer and Communications Security, pages 513–524. ACM, 2012.
T. Graepel, K. E. Lauter, and M. Naehrig. ML confidential: Machine learning on encrypted data. In ICISC, pages 1–21. Springer, 2012.
F. Graf, H.-P. Kriegel, M. Schubert, S. Poelsterl, and A. Cavallaro. Relative location of CT slices on axial axis data set. https://archive.ics.uci.edu/ml/datasets/Relative+location+of+CT+slices+on+axial+axis, 2011.
R. Hall, S. E. Fienberg, and Y. Nardi. Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics, 27(4):669, 2011.
W. Henecka, S. Kögl, A. Sadeghi, T. Schneider, and I. Wehrenberg. TASTY: tool for automating secure two-party computations. In ACM Conference on Computer and Communications Security, pages 451–462. ACM, 2010.
Y. Huang, D. Evans, J. Katz, and L. Malka. Faster secure two-party computation using garbled circuits. In USENIX Security Symposium. USENIX Association, 2011.
Y. Huang, C. Shen, D. Evans, J. Katz, and A. Shelat. Efficient secure computation with garbled circuits. In ICISS, volume 7093 of Lecture Notes in Computer Science, pages 28–48. Springer, 2011.
A. Karatsuba and Y. Ofman. Multiplication of many-digital numbers by automatic computers. In Proceedings of the USSR Academy of Sciences 145, pages 293–294, 1962.
A. F. Karr, X. Lin, A. P. Sanil, and J. P. Reiter. Regression on distributed databases via secure multi-party computation. In DG.O, ACM International Conference Proceeding Series. Digital Government Research Center, 2004.
M. Keller, E. Orsini, and P. Scholl. MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In ACM Conference on Computer and Communications Security, pages 830–842. ACM, 2016.
M. Keller and P. Scholl. Efficient oblivious data structures for MPC. In ASIACRYPT (2), pages 506–525. Springer, 2014.
D. E. Knuth. The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms. Addison-Wesley, 1997.
V. Kolesnikov and T. Schneider. Improved garbled circuit: Free XOR gates and applications. In ICALP (2), pages 486–498. Springer, 2008.
P. Laud and M. Pettai. Secure multiparty sorting protocols with covert privacy. In NordSec, volume 10014 of Lecture Notes in Computer Science, pages 216–231, 2016.
M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013.
Y. Lindell. How to simulate it - A tutorial on the simulation proof technique. IACR Cryptology ePrint Archive, 2016:46, 2016.
Y. Lindell and B. Pinkas. Privacy preserving data mining. J. Cryptology, 15(3):177–206, 2002.
Y. Lindell and B. Pinkas. A proof of security of Yao’s protocol for two-party computation. J. Cryptology, 22(2):161–188, 2009.
Y. Lindell, B. Pinkas, N. P. Smart, and A. Yanai. Efficient constant round multi-party computation combining BMR and SPDZ. In CRYPTO (2), pages 319–338. Springer, 2015.
C. Liu, X. S. Wang, K. Nayak, Y. Huang, and E. Shi. ObliVM: A programming framework for secure computation. In IEEE Symposium on Security and Privacy, pages 359–376. IEEE Computer Society, 2015.
G. Meurant. The Lanczos and conjugate gradient algorithms: from theory to finite precision computations, volume 19. SIAM, 2006.
P. Mohassel and M. K. Franklin. Efficiency tradeoffs for malicious two-party computation. In Public Key Cryptography, volume 3958 of Lecture Notes in Computer Science, pages 458–473. Springer, 2006.
K. P. Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.
M. Naor and B. Pinkas. Efficient oblivious transfer protocols. In SODA, pages 448–457. ACM/SIAM, 2001.
M. Naor, B. Pinkas, and R. Sumner. Privacy preserving auctions and mechanism design. In EC, pages 129–139, 1999.
K. Nayak, X. S. Wang, S. Ioannidis, U. Weinsberg, N. Taft, and E. Shi. GraphSC: Parallel secure computation made easy. In IEEE Symposium on Security and Privacy, pages 377–394. IEEE Computer Society, 2015.
J. B. Nielsen, P. S. Nordholt, C. Orlandi, and S. S. Burra. A new approach to practical active-secure two-party computation. In CRYPTO, volume 7417 of Lecture Notes in Computer Science, pages 681–700. Springer, 2012.
V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft. Privacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on Security and Privacy, pages 334–348. IEEE Computer Society, 2013.
J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
P. Pullonen and S. Siim. Combining secret sharing and garbled circuits for efficient private IEEE 754 floating-point computations. In Financial Cryptography Workshops, pages 172–183. Springer, 2015.
M. Rabin. How to Exchange Secrets by Oblivious Transfer. Technical Report TR-81, Harvard Aiken Computation Laboratory, 1981.
M. Redmond. Communities and crime data set. https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime, 2009.
M. Redmond and A. Baveja. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660–678, 2002.
A. P. Sanil, A. F. Karr, X. Lin, and J. P. Reiter. Privacy preserving regression modelling via distributed computation. In KDD, pages 677–682. ACM, 2004.
E. M. Songhori, S. U. Hussain, A. Sadeghi, T. Schneider, and F. Koushanfar. TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE Symposium on Security and Privacy, pages 411–428. IEEE Computer Society, 2015.
X. Wang, A. J. Malozemoff, and J. Katz. Faster secure two-party computation in the single-execution setting. In EUROCRYPT (3), volume 10212 of Lecture Notes in Computer Science, pages 399–424, 2017.
M. H. Weik. A third survey of domestic electronic digital computing systems. Technical report, DTIC Document, 1961.
J. H. Wilkinson. The algebraic eigenvalue problem. Clarendon Press, Oxford, 1988.
A. C. Yao. How to generate and exchange secrets (extended abstract). In FOCS, pages 162–167. IEEE Computer Society, 1986.
H. Yu, J. Vaidya, and X. Jiang. Privacy-preserving SVM classification on vertically partitioned data. In PAKDD, pages 647–656. Springer, 2006.
S. Zahur and D. Evans. Obliv-C: A language for extensible data-oblivious computation. IACR Cryptology ePrint Archive, 2015:1153, 2015.
S. Zahur, M. Rosulek, and D. Evans. Two halves make a whole - reducing data transfer in garbled circuits using half gates. In EUROCRYPT (2), pages 220–250. Springer, 2015.