Privately Evaluating Decision Trees and Random Forests

David J. Wu 1 , Tony Feng 1 , Michael Naehrig 2 ,  and Kristin Lauter 2
  • 1 Stanford University
  • 2 Microsoft Research


Decision trees and random forests are common classifiers with widespread use. In this paper, we develop two protocols for privately evaluating decision trees and random forests. We operate in the standard two-party setting where the server holds a model (either a tree or a forest), and the client holds an input (a feature vector). At the conclusion of the protocol, the client learns only the model’s output on its input and a few generic parameters concerning the model; the server learns nothing. The first protocol we develop provides security against semi-honest adversaries. We then give an extension of the semi-honest protocol that is robust against malicious adversaries. We implement both protocols and show that both variants are able to process trees with several hundred decision nodes in just a few seconds and a modest amount of bandwidth. Compared to previous semi-honest protocols for private decision tree evaluation, we demonstrate a tenfold improvement in computation and bandwidth.

If the inline PDF is not rendering correctly, you can download the PDF file here.

  • [1] BigML.

  • [2] Microsoft Azure Machine Learning.

  • [3] R. Agrawal and R. Srikant. Privacy-preserving data mining. SIGMOD Rec., 29(2), 2000.

  • [4] G. Asharov, Y. Lindell, T. Schneider, and M. Zohner. More efficient oblivious transfer and extensions for faster secure computation. In CCS, pages 535-548, 2013.

  • [5] A. T. Azar and S. M. El-Metwally. Decision tree classifiers for automated medical diagnosis. Neural Computing and Applications, 23(7-8):2387-2403, 2013.

  • [6] K. Bache and M. Lichman. UCI machine learning repository, 2013.

  • [7] M. Barni, P. Failla, V. Kolesnikov, R. Lazzeretti, A. Sadeghi, and T. Schneider. Secure evaluation of private linear branching programs with medical applications. In ESORICS, pages 424-439, 2009.

  • [8] M. Bellare and O. Goldreich. On defining proofs of knowledge. In CRYPTO, pages 390-420, 1992.

  • [9] M. Bellare, V. T. Hoang, S. Keelveedhi, and P. Rogaway. Efficient garbling from a fixed-key blockcipher. In IEEE Symposium on Security and Privacy, pages 478-492, 2013.

  • [10] M. Bellare, V. T. Hoang, and P. Rogaway. Foundations of garbled circuits. Cryptology ePrint Archive, Report 2012/265, 2012.

  • [11] I. F. Blake and V. Kolesnikov. Strong conditional oblivious transfer and computing on intervals. In ASIACRYPT, 2004.

  • [12] D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for fast privacy-preserving computations. In ESORICS, pages 192-206, 2008.

  • [13] D. Boneh. The decision Diffie-Hellman problem. In ANTS, pages 48-63, 1998.

  • [14] J. Bos, C. Costello, P. Longa, and M. Naehrig. Specification of curve selection and supported curve parameters in MSR ECCLib. Technical Report MSR-TR-2014-92, Microsoft Research, June 2014.

  • [15] J. W. Bos, C. Costello, P. Longa, and M. Naehrig. Selecting elliptic curves for cryptography: An efficiency and security analysis. IACR Cryptology ePrint Archive, 2014:130, 2014.

  • [16] J. W. Bos, K. E. Lauter, and M. Naehrig. Private predictive analysis on encrypted medical data. Journal of Biomedical Informatics, 50:234-243, 2014.

  • [17] R. Bost, R. A. Popa, S. Tu, and S. Goldwasser. Machine learning classification over encrypted data. In NDSS, 2015.

  • [18] Z. Brakerski, C. Gentry, and V. Vaikuntanathan. (Leveled) fully homomorphic encryption without bootstrapping. In ITCS, pages 309-325, 2012.

  • [19] L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.

  • [20] J. Brickell, D. E. Porter, V. Shmatikov, and E. Witchel. Privacy-preserving remote diagnostics. In CCS, pages 498-507, 2007.

  • [21] J. Camenisch and M. Stadler. Efficient group signature schemes for large groups (extended abstract). In CRYPTO, pages 410-424, 1997.

  • [22] R. Canetti. Security and composition of multiparty cryptographic protocols. J. Cryptology, 13(1):143-202, 2000.

  • [23] R. Canetti. Security and composition of cryptographic protocols: a tutorial (part I). SIGACT News, 37(3):67-92, 2006.

  • [24] D. Chaum and T. P. Pedersen. Wallet databases with observers. In CRYPTO, pages 89-105, 1992.

  • [25] R. Cramer, I. Damgård, and B. Schoenmakers. Proofs of partial knowledge and simplified design of witness hiding protocols. In CRYPTO, pages 174-187, 1994.

  • [26] R. Cramer, R. Gennaro, and B. Schoenmakers. A secure and optimally efficient multi-authority election scheme. In EUROCRYPT, pages 103-118, 1997.

  • [27] G. D. Crescenzo, R. Ostrovsky, and S. Rajagopalan. Conditional oblivious transfer and timed-release encryption. In EUROCRYPT, pages 74-89, 1999.

  • [28] I. Damgård, M. Geisler, and M. Krøigaard. Efficient and secure comparison for on-line auctions. In ACISP, pages 416-430, 2007.

  • [29] I. Damgård, M. Jurik, and J. B. Nielsen. A generalization of Paillier’s public-key system with applications to electronic voting. Int. J. Inf. Sec., 9(6):371-385, 2010.

  • [30] D. Demmler, T. Schneider, and M. Zohner. ABY - A framework for efficient mixed-protocol secure two-party computation. In NDSS, 2015.

  • [31] W. Du and Z. Zhan. Building decision tree classifier on private data. In CRPIT ’14, 2002.

  • [32] Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, I. Lagendijk, and T. Toft. Privacy-preserving face recognition. In PETS, pages 235-253, 2009.

  • [33] A. Fiat and A. Shamir. How to prove yourself: Practical solutions to identification and signature problems. In CRYPTO, pages 186-194, 1986.

  • [34] M. Fredrikson, S. Jha, and T. Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In ACM SIGSAC, pages 1322-1333, 2015.

  • [35] C. Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, 2009.

  • [36] O. Goldreich. Foundations of Cryptography: Volume 2, Basic Applications. Cambridge University Press, New York, NY, USA, 2004.

  • [37] S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270-299, April 1984.

  • [38] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof-systems (extended abstract). In ACM STOC, pages 291-304, 1985.

  • [39] T. Graepel, K. E. Lauter, and M. Naehrig. ML confidential: Machine learning on encrypted data. In ICISC, pages 1-21, 2012.

  • [40] T. Granlund and the GMP development team. GNU MP: The GNU Multiple Precision Arithmetic Library, 5.0.5 edition, 2012.

  • [41] S. Halevi and V. Shoup. Algorithms in HElib. In CRYPTO, pages 554-571, 2014.

  • [42] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM J. Comput., 28(4):1364-1396, 1999.

  • [43] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., 2001.

  • [44] C. Hazay and Y. Lindell. Efficient Secure Two-Party Protocols - Techniques and Constructions. Information Security and Cryptography. Springer, 2010.

  • [45] B. A. Huberman, M. K. Franklin, and T. Hogg. Enhancing privacy and trust in electronic communities. In EC, pages 78-86, 1999.

  • [46] Y. Ishai, J. Katz, E. Kushilevitz, Y. Lindell, and E. Petrank. On achieving the "best of both worlds" in secure multiparty computation. SIAM J. Comput., 40(1):122-141, 2011.

  • [47] J. Katz and L. Malka. Constant-round private function evaluation with linear complexity. In ASIACRYPT, pages 556-571, 2011.

  • [48] J. Kilian. Founding cryptography on oblivious transfer. In STOC, pages 20-31, 1988.

  • [49] H. C. Koh, W. C. Tan, and C. P. Goh. A two-step method to construct credit scoring models with data mining techniques. International Journal of Business and Information, 1:96-118, 2006.

  • [50] V. Kolesnikov, A.-R. Sadeghi, and T. Schneider. Improved garbled circuit building blocks and applications to auctions and computing minima. Cryptology ePrint Archive, Report 2009/411, 2009.

  • [51] V. Kolesnikov and T. Schneider. Improved garbled circuit: Free XOR gates and applications. In ICALP, pages 486-498, 2008.

  • [52] Y. Lindell and B. Pinkas. Privacy preserving data mining. In CRYPTO, pages 36-54, 2000.

  • [53] Y. Lindell and B. Pinkas. A proof of security of Yao’s protocol for two-party computation. J. Cryptology, 22(2):161-188, 2009.

  • [54] P. Mohassel and S. Niksefat. Oblivious decision programs from oblivious transfer: Efficient reductions. Financial Cryptography, 2014:269-284, 2012.

  • [55] P. Mohassel and S. S. Sadeghian. How to hide circuits in MPC: An efficient framework for private function evaluation. In EUROCRYPT, pages 557-574, 2013.

  • [56] M. Naor and B. Pinkas. Oblivious transfer and polynomial evaluation. In STOC, pages 245-254, 1999.

  • [57] M. Naor and B. Pinkas. Efficient oblivious transfer protocols. In SODA, pages 448-457, 2001.

  • [58] A. Narayanan, N. Thiagarajan, M. Lakhani, M. Hamburg, and D. Boneh. Location privacy via private proximity testing. In NDSS, 2011.

  • [59] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In EUROCRYPT, volume 1592, pages 223-238. 1999.

  • [60] M. O. Rabin. How to exchange secrets with oblivious transfer. IACR Cryptology ePrint Archive, 2005:187, 2005.

  • [61] V. Shoup. NTL: A library for doing number theory.

  • [62] A. Singh and J. V. Guttag. A comparison of non-symmetric entropy-based classification trees and support vector machine for cardiovascular risk stratification. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 79-82, 2011.

  • [63] D. J. Wu, T. Feng, M. Naehrig, and K. E. Lauter. Privately evaluating decision trees and random forests. IACR Cryptology ePrint Archive, 2015:386, 2015.

  • [64] A. C. Yao. How to generate and exchange secrets (extended abstract). In FOCS, pages 162-167, 1986.

  • [65] S. Zahur, M. Rosulek, and D. Evans. Two halves make a whole - reducing data transfer in garbled circuits using half gates. In EUROCRYPT, pages 220-250, 2015.


Journal + Issues