A key challenge in designing differentially private mechanisms for the non-interactive setting is maintaining the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model, which leverages a novel combination of dimensionality reduction via random orthonormal (RON) projection and a Gaussian generative model to synthesize differentially private data. We analyze how RON-Gauss benefits from the DFM effect and present multiple algorithms for a range of machine learning applications, covering both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss in three common machine learning applications (clustering, classification, and regression) on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) the loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
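To make the two ingredients concrete, the following is a minimal sketch, not the algorithms of this paper. It assumes that a random orthonormal basis can be drawn by QR-factorizing a Gaussian matrix and that a plain sample mean and covariance stand in for the Gaussian model. The helper names (`ron_projection`, `excess_kurtosis`), the uniform toy data, and the sizes `n`, `d`, `p` are all hypothetical. The sketch adds no noise, so it gives no privacy guarantee; it only illustrates the projection step and the DFM effect.

```python
import numpy as np


def ron_projection(d, p, rng):
    """Return a d x p matrix with orthonormal columns (assumed construction)."""
    gaussian = rng.standard_normal((d, p))
    q, _ = np.linalg.qr(gaussian)  # reduced QR, so q has shape (d, p)
    return q


def excess_kurtosis(x):
    """Per-column excess kurtosis; it is 0 for an exact Gaussian."""
    centered = x - x.mean(axis=0)
    var = (centered ** 2).mean(axis=0)
    return (centered ** 4).mean(axis=0) / var ** 2 - 3.0


rng = np.random.default_rng(0)
n, d, p = 5000, 200, 3  # hypothetical sizes, chosen for illustration

# Toy high-dimensional data that is clearly non-Gaussian in every coordinate.
data = rng.uniform(-1.0, 1.0, size=(n, d))

# Step 1: reduce dimensionality with a random orthonormal projection.
W = ron_projection(d, p, rng)
projected = data @ W

# Step 2: fit a Gaussian to the projected data and sample synthetic records.
# NOTE: no noise is added here, so this step is NOT differentially private.
mean = projected.mean(axis=0)
cov = np.cov(projected, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=n)

# DFM effect: projected coordinates look far more Gaussian than raw ones.
raw_kurt = np.abs(excess_kurtosis(data)).mean()
proj_kurt = np.abs(excess_kurtosis(projected)).mean()
```

Comparing `raw_kurt` with `proj_kurt` gives a quick check of how much closer to Gaussian the projected coordinates are. A private variant would need to perturb the mean and covariance with noise calibrated to their sensitivity, which this sketch deliberately leaves out.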