Publishing Community-Preserving Attributed Social Graphs with a Differential Privacy Guarantee

Xihui Chen 1 , Sjouke Mauw 2 ,  and Yunior Ramírez-Cruz 3
  • 1 SnT, University of Luxembourg, , Esch-sur-Alzette, Luxembourg
  • 2 DCS, University of Luxembourg, , Esch-sur-Alzette, Luxembourg
  • 3 SnT, University of Luxembourg, , Esch-sur-Alzette, Luxembourg

Abstract

We present a novel method for publishing differentially private synthetic attributed graphs. Our method allows, for the first time, to publish synthetic graphs simultaneously preserving structural properties, user attributes and the community structure of the original graph. Our proposal relies on CAGM, a new community-preserving generative model for attributed graphs. We equip CAGM with efficient methods for attributed graph sampling and parameter estimation. For the latter, we introduce differentially private computation methods, which allow us to release communitypreserving synthetic attributed social graphs with a strong formal privacy guarantee. Through comprehensive experiments, we show that our new model outperforms its most relevant counterparts in synthesising differentially private attributed social graphs that preserve the community structure of the original graph, as well as degree sequences and clustering coefficients.

If the inline PDF is not rendering correctly, you can download the PDF file here.

  • [1] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec (general data protection regulation). OJ, L 119:1–88, 4.5.2016.

  • [2] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Review of Modern Physics, 74:47–97, 2002.

  • [3] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In Proc. 4th Innovations in Theoretical Computer Science (ITCS), pages 87–96. ACM Press, 2013.

  • [4] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.

  • [5] Jordi Casas-Roma, Jordi Herrera-Joancomartí, and Vicenç Torra. An algorithm for k-degree anonymity on large networks. In Procs. of the 2013 IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining, pages 671–675, 2013.

  • [6] Jordi Casas-Roma, Jordi Herrera-Joancomartí, and Vicenç Torra. Anonymizing graphs: measuring quality for clustering. Knowledge and Information Systems, 44(3):507–528, 2015.

  • [7] Jordi Casas-Roma, Jordi Herrera-Joancomartí, and Vicenç Torra. k-degree anonymity and edge selection: improving data utility in large networks. Knowledge and Information Systems, 50(2):447–474, 2017.

  • [8] Lauren E. Charles-Smith, Tera L. Reynolds, Mark A. Cameron, Mike Conway, Eric H. Y. Lau, Jennifer M. Olsen, Julie A. Pavlin, Mika Shigematsu, Laura C. Streichert, Katie J. Suda, and Courtney D. Corley. Using social media for actionable disease surveillance and outbreak management: A systematic literature review. PLOS ONE, 10(10):1–20, 10 2015.

  • [9] James Cheng, Ada Wai-chee Fu, and Jia Liu. Kisomorphism: privacy preserving network publication against structural attacks. In Procs. of the 2010 ACM SIGMOD Int’l Conf. on Management of Data, pages 459–470, 2010.

  • [10] Sean Chester, Bruce M Kapron, Ganesh Ramesh, Gautam Srivastava, Alex Thomo, and S Venkatesh. Why waldo befriended the dummy? k-anonymization of social networks with pseudo-nodes. Social Network Analysis and Mining, 3(3):381–399, 2013.

  • [11] Fan Chung and Linyuan Lu. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences, 99(25):15879–15882, 2002.

  • [12] Aaron Clauset, Cristopher Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453:98–101, 2008.

  • [13] Cynthia Dwork. Differential privacy. In Proc. 33rd International Colloquium on Automata, Languages and Programming (ICALP), volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006.

  • [14] Luoyi Fu, Xinzhe Fu, Zhongzhao Hu, Zhiying Xu, and Xinbing Wang. De-anonymization of social networks with communities: When quantifications meet algorithms. arXiv preprint arXiv:1703.09028, 2017.

  • [15] Michael Hay, Chao Li, Gerome Miklau, and David D. Jensen. Accurate estimation of the degree distribution of private networks. In Proc. 19th IEEE International Conference on Data Mining (ICDM), pages 169–178. IEEE Computer Society, 2009.

  • [16] Michael Hay, Gerome Miklau, David D. Jensen, Donald F. Towsley, and Philipp Weis. Resisting structural re-identification in anonymized social networks. PVLDB, 1(1):102–114, 2008.

  • [17] Joseph J. Pfeiffer III, Timothy La Fond, Sebastián Moreno, and Jennifer Neville. Fast generation of large scale social networks while incorporating transitive closures. In Proc. 4th International Conference on Privacy, Security, Risk and Trust, (PASSAT), pages 154–165. IEEE Computer Society, 2012.

  • [18] Joseph J. Pfeiffer III, Sebastián Moreno, Timothy La Fond, Jennifer Neville, and Brian Gallagher. Attributed graph models: modeling network structure with correlated attributes. In Proc. 23rd International World Wide Web Conference (WWW), pages 831–842. ACM Press, 2014.

  • [19] Shouling Ji, Weiqing Li, Prateek Mittal, Xin Hu, and Raheem Beyah. Secgraph: A uniform and open-source evaluation system for graph data anonymization and deanonymization. In 24th {USENIX} Security Symposium ({USENIX} Security 15), pages 303–318, 2015.

  • [20] Zach Jorgensen, Ting Yu, and Graham Cormode. Publishing attributed social graphs with formal privacy guarantees. In Proc. 2016 International Conference on Management of Data (SIGMOD), pages 107–122. ACM Press, 2016.

  • [21] Kundan Kandhway and Joy Kuri. Using node centrality and optimal control to maximize information diffusion in social networks. IEEE Trans. Systems, Man, and Cybernetics: Systems, 47(7):1099–1110, 2017.

  • [22] Brian Karrer and Mark EJNewman. Stochastic blockmodels and community structure in networks. Physical review. E, 83(1):016107, 2011.

  • [23] Vishesh Karwa, Sofya Raskhodnikova, Adam D. Smith, and Grigory Yaroslavtsev. Private analysis of graph structure. ACM Transactions on Database Systems, 39(3):22:1–22:33, 2014.

  • [24] Vishesh Karwa and Aleksandra B. Slavkovic. Differentially private graphical degree sequences and synthetic graphs. In Proc. 2012 International Conference on Privacy in Statistical Databases (PSD), volume 7556 of Lecture Notes in Computer Science, pages 273–285. Springer, 2012.

  • [25] Daniel Kifer and Bing-Rong Lin. Towards an axiomatization of statistical privacy and utility. In Proc. 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 147–158. ACM Press, 2010.

  • [26] Tamara G. Kolda, Ali Pinar, Todd D. Plantenga, and C. Seshadhri. A scalable generative graph model with community structure. SIAM J. Scientific Computing, 36(5), 2014.

  • [27] Jérôme Kunegis. KONECT: the koblenz network collection. In Proc. 22nd International World Wide Web Conference (WWW), pages 1343–1350. ACM Press, 2013.

  • [28] Christine Largeron, Pierre-Nicolas Mougel, Oualid Benyahia, and Osmar R Zaïane. Dancer: dynamic attributed networks with community structure generation. Knowledge and Information Systems, 53(1):109–151, 2017.

  • [29] Christine Largeron, Pierre-Nicolas Mougel, Reihaneh Rabbany, and Osmar R Zaïane. Generating attributed networks with communities. PloS one, 10(4), 2015.

  • [30] Jure Leskovec and Christos Faloutsos. Scalable modeling of real graphs using kronecker multiplication. In Proc. 24th International Conference on Machine Learning (ICML), pages 497–504. ACM Press, 2007.

  • [31] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.

  • [32] Changchang Liu, Supriyo Chakraborty, and Prateek Mittal. Dependence makes you vulnberable: Differential privacy under dependent tuples. In Procs. of NDSS 2016, volume 16, pages 21–24, 2016.

  • [33] Kun Liu and Evimaria Terzi. Towards identity anonymization on graphs. In Proc. 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 93–106. ACM Press, 2008.

  • [34] Xuesong Lu, Yi Song, and Stéphane Bressan. Fast identity anonymization on graphs. In Procs. of the Int’l Conf. on Database and Expert Systems Applications, pages 281–295, 2012.

  • [35] Tinghuai Ma, Yuliang Zhang, Jie Cao, Jian Shen, Meili Tang, Yuan Tian, Abdullah Al-Dhelaan, and Mznah Al-Rodhaan. Kdvem: a k-degree anonymity with vertex and edge modification algorithm. Computing, 97(12):1165–1184, 2015.

  • [36] Nelly Marquetoux, Mark A. Stevenson, Peter Wilson, Anne Ridler, and Cord Heuer. Using social network analysis to inform disease control interventions. Preventive Veterinary Medicine, 126:94–104, 2016.

  • [37] Paolo Massa and Paolo Avesani. Trust-aware recommender systems. In Proc. 2007 ACM Conference on Recommender Systems (RecSys), pages 17–24. ACM Press, 2007.

  • [38] Sjouke Mauw, Yunior Ramírez-Cruz, and Rolando Trujillo-Rasua. Anonymising social graphs in the presence of active attackers. Transactions on Data Privacy, 11(2):169–198, 2018.

  • [39] Sjouke Mauw, Yunior Ramírez-Cruz, and Rolando Trujillo-Rasua. Conditional adjacency anonymity in social graphs under active attacks. Knowledge and Information Systems, 61(1):485–511, 2018.

  • [40] Frank McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. Communications of the ACM, 53(9):89–97, 2010.

  • [41] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proc. 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 94–103. IEEE Computer Society, 2007.

  • [42] Darakhshan J. Mir and Rebecca N. Wright. A differentially private graph estimator. In Proc. 2009 ICDM International Workshop on Privacy Aspects of Data Mining (ICDM), pages 122–129. IEEE Computer Society, 2009.

  • [43] Prateek Mittal, Charalampos Papamanthou, and Dawn Xiaodong Song. Preserving link privacy in social network based systems. In Procs. of NDSS 2013. The Internet Society, 2013.

  • [44] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Reviewe E, 69(2):026113, 2004.

  • [45] Mark E. J. Newman. Community detection in networks: Modularity optimization and maximum likelihood are equivalent. CoRR, abs/1606.02319, 2016.

  • [46] Hiep H. Nguyen, Abdessamad Imine, and Michaël Rusinowitch. Detecting communities under differential privacy. In Proc. 2016 ACM Workshop on Privacy in the Electronic Society (WPES), pages 83–93. ACM Press, 2016.

  • [47] Shirin Nilizadeh, Apu Kapadia, and Yong-Yeol Ahn. Community-enhanced de-anonymization of online social networks. In Proceedings of the 2014 acm sigsac conference on computer and communications security, pages 537–548, 2014.

  • [48] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proc. 39th Annual ACM Symposium on Theory of Computing (STOC), pages 75–84. ACM Press, 2007.

  • [49] Liudmila Ostroumova Prokhorenkova and Alexey Tikhonov. Community detection through likelihood optimization: In search of a sound model. In Proc. 30th World Wide Web Conference (WWW), pages 1498–1508. ACM Press, 2019.

  • [50] François Rousseau, Jordi Casas-Roma, and Michalis Vazirgiannis. Community-preserving anonymization of graphs. Knowledge and Information Systems, 54(2):315–343, 2017.

  • [51] Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y. Zhao. Sharing graphs using differentially private graph models. In Proc. 11th ACM SIGCOMM Internet Measurement Conference (IMC), pages 81–98. ACM Press, 2011.

  • [52] Julián Salas and Vicenç Torra. Graphic sequences, distances and k-degree anonymity. Discrete Applied Mathematics, 188:25–31, 2015.

  • [53] Yazhe Wang, Long Xie, Baihua Zheng, and Ken CK Lee. High utility k-anonymization for social network publishing. Knowledge and Information Systems, 41(3):697–725, 2014.

  • [54] Yue Wang and Xintao Wu. Preserving differential privacy in degree-correlation based graph generation. Transaction on Data Privacy, 6(2):127–145, 2013.

  • [55] Yue Wang, Xintao Wu, Jun Zhu, and Yang Xiang. On learning cluster coefficient of private networks. Social Network Analysis and Mining, 3(4):925–938, 2013.

  • [56] Paul W.Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

  • [57] Gilbert Wondracek, Thorsten Holz, Engin Kirda, and Christopher Kruegel. A practical attack to de-anonymize social network users. In 2010 IEEE Symposium on Security and Privacy, pages 223–238. IEEE, 2010.

  • [58] Wentao Wu, Yanghua Xiao, Wei Wang, Zhenying He, and Zhihui Wang. K-symmetry model for identity anonymization in social networks. In Procs. of the 13th Int’l Conf. on Extending Database Technology, pages 111–122, 2010.

  • [59] Qian Xiao, Rui Chen, and Kian-Lee Tan. Differentially private network data release via structural inference. In Proc. 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 911–920. ACM Press, 2014.

  • [60] Jaewon Yang and Jure Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proc. 6th ACM International Conference on Web Search and Data Mining (WSDM), pages 587–596. ACM Press, 2013.

  • [61] Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.

  • [62] Jaewon Yang, Julian J. McAuley, and Jure Leskovec. Community detection in networks with node attributes. In Proc. 13th IEEE International Conference on Data Mining (ICDM), pages 1151–1156. IEEE Computer Society, 2013.

  • [63] Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, and Xiaokui Xiao. Private release of graph statistics using ladder functions. In Proc. 36th ACM International Conference on Management of Data (SIGMOD), pages 731–745. ACM Press, 2015.

  • [64] Yang Zhang, Mathias Humbert, Bartlomiej Surma, Praveen Manoharan, Jilles Vreeken, and Michael Backes. Towards plausible graph anonymization. In Procs. of NDSS 2020, 2020.

  • [65] Bin Zhou and Jian Pei. The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowledge Information Systems, 28(1):47–77, 2011.

  • [66] Dmitry Zinoviev. Information Diffusion in Social Networks, pages 146–163. 11 2011.

  • [67] Lei Zou, Lei Chen, and M. Tamer Özsu. K-automorphism: A general framework for privacy preserving network publication. PVLDB, 2(1):946–957, 2009.

OPEN ACCESS

Journal + Issues

Search