Quantifying Privacy Loss of Human Mobility Graph Topology

Open access

Abstract

Human mobility is often represented as a mobility network, or graph, whose nodes are places of significance that an individual visits (such as home, work, and places of social amenity) and whose edge weights are estimated probabilities of moving between those places. Previous research has shown that individuals can be identified by a small number of geolocated nodes in their mobility network, which makes anonymizing mobility traces a hard task. In this paper we build on that work and demonstrate that, even when all location and timestamp information is removed from the nodes, the graph topology of an individual's mobility network is itself often uniquely identifying. Further, we observe that a mobility network is often unique even when only a small number of the most popular nodes and edges are considered. We evaluate our approach using a large dataset of cell-tower location traces from 1,500 smartphone handsets with a mean duration of 430 days. We process the data to derive the top-N places visited by each device in its trace, and find that 93% of traces have a unique top-10 mobility network and that all traces are unique when top-15 mobility networks are considered. Since an individual's mobility patterns, and therefore their mobility network, vary over time, we use graph kernel distance functions to determine whether two mobility networks taken at different points in time represent the same individual. We then show that our distance metrics, while imperfect predictors, perform significantly better than a random strategy, and therefore that our approach represents a significant loss of privacy.
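The abstract does not say exactly how networks are built or which graph kernels are used, so the following is only a minimal sketch under stated assumptions. It assumes a trace has already been reduced to a chronological list of place IDs, that edges link consecutive distinct visits between top-N places, and that each edge weight is the fraction of the source node's outgoing transitions. For comparison it uses the Weisfeiler-Lehman (WL) subtree kernel, one of several graph kernels and not necessarily the one the paper uses, with place IDs and weights discarded so that only topology counts. All function names (`top_n_network`, `wl_features`, `wl_distance`, `count_wl_unique`) and the default of 3 WL iterations are hypothetical choices for illustration.

```python
from collections import Counter
import math


def top_n_network(visits, n):
    """Build a top-N mobility network from a chronological list of place IDs.

    Assumption: nodes are the n most frequently visited places, and a directed
    edge a->b is added for each transition between consecutive, distinct visits
    whose endpoints are both top-N places. Its weight is the fraction of a's
    outgoing transitions that go to b.
    """
    top = {place for place, _ in Counter(visits).most_common(n)}
    counts = Counter(
        (a, b) for a, b in zip(visits, visits[1:])
        if a != b and a in top and b in top
    )
    out_totals = Counter()
    for (a, _), c in counts.items():
        out_totals[a] += c
    weights = {(a, b): c / out_totals[a] for (a, b), c in counts.items()}
    return top, weights


def wl_features(graphs, iterations=3):
    """WL subtree feature counts for unlabelled directed graphs.

    Each graph is a (nodes, weights) pair as returned by top_n_network. Place
    IDs and edge weights are ignored, so only topology contributes. The
    relabelling table is shared so label IDs are comparable across graphs.
    """
    table = {}
    labels = [{v: 0 for v in nodes} for nodes, _ in graphs]  # identical start labels
    feats = [Counter((0, lab) for lab in lbl.values()) for lbl in labels]
    for it in range(1, iterations + 1):
        for i, (nodes, edges) in enumerate(graphs):
            old = labels[i]
            new = {}
            for v in nodes:
                # New label = old label plus sorted multisets of out- and in-neighbour labels.
                outs = tuple(sorted(old[b] for a, b in edges if a == v))
                ins = tuple(sorted(old[a] for a, b in edges if b == v))
                new[v] = table.setdefault((old[v], outs, ins), len(table) + 1)
            labels[i] = new
            feats[i].update((it, lab) for lab in new.values())
    return feats


def wl_kernel(f, g):
    """WL subtree kernel: dot product of two feature-count vectors."""
    return sum(c * g[k] for k, c in f.items())


def wl_distance(f, g):
    """Kernel-induced distance sqrt(k(f,f) + k(g,g) - 2k(f,g))."""
    return math.sqrt(max(wl_kernel(f, f) + wl_kernel(g, g) - 2 * wl_kernel(f, g), 0))


def count_wl_unique(feats):
    """Count graphs whose WL features match no other graph's.

    Differing WL features prove two graphs are non-isomorphic, so this is a
    lower bound on the number of topologically unique networks.
    """
    sigs = [frozenset(f.items()) for f in feats]
    freq = Counter(sigs)
    return sum(1 for s in sigs if freq[s] == 1)
```

As a usage example, the traces `["home", "work", "home", "gym", "home", "work", "home"]` and `["x", "y", "x", "z", "x", "y", "x"]` share no place IDs but produce the same star-shaped top-3 topology, so their WL features are identical and their distance is 0. This illustrates why stripping location labels alone does not anonymize a network. Because WL is a one-sided test, an exact count of unique topologies would instead require a canonical-labelling tool such as nauty.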
