No Place to Hide: Inadvertent Location Privacy Leaks on Twitter

Open access

Abstract

There is a natural tension on online social media between the desire to share information and the need to keep sensitive information private. Privacy-seeking users may try to keep their location private by avoiding location-revealing words such as points of interest (POIs), believing this to be enough. In this paper, we show that it is possible to uncover the location of a social media user’s post even when it is not geotagged and does not contain any POI information. Our proposed approach, Jasoos, achieves this by exploiting the vocabulary shared between users who reveal their location and those who do not. To this end, Jasoos uses a variant of the Naive Bayes algorithm to identify location-revealing words or hashtags from both temporal and atemporal perspectives. Our evaluation, using tweets collected from four different states in the United States, shows that Jasoos can accurately infer the locations of close to half a million tweets corresponding to more than 20,000 distinct users (i.e., more than 50% of the test users) from the four states. Our work demonstrates that location privacy leaks do occur despite due precautions by a privacy-conscious user. We design and evaluate countermeasures based on Jasoos to mitigate location privacy leaks.
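To make the shared-vocabulary idea concrete, the following is a minimal sketch of a plain multinomial Naive Bayes classifier with add-one smoothing. It is not the Jasoos variant described in the abstract and ignores the temporal perspective entirely; the function names, hashtags, and state labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_posts):
    """Count tokens per location from posts whose location is known.

    labeled_posts: iterable of (tokens, location) pairs (hypothetical format).
    """
    word_counts = defaultdict(Counter)  # location -> token counts
    post_counts = Counter()             # location -> number of posts
    vocab = set()
    for tokens, location in labeled_posts:
        post_counts[location] += 1
        word_counts[location].update(tokens)
        vocab.update(tokens)
    return word_counts, post_counts, vocab


def predict(tokens, model):
    """Return the location with the highest smoothed log-posterior."""
    word_counts, post_counts, vocab = model
    total_posts = sum(post_counts.values())
    best_location, best_score = None, float("-inf")
    for location in sorted(post_counts):
        # Log prior: share of training posts from this location.
        score = math.log(post_counts[location] / total_posts)
        denom = sum(word_counts[location].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training carry no signal
                score += math.log((word_counts[location][tok] + 1) / denom)
        if score > best_score:
            best_location, best_score = location, score
    return best_location


# Toy training data: invented hashtags and labels, not taken from the paper.
model = train([
    (["#gobucks", "game", "tonight"], "Ohio"),
    (["campus", "#gobucks"], "Ohio"),
    (["#hookem", "game", "tonight"], "Texas"),
    (["bbq", "#hookem"], "Texas"),
])
print(predict(["watching", "the", "game", "#hookem"], model))
```

In this toy setup the test post names no POI, yet a hashtag it shares with location-revealing users is enough to tip the prediction toward one label.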
