No Place to Hide: Inadvertent Location Privacy Leaks on Twitter

Open access

Abstract

On online social media, there is a natural tension between the desire to share information and the desire to keep sensitive information private. Privacy-seeking users may try to keep their location private by avoiding location-revealing words, such as the names of points of interest (POIs), believing this to be enough. In this paper, we show that it is possible to uncover the location of a social media user’s post even when it is not geotagged and does not contain any POI information. Our proposed approach, Jasoos, achieves this by exploiting the vocabulary shared between users who reveal their location and those who do not. To this end, Jasoos uses a variant of the Naive Bayes algorithm to identify location-revealing words or hashtags from both temporal and atemporal perspectives. Our evaluation, using tweets collected from four different states in the United States, shows that Jasoos can accurately infer the locations of close to half a million tweets corresponding to more than 20,000 distinct users (i.e., more than 50% of the test users) from the four states. Our work demonstrates that location privacy leaks do occur despite due precautions by a privacy-conscious user. We design and evaluate countermeasures based on Jasoos to mitigate location privacy leaks.
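To make the shared-vocabulary idea concrete, the following is a minimal sketch of a plain multinomial Naive Bayes classifier with add-one smoothing, trained on posts whose location is known and applied to a post that names no POI. It is not the Jasoos variant described in the abstract, and it ignores the temporal perspective entirely. The function names, the whitespace tokenizer, and the toy posts are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_posts):
    """Count words per location from (text, location) pairs."""
    word_counts = defaultdict(Counter)  # location -> word -> count
    post_counts = Counter()             # location -> number of posts
    for text, location in labeled_posts:
        post_counts[location] += 1
        word_counts[location].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, post_counts, vocab


def log_posteriors(text, model):
    """Return the unnormalized log P(location | text) for each location."""
    word_counts, post_counts, vocab = model
    total_posts = sum(post_counts.values())
    scores = {}
    for location in post_counts:
        total_words = sum(word_counts[location].values())
        # Start from the log prior: share of training posts per location.
        score = math.log(post_counts[location] / total_posts)
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[location][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[location] = score
    return scores


def predict(text, model):
    """Return the location with the highest posterior score."""
    scores = log_posteriors(text, model)
    return max(scores, key=scores.get)


# Hypothetical toy data standing in for posts from location-revealing users.
TOY_POSTS = [
    ("stuck on the subway again #mta", "NY"),
    ("bagel run before the subway", "NY"),
    ("surf was great this morning #pch", "CA"),
    ("traffic on the freeway after surf", "CA"),
]
MODEL = train(TOY_POSTS)

# A post with no geotag and no POI name still shares vocabulary with NY posts.
PREDICTION = predict("late because of the subway", MODEL)
```

In this sketch, an ordinary word such as "subway" carries the location signal because it appears only in posts from users who disclosed their location, which is the kind of leak the abstract describes.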
