#DontTweetThis: Scoring Private Information in Social Networks

Open access


With the growing popularity of online social networks (OSNs), a large amount of private or sensitive information has been posted online. In particular, studies show that users sometimes reveal too much information or unintentionally post messages they later regret, especially when they are careless, emotional, or unaware of privacy risks. There is therefore a strong need to identify potentially sensitive online content so that users can be alerted to it. In this paper, we propose PrivScore, a context-aware, text-based quantitative model for private information assessment, which is expected to serve as the foundation of a privacy leakage alerting mechanism. We first solicit diverse opinions on the sensitivity of private information from crowdsourcing workers, and examine the responses to discover a perceptual model behind the consensuses and disagreements. We then develop a computational scheme using deep neural networks to compute a context-free PrivScore (i.e., the “consensus” privacy score among average users). Finally, we integrate tweet histories, topic preferences and social contexts to generate a personalized context-aware PrivScore. This privacy scoring mechanism could be employed to identify potentially private messages and alert users to think again before posting them to OSNs.

