With the growing popularity of online social networks (OSNs), a large amount of private or sensitive information has been posted online. In particular, studies show that users sometimes reveal too much information or unintentionally post messages they later regret, especially when they are careless, emotional, or unaware of privacy risks. There is therefore a strong need to identify potentially sensitive online content so that users can be alerted to it. In this paper, we propose PrivScore, a context-aware, text-based quantitative model for assessing private information, which is intended to serve as the foundation of a privacy-leakage alerting mechanism. We first solicit diverse opinions on the sensitivity of private information from crowdsourcing workers and examine the responses to uncover a perceptual model behind their consensus and disagreement. We then develop a computational scheme that uses deep neural networks to compute a context-free PrivScore (i.e., the “consensus” privacy score among average users). Finally, we integrate tweet histories, topic preferences, and social contexts to generate a personalized, context-aware PrivScore. This privacy scoring mechanism could be used to identify potentially private messages and prompt users to reconsider before posting them to OSNs.
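The abstract does not say how crowd ratings are aggregated or how user context is folded into the context-free score, so the following is only a minimal sketch under loudly stated assumptions, not PrivScore's actual model (which uses deep neural networks). It assumes ratings on a hypothetical 1 to 5 sensitivity scale, takes the mean as the "consensus" and the population standard deviation as a simple disagreement measure, and combines the context-free score with a hypothetical per-user topic score through a linear blend controlled by an illustrative weight `alpha`. The alert `threshold` is likewise an assumption.

```python
from statistics import mean, pstdev


def consensus_score(ratings: list[float]) -> tuple[float, float]:
    """Return (consensus, disagreement) for one message's crowd ratings.

    Hypothetical aggregation: the mean stands in for the consensus score,
    and the population standard deviation is a simple disagreement proxy.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    return float(mean(ratings)), float(pstdev(ratings))


def personalized_score(context_free: float,
                       topic_score: float,
                       alpha: float = 0.7) -> float:
    """Blend a context-free score with a user-specific adjustment.

    `topic_score` (hypothetical) is the sensitivity suggested by this
    user's own history for the message's topic; `alpha` (hypothetical)
    sets how strongly the consensus score dominates the result.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * context_free + (1.0 - alpha) * topic_score


def should_alert(score: float, threshold: float = 3.0) -> bool:
    """Warn the user before posting if the score exceeds the (assumed) threshold."""
    return score > threshold


if __name__ == "__main__":
    # Five raters score one tweet on the assumed 1-5 scale.
    avg, spread = consensus_score([4, 5, 4, 3, 4])
    score = personalized_score(avg, topic_score=2.0)
    print(f"consensus={avg:.2f} disagreement={spread:.2f} "
          f"personalized={score:.2f} alert={should_alert(score)}")
```

A linear blend is chosen here only because it is the simplest way to show a personal context pulling a consensus score up or down; a learned model could replace `personalized_score` without changing the alerting step.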