Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis

Open access

Abstract

This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts. First, it examines a large number of formal linguistic features that occur in the sentences, such as punctuation, words and grammatical categories, in order to describe the specific characteristics of each category. Second, it compares these characteristics across the entire data set in order to determine similarities between the stance categories. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative. Contrariety uses longer sentences, more conjunctions, more repetitions and shorter forms than sentences expressing the other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We also show that stance in our data set is expressed in sentences of around 21 words on average. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.
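The two-part procedure described above can be illustrated with a minimal Python sketch, under several assumptions. The feature set is an illustrative selection mirroring the kinds of measures mentioned (sentence length, word length, character composition, punctuation, repetition and vocabulary variety), not the study's actual feature list; grammatical-category features would need a part-of-speech tagger and are omitted. The regular-expression tokenizer, the helper names `formal_features`, `category_means` and `one_way_f`, and the toy sentences are all hypothetical. For the comparison step, the sketch swaps in a one-way ANOVA F statistic as one common way of asking whether a feature separates the categories; it is not necessarily the test used in the study.

```python
import re
import string
from collections import Counter
from statistics import mean

# Simplified, hypothetical tokenizer: runs of letters/digits, optionally with
# one internal apostrophe (e.g. "don't"). A real study would use a proper tokenizer.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def formal_features(sentence: str) -> dict:
    """Compute an illustrative set of surface-level formal features for one sentence."""
    words = WORD_RE.findall(sentence)
    counts = Counter(w.lower() for w in words)
    n_chars = len(sentence)
    n_words = len(words)
    return {
        "words": n_words,  # sentence length in tokens
        "mean_word_len": mean(len(w) for w in words) if words else 0.0,
        "alpha_ratio": sum(c.isalpha() for c in sentence) / n_chars if n_chars else 0.0,
        "digit_ratio": sum(c.isdigit() for c in sentence) / n_chars if n_chars else 0.0,
        "punctuation": sum(c in string.punctuation for c in sentence),
        # Type-token ratio: a rough proxy for vocabulary variety.
        "type_token_ratio": len(counts) / n_words if n_words else 0.0,
        # Total occurrences of (case-folded) tokens that appear more than once.
        "repeated_tokens": sum(n for n in counts.values() if n > 1),
    }


def category_means(annotated):
    """Average each feature over all sentences annotated with the same stance category."""
    by_category = {}
    for sentence, category in annotated:
        by_category.setdefault(category, []).append(formal_features(sentence))
    return {
        category: {key: mean(f[key] for f in feats) for key in feats[0]}
        for category, feats in by_category.items()
    }


def one_way_f(groups):
    """One-way ANOVA F statistic; needs >= 2 groups, more values than groups,
    and some within-group variation."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Toy sentences invented for illustration; not drawn from the BBC.
    toy = [
        ("But the vote was close, and yet the result stands.", "contrariety"),
        ("We must leave the single market.", "necessity"),
        ("Perhaps 52% is not enough.", "uncertainty"),
    ]
    means = category_means(toy)
    print(means["contrariety"]["words"])  # → 10
    print(one_way_f([[1, 2, 3], [4, 5, 6]]))  # → 13.5
```

With real data, each category would hold many sentences; one would group the per-sentence values of a single feature by category and pass them to `one_way_f`. A larger F suggests that the feature separates the categories more strongly, although a post-hoc test would still be needed to tell which pairs of categories differ.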
