Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis

Open access

Abstract

This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to identify features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts. First, it examines a large number of formal linguistic features that occur in the sentences, such as punctuation, words and grammatical categories, in order to describe the specific characteristics of each category. Second, it compares these characteristics across the entire data set in order to determine similarities between the stances. We show that, among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones. Sentences expressing contrariety are longer and use more conjunctions, more repetitions and shorter forms than sentences expressing other stances. Necessity has longer lexical forms but shorter sentences, which are syntactically more complex. We also show that stance in our data set is expressed in sentences of around 21 words. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.
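To make the kind of formal feature described above concrete, the following is a minimal sketch in Python of how sentence-level surface features could be computed and averaged per stance category. It is an illustration under stated assumptions, not the study's actual feature set or tooling: the function names `surface_features` and `category_profiles`, the regular-expression tokenizer and the two example sentences are all hypothetical.

```python
import re
from statistics import mean


def surface_features(sentence):
    """Compute a few illustrative surface features for one sentence.

    Assumption: a simple regex tokenizer stands in for whatever
    tokenization the study used; it splits contractions such as "don't".
    """
    tokens = re.findall(r"\w+", sentence)
    chars = [c for c in sentence if not c.isspace()]
    n_tokens = len(tokens) or 1  # avoid division by zero on empty input
    n_chars = len(chars) or 1
    return {
        "n_words": len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / n_tokens,
        "alpha_ratio": sum(c.isalpha() for c in chars) / n_chars,
        "digit_ratio": sum(c.isdigit() for c in chars) / n_chars,
        "special_ratio": sum(not c.isalnum() for c in chars) / n_chars,
        "type_token_ratio": len({t.lower() for t in tokens}) / n_tokens,
    }


def category_profiles(labelled):
    """Average each feature over the sentences of each stance category."""
    by_category = {}
    for category, sentence in labelled:
        by_category.setdefault(category, []).append(surface_features(sentence))
    return {
        category: {name: mean(f[name] for f in feats) for name in feats[0]}
        for category, feats in by_category.items()
    }


# Hypothetical example sentences, not taken from the corpus.
examples = [
    ("necessity", "We must act now."),
    ("contrariety", "The plan sounds good, but it will not work and it will not last."),
]
profiles = category_profiles(examples)
```

Per-category averages of this kind could then be compared across the six categories; which statistical comparison is appropriate depends on the design of the study and is not implied by this sketch.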
