Studying text coherence in Czech – a corpus-based analysis

Magdaléna Rysová 1
  • 1 Department of English, Faculty of International Relations, University of Economics, W. Churchill Sq. 4, Prague, Czech Republic

Abstract

The paper belongs to the field of Czech corpus linguistics and is one of several current studies that analyse text coherence through the interaction of language phenomena. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech. It examines how these two phenomena interact and where they meet to participate in text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether (and how often) they are involved in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the language data of the Prague Dependency Treebank (PDT), which contains 3,165 Czech texts. The results are useful for automatic text annotation: the paper shows how, and to what extent, the annotation of grammatical coreference may be used in automatic (pre-)annotation of sentence information structure in Czech. It demonstrates how accurately we may (automatically) assume the value of contextual boundness for the antecedent and the anaphor (the two participants in a grammatical coreference relation). The results show that the contextual boundness of the anaphor is automatically predictable: the anaphor is a non-contrastive contextually bound sentence item in 99.18% of cases. The value for the antecedent, by contrast, is harder to estimate: according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50%, and contrastive contextually bound in 13%.
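The kind of count summarised above can be illustrated with a minimal sketch. It assumes that each grammatical coreference relation has already been extracted as a pair of contextual boundness labels (anaphor, antecedent). The short codes "t", "c" and "f", the function names and the toy pairs are assumptions made for this illustration only; they are not the paper's data or its actual procedure.

```python
from collections import Counter

# Hypothetical short codes for the three values named in the abstract:
#   "t" = non-contrastive contextually bound
#   "c" = contrastive contextually bound
#   "f" = contextually non-bound
LABELS = ("t", "c", "f")


def boundness_distribution(values):
    """Return the share (in percent) of each boundness label in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}


def majority_label(values):
    """Return the most frequent label and its share in percent.

    The share is the accuracy a pre-annotation rule would reach on this
    sample if it always assigned the majority label.
    """
    dist = boundness_distribution(values)
    best = max(LABELS, key=lambda label: dist[label])
    return best, dist[best]


# Toy (anaphor, antecedent) label pairs, invented for illustration only.
toy_pairs = [("t", "f"), ("t", "t"), ("t", "t"), ("t", "c")]

anaphors = [anaphor for anaphor, _ in toy_pairs]
antecedents = [antecedent for _, antecedent in toy_pairs]

print("anaphor:", boundness_distribution(anaphors))
print("antecedent:", boundness_distribution(antecedents))
print("majority rule for anaphor:", majority_label(anaphors))
```

On real data, the share returned by `majority_label` for the anaphor would correspond to the accuracy of always pre-annotating the anaphor as non-contrastive contextually bound, while a flatter distribution, as reported for the antecedent, indicates that a single default label is less reliable.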
