Studying text coherence in Czech – a corpus-based analysis

Open access

Abstract

The paper belongs to the field of Czech corpus linguistics and is one of several current studies that analyse text coherence through the interaction of language phenomena. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech, focusing on how these two phenomena interact and where they meet to jointly contribute to text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether (and how often) they participate in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the language data of the Prague Dependency Treebank (PDT), which contains 3,165 Czech texts. The results are useful for automatic text annotation: the paper shows how (or to what extent) the annotation of grammatical coreference can be used in automatic (pre-)annotation of sentence information structure in Czech, namely how accurately the value of contextual boundness can be (automatically) predicted for the antecedent and the anaphor, the two participants of a grammatical coreference relation. The results show that the contextual boundness of the anaphor of grammatical coreference is automatically predictable: the anaphor is a non-contrastive contextually bound sentence item in 99.18% of cases. By contrast, the contextual boundness of the antecedent is harder to estimate: according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50% and contrastive contextually bound in 13% of cases.
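The prediction task summarised above can be illustrated as a simple majority-class pre-annotation rule. The sketch below is not the authors' procedure. It assumes that each grammatical coreference link is available as a pair of contextual boundness values using the PDT-style labels `t` (non-contrastive contextually bound), `c` (contrastive contextually bound) and `f` (contextually non-bound); the `CorefLink` record and the helper functions are hypothetical. Applied to the proportions reported above, always predicting `t` would be correct for 99.18% of anaphors but for only 50% of antecedents, which is why the antecedent value is the harder one to pre-annotate.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed PDT-style contextual boundness labels:
#   "t" = non-contrastive contextually bound
#   "c" = contrastive contextually bound
#   "f" = contextually non-bound
TFA_VALUES = ("t", "c", "f")


@dataclass(frozen=True)
class CorefLink:
    """One grammatical coreference relation (hypothetical record type)."""
    antecedent_tfa: str
    anaphor_tfa: str


def tfa_distribution(links, role):
    """Relative frequency of each label for one role: 'antecedent' or 'anaphor'."""
    counts = Counter(getattr(link, f"{role}_tfa") for link in links)
    total = sum(counts.values())
    return {value: counts[value] / total for value in TFA_VALUES}


def majority_rule(distribution):
    """Pre-annotation rule that always predicts the most frequent label.

    Returns the predicted label and its expected accuracy, which is simply
    the relative frequency of that label in the distribution.
    """
    value = max(distribution, key=distribution.get)
    return value, distribution[value]


# Toy links for illustration only (not PDT data).
toy_links = [
    CorefLink("f", "t"),
    CorefLink("t", "t"),
    CorefLink("c", "t"),
    CorefLink("t", "t"),
]
print(tfa_distribution(toy_links, "antecedent"))

# Proportions reported for the PDT; for the anaphor only the "t" share is given.
antecedent_pdt = {"f": 0.37, "t": 0.50, "c": 0.13}
anaphor_pdt = {"t": 0.9918}
print(majority_rule(anaphor_pdt))     # ('t', 0.9918)
print(majority_rule(antecedent_pdt))  # ('t', 0.5)
```

The majority rule is deliberately the weakest reasonable baseline: any richer pre-annotation model for antecedents would need to beat the 50% it achieves.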
