Studying text coherence in Czech – a corpus-based analysis

Magdaléna Rysová

Open Access

Published online: Dec 29, 2017

Topics in Linguistics

Volume 18 (2017): Issue 2 (December 2017)

Page range: 36 - 47

DOI: https://doi.org/10.1515/topling-2017-0009

Keywords
sentence information structure, coreference, corpus analysis, Czech

© 2018

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

The paper belongs to the field of Czech corpus linguistics and is one of several current studies that analyse text coherence through the interaction of language phenomena. It presents a corpus-based analysis of grammatical coreference and sentence information structure (in terms of contextual boundness) in Czech, and examines where these two phenomena meet and jointly participate in text structuring. Specifically, the paper analyses contextually bound and non-bound sentence items and examines whether (and how often) they are involved in relations of grammatical coreference in Czech newspaper articles. The analysis is carried out on the language data of the Prague Dependency Treebank (PDT), which contains 3,165 Czech texts. The results are useful for automatic text annotation: the paper shows how, and to what extent, the annotation of grammatical coreference may be used in the automatic (pre-)annotation of sentence information structure in Czech. It demonstrates how accurately the value of contextual boundness can be assigned automatically to the antecedent and the anaphor (the two participants in a grammatical coreference relation). The anaphor of grammatical coreference turns out to be automatically predictable: it is a non-contrastive contextually bound sentence item in 99.18% of cases. The contextual boundness of the antecedent, on the other hand, is much harder to estimate: according to the PDT, the antecedent is contextually non-bound in 37% of cases, non-contrastive contextually bound in 50% of cases and contrastive contextually bound in 13% of cases.
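The pre-annotation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function names, the node identifiers and the input format (pairs of antecedent and anaphor ids) are hypothetical, and the labels `t`, `c` and `f` are assumed here to follow the PDT convention for non-contrastive contextually bound, contrastive contextually bound and contextually non-bound items. Only the percentages come from the abstract.

```python
# Shares reported in the abstract (PDT data).
ANAPHOR_T_SHARE = 0.9918                      # anaphor is non-contrastive bound
ANTECEDENT_SHARES = {"f": 0.37, "t": 0.50, "c": 0.13}


def preannotate(pairs):
    """Assign a contextual-boundness label to nodes in coreference pairs.

    `pairs` is a list of (antecedent_id, anaphor_id) tuples (hypothetical
    input format). Anaphors get "t"; antecedents are left as None, i.e.
    for manual annotation, unless they are also an anaphor elsewhere.
    """
    labels = {}
    for antecedent_id, anaphor_id in pairs:
        labels[anaphor_id] = "t"               # reliable per the abstract
        labels.setdefault(antecedent_id, None)  # too uncertain to guess
    return labels


def majority_baseline(shares):
    """Return the most frequent label and the accuracy of always guessing it."""
    label = max(shares, key=shares.get)
    return label, shares[label]


if __name__ == "__main__":
    print(preannotate([("n1", "n2"), ("n2", "n3")]))
    print(majority_baseline(ANTECEDENT_SHARES))
```

Under the reported figures, always guessing the most frequent label for an antecedent would be correct in only about half of the cases, which is why this sketch leaves antecedents unlabelled rather than guessing.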

eISSN: 2199-6504
ISSN: 1337-7590
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other, Philosophy of Language
