Universal Annotation of Slavic Verb Forms

Open access

Abstract

This article proposes applying a subset of the Universal Dependencies (UD) standard to the Slavic languages. The subset comprises the morphosyntactic features of verb forms. We systematically document the inventory of features observed on Slavic verbs, with numerous examples from 10 languages, and we show that although terminology in the literature differs, the substance remains the same. Our goal is practical: we do not intend to overturn decades of research in Slavic comparative linguistics. Instead, we place the properties of Slavic verbs in the context of UD and propose a unified, Slavic-wide application of UD features and values to them. We believe that our proposal is a compromise that corpus linguists working on any Slavic language could accept.

The Prague Bulletin of Mathematical Linguistics

The Journal of Charles University
