Golden Rule of Morphology and Variants of Wordforms

Open access

Abstract

In many languages, some words can be written in several ways. We call these alternatives variants. The values of all their morphological categories are identical, so they receive the same morphological tag. Because they also share a lemma, two or more wordforms end up with the same morphological description. This ambiguity may cause problems in various NLP applications. There are two types of variants: those affecting the whole paradigm (global variants) and those affecting only wordforms that share certain combinations of morphological values (inflectional variants). In this paper, we propose a way to tag all wordforms, including their variants, unambiguously. We call this requirement the “Golden rule of morphology”. The paper deals mainly with Czech, but the ideas can be applied to other languages as well.
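The ambiguity described in the abstract can be made concrete with a small check. The sketch below is not the tagging scheme proposed in the paper; it only detects where a lexicon has more than one wordform for the same (lemma, tag) pair. The function name, the tag strings, and the toy Czech entries are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def find_golden_rule_violations(entries):
    """Return {(lemma, tag): sorted wordforms} for every pair that has
    more than one distinct wordform, i.e. every ambiguous description."""
    forms_by_description = defaultdict(set)
    for wordform, lemma, tag in entries:
        forms_by_description[(lemma, tag)].add(wordform)
    return {
        key: sorted(forms)
        for key, forms in forms_by_description.items()
        if len(forms) > 1
    }


# Toy lexicon of (wordform, lemma, tag) triples. The tags are made-up
# placeholders, not a real tagset; the entries are an assumed example of
# two wordforms that share one lemma and one set of morphological values.
toy_lexicon = [
    ("jazyce", "jazyk", "NOUN.Loc.Sg"),
    ("jazyku", "jazyk", "NOUN.Loc.Sg"),
    ("jazyka", "jazyk", "NOUN.Gen.Sg"),
]

violations = find_golden_rule_violations(toy_lexicon)
```

An empty result would mean that every (lemma, tag) pair in the lexicon identifies exactly one wordform, which is the requirement the abstract calls the Golden rule of morphology.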

Journal of Linguistics/Jazykovedný časopis

The Journal of Ľudovít Štúr Institute of Linguistics, SAV
