Text collections for evaluation of Russian morphological taggers

The paper describes the preparation and development of the text collections for the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to convert all available Russian corpora with manually verified, high-quality tagging into a single format (Universal Dependencies CoNLL-U). The data came from the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org and the GICR corpus with resolved homonymy. These sources differ in their tagsets, lemmatization rules, pipeline architectures, technical solutions and systematic error patterns. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data); the texts are available under the CC BY-NC-SA 3.0 license.
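CoNLL-U, the target format named above, stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Comment lines begin with `#`, and a blank line ends a sentence. The sketch below is a minimal reader that shows how the morphological columns (lemma, part of speech, features) can be pulled out of such a file. It is not the organizers' conversion tooling. It assumes well-formed 10-column input. The sample sentence and its tags are hand-written for illustration and are not taken from the MorphoRuEval-2017 collections.

```python
from typing import Dict, List

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a FEATS value like 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts.

    Comment lines ('#') are skipped and blank lines end a sentence.
    Multiword-token ranges (e.g. '1-2') are kept as ordinary rows here.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical, hand-annotated sample; only the morphological columns are filled.
SAMPLE = "\n".join([
    "# text = Мама мыла раму.",
    "\t".join(["1", "Мама", "мама", "NOUN", "_",
               "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing", "_", "_", "_", "_"]),
    "\t".join(["2", "мыла", "мыть", "VERB", "_",
               "Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act",
               "_", "_", "_", "_"]),
    "\t".join(["3", "раму", "рама", "NOUN", "_",
               "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing", "_", "_", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "_", "_", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for tok in sentence:
            print(tok["form"], tok["lemma"], tok["upos"], parse_feats(tok["feats"]))
```

Keeping every column as a string and decoding FEATS only on demand mirrors how a tagger evaluation would compare lemma, POS and feature values token by token.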

eISSN: 1338-4287
ISSN: 0021-5597
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other



Published Online: Jan 24, 2018

Page range: 258 - 267

DOI: https://doi.org/10.1515/jazcas-2017-0035

Keywords
text collection, shared task, morphological tagging, universal dependencies, morphological parsing, Russian corpora

© 2017 Olga Lyashevskaya et al., published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
