We describe a conversion of the syntactically annotated part of the Slovak National Corpus into the annotation scheme known as Universal Dependencies. Only a small subset of the data has been converted so far; yet it is the first Slovak treebank that is publicly available for research. We list a number of research projects in which the dataset has been used so far, including the first parsing results.
Word-Order Issues in English-to-Urdu Statistical Machine Translation
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled well by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
This article proposes the application of a subset of the Universal Dependencies (UD) standard to the group of Slavic languages. The subset in question comprises morphosyntactic features of various verb forms. We systematically document the inventory of features observable with Slavic verbs, giving numerous examples from 10 languages. We demonstrate that the terminology in the literature may differ, yet the substance remains the same. Our goal is practical. We do not intend to overturn the many decades of research in Slavic comparative linguistics. Instead, we want to put the properties of Slavic verbs in the context of UD, and to propose a unified (Slavic-wide) application of UD features and values to them. We believe that our proposal is a compromise that could be accepted by corpus linguists working on all Slavic languages.
We present various achievements in statistical machine translation from English, German, Spanish and French into Czech. We discuss specific properties of the individual source languages and describe techniques that exploit these properties and address language-specific errors. Besides the translation proper, we also present our contribution to error analysis.
Daniel Zeman, Mark Fishel, Jan Berka and Ondřej Bojar
Addicter: What Is Wrong with My Translations?
We introduce Addicter, a tool for Automatic Detection and DIsplay of Common Translation ERrors. The tool automatically identifies and labels translation errors and lets the user browse the test and training corpus and word alignments; the use of additional linguistic tools is also supported. The error classification is inspired by that of Vilar et al. (2006), although some of their higher-level categories are beyond the reach of the current version of our system. In addition to the tool itself, we present a comparison of the proposed method to manually classified translation errors and a thorough evaluation of the generated alignments.
Pavel Broz, Daniel Rajdl, Jaroslav Novak, Milan Hromadka, Jaroslav Racek, Ladislav Trefil and Vaclav Zeman
The aim of this study was to examine high-sensitivity troponin T and I (hsTnT and hsTnI) after a treadmill run under laboratory conditions and to find a possible connection with echocardiographic, laboratory and other assessed parameters. Nineteen trained men underwent a standardized 2-hour-long treadmill run. Concentrations of hsTnT and hsTnI were assessed before the run, 60, 120 and 180 minutes after the start and 24 hours after the run. Changes in troponins were tested using non-parametric analysis of variance (ANOVA). A multiple linear regression model was used to find the explanatory variables for hsTnT and hsTnI changes. Values of troponins were evaluated using the 0h/1h algorithm. Changes in hsTnT and hsTnI levels were statistically significant (p<0.0001 and p<0.0001, respectively). In a multiple regression model (adjusted R²: 0.60, p=0.005 for hsTnT and adjusted R²: 0.60, p=0.005 for hsTnI), changes in both troponins can be explained by relative left ventricular (LV) wall thickness, training volume, body temperature after the run and creatinine changes. According to the 0h/1h algorithm, none of the runners was evaluated as negative. Relative LV wall thickness, creatinine changes, training volume and body temperature after the run can predict changes in hsTnT and hsTnI levels. When medical attention is needed after physical exercise, hsTn levels should be tested only when clinical suspicion and the patient’s history indicate a high probability of myocardial damage.