Analyzing Error Types in English-Czech Machine Translation
This paper examines two techniques of manual evaluation that can be used to identify error types of individual machine translation systems. The first technique, "blind post-editing", has been used in WMT evaluation campaigns since 2009, and manually constructed data of this type are available for various language pairs. The second technique, explicit marking of errors, has been used in the past as well.
We propose a method for interpreting blind post-editing data at a finer level and compare the results with explicit marking of errors. While human annotation with either technique is not exactly reproducible (inter-annotator agreement is relatively low), both techniques lead to similar observations about the differences between the systems. Specifically, we are able to suggest which errors in MT output are easy or hard to correct with no access to the source, the situation experienced by users who do not understand the source language.
eppex: Epochal Phrase Table Extraction for Statistical Machine Translation
We present a tool that extracts phrase pairs from a word-aligned parallel corpus and filters them on the fly based on a user-defined frequency threshold. The set of phrase pairs to be scored is greatly reduced, making the whole phrase table construction process faster with no significant harm to the ultimate phrase table quality as measured by BLEU. Technically, our tool is an alternative to the extract component of the phrase-extract toolkit bundled with the Moses SMT software and covers some of the functionality of sigfilter.
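The filtering idea can be illustrated with a rough sketch (our own simplification, not the actual eppex implementation, which processes the corpus incrementally and bounds memory; the function name and toy data below are ours):

```python
from collections import Counter

def filter_phrase_pairs(pairs, threshold):
    """Keep only phrase pairs whose corpus frequency meets the threshold.

    `pairs` is an iterable of (source_phrase, target_phrase) tuples as
    they would be emitted by phrase extraction from a word-aligned corpus.
    """
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= threshold}

# Toy example: only the pair seen twice survives a threshold of 2.
pairs = [("the cat", "ta kočka"), ("the cat", "ta kočka"), ("a dog", "pes")]
print(filter_phrase_pairs(pairs, threshold=2))
# {('the cat', 'ta kočka'): 2}
```

Dropping low-frequency pairs before scoring is what shrinks the bulk of work in phrase table construction.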
Evaluation of Machine Translation Metrics for Czech as the Target Language
In the present work, we study semi-automatic evaluation techniques for machine translation (MT) systems. These techniques are based on comparing the MT system's output to human translations of the same text. Various metrics have been proposed in recent years, ranging from metrics that use only unigram comparison to metrics that try to take advantage of additional syntactic or semantic information. The main goal of this article is to compare these metrics with respect to their correlation with human judgments for Czech as the target language and to recommend the best ones for evaluating MT systems that translate into Czech.
TrTok: A Fast and Trainable Tokenizer for Natural Languages
We present a universal data-driven tool for segmenting and tokenizing text. The presented tokenizer lets the user define where token and sentence boundaries should be considered. These candidate boundaries are then judged by a classifier which is trained on provided tokenized data. The features passed to the classifier are also defined by the user, making, e.g., the inclusion of abbreviation lists trivial. This level of customizability makes the tokenizer a versatile tool which, as we show, is capable of sentence detection in English text as well as word segmentation in Chinese text. In the case of English sentence detection, the system outperforms previous methods. The software is available as an open-source project on GitHub.
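The classifier-based view of tokenization can be sketched in miniature (this is our own illustration, not TrTok's API: every position after sentence-final punctuation is a candidate boundary, and a trained classifier is replaced here by a single abbreviation-list feature):

```python
import re

# Hypothetical abbreviation list; in TrTok such lists enter as
# user-defined classifier features learned from tokenized training data.
ABBREVIATIONS = {"Dr.", "Mr.", "e.g."}

def split_sentences(text):
    """Treat each position after '.', '!' or '?' plus whitespace as a
    candidate boundary and 'classify' it with one feature: whether the
    word ending at the candidate is a known abbreviation."""
    sentences, start = [], 0
    for m in re.finditer(r"[.!?]\s+", text):
        words = text[start:m.end()].split()
        if words and words[-1] in ABBREVIATIONS:
            continue  # reject the boundary: it follows an abbreviation
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    if start < len(text):
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith arrived. He sat down."))
# ['Dr. Smith arrived.', 'He sat down.']
```

The trainable tokenizer replaces the hand-written rule with a classifier whose features (capitalization, abbreviation membership, surrounding characters, etc.) are supplied by the user.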
CzEng 0.9: Large Parallel Treebank with Rich Annotation
We describe our ongoing efforts in collecting a Czech-English parallel corpus CzEng. The paper provides full details on the current version 0.9 and focuses on its new features: (1) data from new sources were added, most importantly a few hundred electronically available books, technical documentation and also some parallel web pages, (2) the full corpus has been automatically annotated up to the tectogrammatical layer (surface and deep syntactic analysis), (3) sentence segmentation has been refined, and (4) several heuristic filters to improve corpus quality were implemented. In total, we provide a sentence-aligned automatic parallel treebank of about 8.0 million sentences, 93 million English and 82 million Czech words. CzEng 0.9 is freely available for non-commercial research purposes.
In this work, we present subjectivity lexicons of positive and negative expressions for the Indonesian language, created by automatically translating English lexicons. Further variants are created by taking their intersections and unions. We compare the lexicons on the task of predicting sentence polarity on a set of 446 manually annotated sentences, and we also contrast the generic lexicons with a small lexicon extracted directly from the annotated sentences (in a cross-validation setting). We seek further improvements by assigning weights to lexicon entries and by wrapping the prediction in a machine learning task with a small number of additional features. We observe that the lexicons are able to reach high recall but suffer from low precision when predicting whether a sentence is evaluative (positive or negative) or neutral. Weighting the lexicon entries can improve either recall or precision, but with a comparable decrease in the other measure.
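A minimal sketch of the lexicon-based prediction step (function names, toy words, and the simple sign-of-score rule are ours; the paper's weighting scheme and machine learning wrapper are more involved):

```python
def sentence_polarity(tokens, pos_lex, neg_lex, weights=None):
    """Classify a tokenized sentence by summing weighted lexicon hits:
    positive matches add to the score, negative matches subtract."""
    weights = weights or {}
    score = sum(weights.get(t, 1.0) for t in tokens if t in pos_lex) \
          - sum(weights.get(t, 1.0) for t in tokens if t in neg_lex)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Toy Indonesian example: "bagus" (good) is in the positive lexicon.
print(sentence_polarity(["film", "ini", "bagus"], {"bagus"}, {"buruk"}))
# positive
```

The recall/precision trade-off reported above arises because broad translated lexicons match many tokens (high recall for "evaluative") while also firing on neutral sentences (low precision); per-entry weights shift that balance.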
We present eman, a tool for managing large numbers of computational experiments. Over the years of our research in machine translation (MT), we have collected a number of ideas for efficient experimenting. We believe these ideas are generally applicable in (computational) research of any field. We incorporated them into eman in order to make them available in a command-line Unix environment.
The aim of this article is to highlight the core of these ideas. We hope the text can serve as a collection of experiment-management tips and tricks for anyone, regardless of their field of study or the computer platform they use. The specific examples we provide in eman's current syntax are less important, but they allow us to use concrete terms. The article thus also fills a gap in eman's documentation by providing a high-level overview.
We propose a manual evaluation method for machine translation (MT) in which annotators rank only translations of short segments instead of whole sentences. This makes the annotation easier and more efficient. We have conducted an annotation experiment and evaluated a set of MT systems using this method. The obtained results are very close to the official WMT14 evaluation results. We also use the collected database of annotations to automatically evaluate new, unseen systems and to tune the parameters of a statistical machine translation system. The evaluation of unseen systems, however, does not work, and we analyze the reasons.
Over the past years, pivoting methods, i.e. machine translation via a third language, have gained considerable attention. Various experiments with different approaches and datasets have been carried out, but the lack of open-source tools makes it difficult to replicate their results. This paper presents a new tool for pivoting in phrase-based statistical machine translation by so-called phrase-table triangulation. Besides describing the tool, the paper discusses the strong and weak points of the various triangulation techniques implemented in it.
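The core of phrase-table triangulation can be sketched as follows (a simplified illustration of ours, using a single score per phrase pair, whereas real phrase tables carry several scores and alignment information):

```python
from collections import defaultdict

def triangulate(src_pivot, pivot_tgt):
    """Build a source->target phrase table by marginalizing over the
    pivot language: score(s, t) = sum over p of score(s, p) * score(p, t).

    Both inputs map a phrase to a dict of {other-side phrase: score}.
    """
    src_tgt = defaultdict(float)
    for s, pivots in src_pivot.items():
        for p, score_sp in pivots.items():
            for t, score_pt in pivot_tgt.get(p, {}).items():
                src_tgt[(s, t)] += score_sp * score_pt
    return dict(src_tgt)

# Toy Czech -> English (pivot) -> German tables with made-up scores.
src_pivot = {"kočka": {"cat": 0.9, "the cat": 0.1}}
pivot_tgt = {"cat": {"Katze": 1.0}, "the cat": {"die Katze": 1.0}}
print(triangulate(src_pivot, pivot_tgt))
```

A practical weak point visible even in this sketch is the combinatorial blow-up: every pivot phrase shared by the two tables multiplies out into new source-target pairs, so the triangulated table can grow much larger than either input.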
This paper proposes a new method of manual evaluation for statistical machine translation, the so-called quiz-based evaluation, which estimates whether people are able to reliably extract information from machine-translated texts. We apply the method to two commercial and two experimental MT systems that participated in the WMT 2010 English-to-Czech translation task. We report the inter-annotator agreement for the evaluation as well as the outcomes for the individual systems. The quiz-based evaluation suggests a rather different ranking of the systems compared to the WMT 2010 manual and automatic metrics. We also see that, overall, MT quality is becoming acceptable for obtaining information from text: about 80% of the questions can be answered correctly given only the machine-translated text.