Kateřina Rysová, Magdaléna Rysová, Michal Novák, Jiří Mírovský and Eva Hajičová
Fast and Extensible Phrase Scoring for Statistical Machine Translation
Existing tools for generating phrase tables for phrase-based Statistical Machine Translation (SMT) are generally optimised for low memory use, so that large corpora can be processed with limited memory. While this is a reasonable design choice, it does not make optimal use of resources when sufficient memory is available. We present memscore, a new open-source tool that scores phrases in memory. Besides acting as a faster drop-in replacement for existing software, it implements a number of standard smoothing techniques and provides a platform for easy experimentation with new scoring methods.
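To illustrate what scoring phrases in memory involves, the following is a minimal sketch of unsmoothed relative-frequency estimation over a list of extracted phrase pairs. It is not memscore's code or interface; the function name `score_phrases` and the toy data are assumptions made for this example, and no smoothing is applied.

```python
from collections import Counter


def score_phrases(pairs):
    """Estimate p(tgt|src) and p(src|tgt) by relative frequency.

    `pairs` is an iterable of (source_phrase, target_phrase) tuples,
    one per extracted phrase-pair occurrence. Everything is held in
    memory, which is the trade-off the abstract describes.
    """
    pairs = list(pairs)
    joint = Counter(pairs)                      # count(src, tgt)
    src_count = Counter(s for s, _ in pairs)    # count(src)
    tgt_count = Counter(t for _, t in pairs)    # count(tgt)
    return {
        (s, t): (c / src_count[s], c / tgt_count[t])
        for (s, t), c in joint.items()
    }


# Hypothetical toy input, not taken from any real corpus.
extracted = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the building"),
    ("haus", "house"),
]
table = score_phrases(extracted)
```

Here `table[("das haus", "the house")]` holds the pair of conditional probabilities in both directions. A smoothing technique would replace the two plain ratios in the dictionary comprehension.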
Hardmeier, Christian. Fast and Extensible Phrase Scoring for Statistical Machine Translation. The Prague Bulletin of Mathematical Linguistics , 93:79-88, 2010.
rgbF: An Open Source Tool for n-gram Based Automatic Evaluation of Machine Translation Output
We describe rgbF, a tool for automatic evaluation of machine translation output based on n-gram precision and recall. The tool calculates the F-score averaged over all n-grams of an arbitrary set of distinct units such as words, morphemes, POS tags, etc. The arithmetic mean is used for n-gram averaging. As input, the tool requires one or more reference translations and a hypothesis, both containing the same combination of units. The default output is the document-level 4-gram F-score of the desired unit combination. Sentence-level scores can be obtained on demand, as can precision and/or recall scores, separate unit scores and separate n-gram scores. In addition, weights can be set for both n-grams and units, and the desired n-gram order n can be specified.
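The averaging described above can be sketched as follows. This is one possible reading of the abstract, not rgbF's implementation: it assumes a single reference, a single unit type (already tokenised), uniform n-gram weights, and an F-score computed per n-gram order before taking the arithmetic mean. The function names are chosen for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_fscore(hypothesis, reference, max_n=4):
    """Arithmetic mean over n = 1..max_n of the n-gram F-score."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        matched = sum((hyp & ref).values())   # clipped n-gram matches
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        precision = matched / hyp_total if hyp_total else 0.0
        recall = matched / ref_total if ref_total else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


sentence = "the cat sat on the mat".split()
print(ngram_fscore(sentence, sentence))            # identical input
print(ngram_fscore(["a", "b"], ["a", "c"], max_n=2))
```

Scoring a different unit, such as POS tags, would only require passing tag sequences instead of word sequences; combining several units would add a second, weighted average over the per-unit scores.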
Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation
In this paper, we describe an open source transfer decoder for Deep Syntactic Transfer-Based Statistical Machine Translation. Transfer decoding involves the application of transfer rules to a source-language (SL) structure. The N-best target-language (TL) structures are found via a beam search over TL hypothesis structures, which are ranked by a log-linear combination of feature scores, such as translation model and dependency-based language model scores.
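The ranking and search idea can be sketched in a heavily simplified form. The example below is not the Sulis decoder: it treats the source structure as a flat list of nodes rather than a dependency structure, uses fixed per-option feature values in place of a real translation model and dependency-based language model, and all names and numbers are assumptions made for illustration.

```python
import heapq


def loglinear(features, weights):
    """Log-linear model score: weighted sum of (log-domain) feature values."""
    return sum(weights[name] * value for name, value in features.items())


def beam_search(source_nodes, options, weights, beam_size=2, n_best=2):
    """Expand one source node at a time, keeping the best `beam_size` hypotheses."""
    beam = [(0.0, [])]  # (accumulated score, target output so far)
    for node in source_nodes:
        expanded = [
            (score + loglinear(feats, weights), output + [target])
            for score, output in beam
            for target, feats in options[node]
        ]
        # nlargest returns the hypotheses sorted from best to worst.
        beam = heapq.nlargest(beam_size, expanded, key=lambda h: h[0])
    return beam[:n_best]


# Hypothetical transfer options with made-up log-domain feature values.
options = {
    "gross": [("big", {"tm": -0.5, "lm": -0.7}), ("large", {"tm": -0.9, "lm": -0.4})],
    "Haus": [("house", {"tm": -0.2, "lm": -1.0}), ("building", {"tm": -1.6, "lm": -1.2})],
}
weights = {"tm": 1.0, "lm": 0.5}
best = beam_search(["gross", "Haus"], options, weights)
```

With these toy values, `best[0]` is the hypothesis `["big", "house"]`. In a real decoder the language model feature would depend on the partial target structure, so it would be recomputed for each expansion rather than stored with the option.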
In this paper we describe the use of QuEst, a framework that aims to obtain predictions on the quality of translations, to improve the performance of machine translation (MT) systems without changing their internal functioning. We apply QuEst to experiments with:
i. multiple system translation ranking, where translations produced by different MT systems are ranked according to their estimated quality, leading to gains of up to 2.72 BLEU, 3.66 BLEUs, and 2.17 F1 points;
ii. n-best list re-ranking, where n-best list translations produced by an MT system are re-ranked based on predicted quality scores to get the best translation ranked top, which leads to improvements in sentence NIST score of 0.41 points;
iii. n-best list combination, where segments from an n-best list are combined using a lattice-based re-scoring approach that minimizes word error, obtaining gains of 0.28 BLEU points; and
iv. the ITERPE strategy, which attempts to identify translation errors regardless of prediction errors (ITERPE) and builds sentence-specific SMT systems (SSSS) on the ITERPE-sorted instances identified as having more potential for improvement, achieving gains of up to 1.43 BLEU, 0.54 F1, 2.9 NIST, 0.64 sentence BLEU, and 4.7 sentence NIST points in English to German over the top 100 ITERPE-sorted instances.
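The n-best re-ranking in item ii can be sketched as follows. The sketch does not use QuEst's API: `predict_quality` is a hypothetical callable standing in for a trained quality estimation model, `toy_quality` is a deliberately crude placeholder based on length difference, and the example assumes that higher predicted scores mean better translations.

```python
def rerank_nbest(source, nbest, predict_quality):
    """Return (score, hypothesis) pairs sorted from best to worst prediction."""
    scored = [(predict_quality(source, hyp), hyp) for hyp in nbest]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


def toy_quality(source, hypothesis):
    """Placeholder predictor: penalise a token-count mismatch with the source.

    A real system would call a trained quality estimation model here.
    """
    return -abs(len(source.split()) - len(hypothesis.split()))


source = "das ist ein kleines Haus"
nbest = ["this is small house", "this is a small house", "this house"]
ranked = rerank_nbest(source, nbest, toy_quality)
top_translation = ranked[0][1]
```

The same function covers item i if `nbest` holds one translation per MT system instead of one system's n-best list.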
Optimisation in statistical machine translation is usually performed toward the BLEU score, but the relevance of this metric to human evaluation has been questioned. Many other metrics exist, but none of them agrees perfectly with human evaluation. On the other hand, most evaluation campaigns use multiple metrics (BLEU, TER, METEOR, etc.). Statistical machine translation systems can be optimised for metrics other than BLEU, but optimising toward other metrics usually tends to decrease the BLEU score, the main metric used in MT evaluation campaigns.
In this paper we extend the minimum error training tool of the popular Moses SMT toolkit with a scorer for the TER score and for any linear combination of the existing metrics. The TER scorer was reimplemented in C++, which makes it ten times faster than the reference Java code.
We have performed experiments with two large-scale phrase-based SMT systems to show the benefit of the new options for minimum error training in Moses. The first one translates from French into English (WMT 2011 evaluation). The second one was developed in the framework of the DARPA GALE project to translate from Arabic to English in three different genres (news, web, and transcribed broadcast news and conversations).
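The idea of optimising toward a linear combination of metrics can be sketched as below. This is not the Moses scorer: `word_edit_rate` is a simplified stand-in for TER that counts word-level insertions, deletions and substitutions but omits TER's block shifts, and the weights and metric values in the usage lines are made up for illustration.

```python
def word_edit_rate(hypothesis, reference):
    """Word-level edit distance divided by reference length (TER without shifts)."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete a hypothesis word
                           cur[j - 1] + 1,           # insert a reference word
                           prev[j - 1] + (h != r)))  # match or substitute
        prev = cur
    return prev[-1] / max(len(ref), 1)


def combine(scores, weights):
    """Linear combination of metric scores; error metrics get negative weights."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical metric values for one candidate system output.
scores = {"bleu": 0.30, "ter": word_edit_rate("the cat", "the cat sat")}
objective = combine(scores, {"bleu": 1.0, "ter": -1.0})
```

Because TER is an error rate (lower is better) and BLEU is a similarity score (higher is better), the sign of each weight determines whether the tuner rewards or penalises that metric.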
The present paper summarizes our recent results concerning English-Czech Machine Translation implemented in the TectoMT framework. The system uses tectogrammatical trees as the transfer medium. A detailed analysis of errors made by the previous version of the system (considered as the baseline) is presented first. Then several improvements of the system are described that led to better translation quality in terms of BLEU and NIST scores. The biggest performance gain comes from applying a Hidden Tree Markov Model in the transfer phase, which is a novel technique in the field of Machine Translation.
Éva Mújdricza-Maydt, Huiqin Körkel-Qu, Stefan Riezler and Sebastian Padó
We present a semi-supervised, language- and domain-independent approach to high precision sentence alignment. The key idea is to bootstrap a supervised discriminative learner from wood-standard alignments, i.e. alignments that have been automatically generated by state-of-the-art sentence alignment tools. We deploy three different unsupervised sentence aligners (Opus, Hunalign, Gargantua) and two different datasets (movie subtitles and novels) and show experimentally that bootstrapping consistently and significantly improves precision, so that, with one exception, we obtain an overall gain in F-score.