Kateřina Rysová, Magdaléna Rysová, Michal Novák, Jiří Mírovský, and Eva Hajičová. European Language Resources Association (ELRA).
Burstein, Jill, Karen Kukich, Susanne Wolff, Chi Lu, and Martin Chodorow. Computer analysis of essays. 1998.
Castro-Castro, Daniel, Rocío Lannes-Losada, Montse Maritxalar, Ianire Niebla, Celia Pérez-Marqués, Nancy C. Álamo-Suárez, and Aurora Pons-Porrata. A Multilingual Application for Automated Essay Scoring. In Advances in Artificial Intelligence – IBERAMIA 2008, pages 243–251, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
Foltz, Peter W., Darrell Laham, and Thomas K. Landauer. The Intelligent
Foster, George, Roland Kuhn, and Howard Johnson. Phrasetable smoothing for statistical machine translation. In EMNLP, pages 53–61, 2006.
Gao, Qin and Stephan Vogel. Training phrase-based machine translation models on the cloud: Open source machine translation toolkit Chaski. Prague Bull. Math. Linguistics, 93:37–46, 2010.
Hardmeier, Christian. Fast and extensible phrase scoring for statistical machine translation. Prague Bull. Math. Linguistics, 93:87–96, 2010.
Koehn, Philipp, Hieu Hoang, Alexandra
Fast and Extensible Phrase Scoring for Statistical Machine Translation
Existing tools for generating phrase tables for phrase-based Statistical Machine Translation (SMT) are generally optimised towards low memory use to allow processing of large corpora with limited memory. While this is a reasonable design choice, it does not make optimal use of resources when sufficient memory is available. We present memscore, a new open-source tool that scores phrases entirely in memory. Besides acting as a faster drop-in replacement for existing software, it implements a number of standard smoothing techniques and provides a platform for easy experimentation with new scoring methods.
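As a rough sketch of the core computation such a phrase scorer performs, the fragment below estimates the conditional translation probability p(e|f) of each extracted phrase pair by relative frequency, with optional additive (add-α) smoothing. The function names, data layout, and smoothing variant are illustrative assumptions and do not reflect memscore's actual API.

```python
from collections import Counter

def score_phrases(phrase_pairs, alpha=0.0):
    """Estimate p(target | source) for each phrase pair by relative
    frequency, with optional additive (add-alpha) smoothing."""
    pair_counts = Counter(phrase_pairs)                   # count(f, e)
    source_counts = Counter(f for f, _ in phrase_pairs)   # count(f)
    # number of distinct target phrases seen for each source phrase
    targets_per_source = Counter(f for f, _ in pair_counts)
    scores = {}
    for (f, e), c in pair_counts.items():
        denom = source_counts[f] + alpha * targets_per_source[f]
        scores[(f, e)] = (c + alpha) / denom
    return scores

pairs = [("la maison", "the house"),
         ("la maison", "the house"),
         ("la maison", "the home"),
         ("maison", "house")]
scores = score_phrases(pairs)
# unsmoothed: relative frequency count(f, e) / count(f), here 2/3
print(scores[("la maison", "the house")])
```

With alpha > 0, probability mass is shifted towards unseen targets, which is the simplest of the smoothing families a tool like this can experiment with.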
Goyal, Amit, Hal Daumé, III, and Suresh Venkatasubramanian. Streaming for large scale NLP: language modeling. In Proc. of HLT/NAACL, pages 512–520, Boulder, Colorado, 2009. URL http://portal.acm.org/citation.cfm?id=1620754.1620829
Johnson, J Howard, Joel Martin, George Foster, and Roland Kuhn. Improving
rgbF: An Open Source Tool for n-gram Based Automatic Evaluation of Machine Translation Output
We describe rgbF, a tool for automatic evaluation of machine translation output based on n-gram precision and recall. The tool calculates the F-score averaged over all n-grams of an arbitrary set of distinct units such as words, morphemes, POS tags, etc.; the arithmetic mean is used for n-gram averaging. As input, the tool requires one or more reference translations and a hypothesis, both containing the same combination of units. The default output is the document-level 4-gram F-score of the desired unit combination. Sentence-level scores can be obtained on demand, as can precision and/or recall scores, separate unit scores, and separate n-gram scores. In addition, weights can be introduced both for n-grams and for units, and the desired n-gram order n can be set.
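The central score can be sketched as follows: for each n up to the maximum order, compute clipped n-gram precision and recall and their F1, then take the arithmetic mean over n. This is an illustrative single-reference, word-unit, unweighted simplification, not rgbF's actual implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_fscore(hypothesis, reference, max_n=4):
    """Arithmetic mean over n = 1..max_n of the F1 score computed from
    clipped n-gram precision and recall (in the spirit of rgbF)."""
    hyp, ref = hypothesis.split(), reference.split()
    f_scores = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = Counter(ngrams(hyp, n)), Counter(ngrams(ref, n))
        if not hyp_counts or not ref_counts:
            f_scores.append(0.0)
            continue
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        prec = overlap / sum(hyp_counts.values())
        rec = overlap / sum(ref_counts.values())
        f_scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f_scores) / max_n

print(ngram_fscore("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

Other units (morphemes, POS tags) would simply be other tokenisations of the same sentences, and unit or n-gram weights would replace the plain arithmetic mean.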
Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation
In this paper, we describe an open source transfer decoder for Deep Syntactic Transfer-Based Statistical Machine Translation. Transfer decoding involves the application of transfer rules to a source-language (SL) structure. The N-best target-language (TL) structures are found via a beam search over TL hypothesis structures, which are ranked via a log-linear combination of feature scores, such as a translation model and a dependency-based language model.
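The beam search with log-linear ranking described above can be illustrated on a deliberately simplified flat-sequence toy (real transfer decoding operates on deep syntactic structures, not token sequences); the lexicon entries, feature names, and weights below are invented for the example.

```python
def beam_search_transfer(source, lexicon, weights, beam_size=2):
    """Toy beam search: extend each partial hypothesis with every
    candidate translation of the next source token, score hypotheses
    by a log-linear combination of features, keep the beam_size best."""
    beam = [([], 0.0)]  # (target tokens so far, cumulative score)
    for src_tok in source:
        candidates = []
        for toks, score in beam:
            for tgt_tok, feats in lexicon[src_tok]:
                step = sum(weights[k] * v for k, v in feats.items())
                candidates.append((toks + [tgt_tok], score + step))
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = candidates[:beam_size]
    return beam  # N-best list, highest score first

weights = {"tm": 1.0, "lm": 0.5}           # illustrative feature weights
lexicon = {
    "maison": [("house", {"tm": -0.2, "lm": -0.4}),
               ("home",  {"tm": -0.9, "lm": -0.1})],
    "bleue":  [("blue",  {"tm": -0.1, "lm": -0.3})],
}
nbest = beam_search_transfer(["maison", "bleue"], lexicon, weights)
print(nbest[0][0])  # highest-scoring hypothesis
```

In the real decoder the expansion step applies transfer rules to tree nodes and the features include structure-level models, but the prune-by-log-linear-score loop is the same.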
Optimisation in statistical machine translation is usually carried out toward the BLEU score, but the relevance of this metric to human evaluation has been questioned. Many other metrics exist, but none of them is in perfect harmony with human evaluation. Moreover, most evaluation campaigns use multiple metrics (BLEU, TER, METEOR, etc.). Statistical machine translation systems can be optimised for metrics other than BLEU, but such optimisation usually tends to decrease the BLEU score, the main metric used in MT evaluation campaigns.
In this paper we extend the minimum error rate training tool of the popular Moses SMT toolkit with a scorer for TER and for any linear combination of the existing metrics. The TER scorer was reimplemented in C++, which results in roughly ten times faster execution than the reference Java code.
We performed experiments with two large-scale phrase-based SMT systems to show the benefit of the new options of minimum error rate training in Moses. The first translates from French into English (WMT 2011 evaluation). The second was developed in the framework of the DARPA GALE project to translate from Arabic into English in three different genres (news, web, and transcribed broadcast news and conversations).
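To illustrate the kind of objective involved, the sketch below computes a word-level edit-distance error normalised by reference length (a simplification of TER, which additionally allows block shifts) and combines it linearly with a BLEU-based loss. This is a toy illustration of a linear metric combination, not the Moses C++ implementation.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions,
    substitutions). Real TER additionally allows block shifts."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

def ter_like(hyp_sent, ref_sent):
    """Edits divided by reference length, as in TER (minus shifts)."""
    hyp, ref = hyp_sent.split(), ref_sent.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

def combined_loss(hyp, ref, bleu_score, weights=(0.5, 0.5)):
    """Linear combination of metrics, e.g. w1*TER + w2*(1 - BLEU),
    usable as a single error surface for minimum error rate training."""
    w_ter, w_bleu = weights
    return w_ter * ter_like(hyp, ref) + w_bleu * (1.0 - bleu_score)
```

Minimising such a combined loss lets the tuner trade off improvements on one metric against degradation on another instead of over-fitting BLEU alone.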
In this paper we describe the use of QuEst, a framework for predicting the quality of translations, to improve the performance of machine translation (MT) systems without changing their internal functioning. We apply QuEst to experiments with:
i. multiple system translation ranking, where translations produced by different MT systems are ranked according to their estimated quality, leading to gains of up to 2.72 BLEU, 3.66 BLEUs, and 2.17 F1 points;
ii. n-best list re-ranking, where the n-best list translations produced by an MT system are re-ranked based on predicted quality scores so that the best translation is ranked top, leading to an improvement of 0.41 points in sentence NIST score;
iii. n-best list combination, where segments from an n-best list are combined using a lattice-based re-scoring approach that minimises word error, obtaining gains of 0.28 BLEU points; and
iv. the ITERPE strategy, which attempts to identify translation errors regardless of prediction errors (ITERPE) and builds sentence-specific SMT systems (SSSS) on the ITERPE-sorted instances identified as having more potential for improvement, achieving gains of up to 1.43 BLEU, 0.54 F1, 2.9 NIST, 0.64 sentence BLEU, and 4.7 sentence NIST points in English to German over the top 100 ITERPE-sorted instances.
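Setting (i), quality-based system ranking, can be sketched as follows. The quality predictor here is a stub invented for the example; QuEst itself uses a regression model trained on black-box features of the source and translation.

```python
def select_best_translations(candidates, predict_quality):
    """For each source segment, pick the candidate translation with the
    highest predicted quality (quality-estimation-based system ranking).
    `candidates` maps segment id -> list of (system, translation);
    `predict_quality` is any regressor mapping a translation to a score."""
    selection = {}
    for seg_id, options in candidates.items():
        selection[seg_id] = max(options, key=lambda st: predict_quality(st[1]))
    return selection

# Stub predictor for illustration only: prefer outputs of about 6 words.
def toy_quality(translation):
    return -abs(len(translation.split()) - 6)

cands = {
    1: [("sysA", "a short one"),
        ("sysB", "a somewhat longer candidate sentence here")],
}
print(select_best_translations(cands, toy_quality))
```

The same skeleton covers setting (ii) by ranking an n-best list from a single system instead of outputs from several systems.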
Matthias Huck, Jan-Thorsten Peter, Markus Freitag, Stephan Peitz and Hermann Ney
Hierarchical Phrase-Based Translation with Jane 2
In this paper, we give a survey of several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH's open source statistical machine translation toolkit. We focus on the following techniques: Insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative lexicalized reordering model, and soft string-to-dependency hierarchical machine translation. We describe the fundamentals of each of these techniques and present experimental results obtained with Jane 2 to confirm their usefulness in state-of-the-art hierarchical phrase-based translation (HPBT).
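As background for the lexical scoring variants mentioned above, the standard lexical weighting of a phrase pair given its word alignment (multiply, over target words, the averaged lexical probabilities w(e|f) of the aligned source words) can be sketched as follows; the probability table below is invented for the example.

```python
def lexical_weight(src_phrase, tgt_phrase, alignment, w):
    """Lexical weighting of a phrase pair: for each target word, average
    the lexical probabilities w(e|f) of the source words it is aligned
    to, then multiply over target words. `alignment` is a set of
    (src_idx, tgt_idx) links; unaligned target words use w(e | NULL),
    represented here by a None source key."""
    score = 1.0
    for j, e in enumerate(tgt_phrase):
        links = [i for (i, jj) in alignment if jj == j]
        if links:
            score *= sum(w.get((src_phrase[i], e), 0.0) for i in links) / len(links)
        else:
            score *= w.get((None, e), 0.0)
    return score

w = {("la", "the"): 0.5, ("maison", "house"): 0.8}
print(lexical_weight(("la", "maison"), ("the", "house"),
                     {(0, 0), (1, 1)}, w))  # 0.5 * 0.8 = 0.4
```

Variants of this score (direct vs. inverse direction, different handling of unaligned words) are exactly the kind of lexical scoring options a toolkit can expose.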
The present paper summarizes our recent results concerning English-Czech Machine Translation implemented in the TectoMT framework. The system uses tectogrammatical trees as the transfer medium. A detailed analysis of errors made by the previous version of the system (considered as the baseline) is presented first. Then several improvements of the system are described that led to better translation quality in terms of BLEU and NIST scores. The biggest performance gain comes from applying a Hidden Tree Markov Model in the transfer phase, a novel technique in the field of Machine Translation.