Fast and Extensible Phrase Scoring for Statistical Machine Translation
Existing tools for generating phrase tables for phrase-based Statistical Machine Translation (SMT) are generally optimised for low memory use, so that large corpora can be processed with limited memory. Whilst this is a reasonable design choice, it does not make optimal use of resources when sufficient memory is available. We present memscore, a new open-source tool that scores phrases in memory. Besides acting as a faster drop-in replacement for existing software, it implements a number of standard smoothing techniques and provides a platform for easy experimentation with new scoring methods.
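The following minimal Python sketch shows what scoring phrases in memory can look like. It makes several assumptions:

- The input is a plain list of (source, target) phrase pairs, with one entry per extracted occurrence.
- It computes the relative-frequency estimate p(t|s) = c(s,t) / c(s).
- As one example of a standard smoothing technique, it also computes an interpolated Witten-Bell estimate that backs off to the target-phrase marginal c(t) / N.

The function name `score_phrases` and the choice of Witten-Bell are illustrative assumptions. They are not memscore's actual interface, and Witten-Bell is not necessarily one of its methods.

```python
from collections import Counter, defaultdict


def score_phrases(pairs):
    """Score extracted phrase pairs entirely in memory (illustrative sketch).

    `pairs` is an iterable of (source, target) phrase strings, one entry per
    extracted occurrence. Returns {(source, target): (p_rf, p_wb)} where
    p_rf is the relative frequency c(s,t) / c(s) and p_wb is an interpolated
    Witten-Bell estimate that backs off to the target marginal c(t) / N.
    """
    joint = Counter(pairs)          # c(s, t): joint counts of phrase pairs
    src = Counter()                 # c(s): marginal count of each source phrase
    tgt = Counter()                 # c(t): marginal count of each target phrase
    n_types = defaultdict(int)      # N1+(s, .): distinct targets seen with s

    # One pass over the distinct pairs fills all marginal tables.
    for (s, t), c in joint.items():
        src[s] += c
        tgt[t] += c
        n_types[s] += 1
    total = sum(joint.values())     # N: total number of extracted pairs

    table = {}
    for (s, t), c in joint.items():
        p_rf = c / src[s]                     # unsmoothed relative frequency
        p_backoff = tgt[t] / total            # lower-order distribution p(t)
        lam = n_types[s]                      # Witten-Bell weight for unseen mass
        p_wb = (c + lam * p_backoff) / (src[s] + lam)
        table[(s, t)] = (p_rf, p_wb)
    return table
```

For example, take the pairs ("das haus", "the house") twice, ("das haus", "the home") once and ("haus", "house") once. The sketch then gives ("das haus", "the house") a relative frequency of 2/3 and a Witten-Bell score of 0.6.

Because all counts live in hash tables, every statistic is available after a single pass over the data. This is the memory-for-speed trade-off the abstract describes, and it avoids the external sorting that a low-memory design typically relies on.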