Fast and Extensible Phrase Scoring for Statistical Machine Translation

Existing tools for generating phrase tables for phrase-based Statistical Machine Translation (SMT) are generally optimised for low memory use, so that large corpora can be processed with limited memory. Whilst this is a reasonable design choice, it does not make optimal use of resources when sufficient memory is available. We present memscore, a new open-source tool that scores phrases in memory. Besides acting as a faster drop-in replacement for existing software, it implements a number of standard smoothing techniques and provides a platform for easy experimentation with new scoring methods.
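The Python sketch below illustrates roughly what scoring phrases in memory involves. It keeps all counts of extracted phrase pairs in hash tables and computes the relative-frequency estimate p(e|f) = c(f,e) / c(f). It can also apply interpolated Witten-Bell smoothing, as one example of a standard smoothing technique.

Several details are assumptions made for illustration: the function name `score_phrases`, the input format (a list of `(source, target)` phrase strings), and the use of the target-phrase unigram distribution as the back-off distribution. This is not memscore's actual code or interface.

```python
from collections import Counter


def score_phrases(pairs, witten_bell=False):
    """Score extracted phrase pairs entirely in memory (illustrative sketch).

    pairs: list of (source_phrase, target_phrase) tuples, one per extraction.
    Returns a dict mapping (source, target) -> estimate of p(target | source).
    """
    pair_counts = Counter(pairs)                   # c(f, e)
    src_counts = Counter(f for f, _ in pairs)      # c(f)
    tgt_counts = Counter(e for _, e in pairs)      # c(e), used only for back-off
    total = sum(tgt_counts.values())
    # n(f): number of distinct target phrases observed with source phrase f
    distinct = Counter(f for f, _ in pair_counts)

    scores = {}
    for (f, e), c in pair_counts.items():
        if witten_bell:
            # Interpolated Witten-Bell: reserve mass proportional to n(f) and
            # redistribute it according to an (assumed) unigram back-off p_b(e).
            backoff = tgt_counts[e] / total
            n = distinct[f]
            scores[(f, e)] = (c + n * backoff) / (src_counts[f] + n)
        else:
            # Plain relative frequency
            scores[(f, e)] = c / src_counts[f]
    return scores


if __name__ == "__main__":
    extracted = [("das haus", "the house"), ("das haus", "the house"),
                 ("das haus", "the home"), ("haus", "house")]
    print(score_phrases(extracted))
    print(score_phrases(extracted, witten_bell=True))
```

Here is what the sketch produces on this toy input. Without smoothing, p(the house | das haus) is 2/3. With Witten-Bell smoothing it drops to (2 + 2 · 0.5) / (3 + 2) = 0.6, because some probability mass is reserved for unseen translations. Holding every count in memory avoids the sort-and-merge passes over disk that a low-memory design requires, which is the trade-off the abstract describes.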
