Ncode: an Open Source Bilingual N-gram SMT Toolkit
This paper describes Ncode, an open source statistical machine translation (SMT) toolkit in which translation models are estimated as n-gram language models over bilingual units (tuples). The toolkit includes tools for extracting tuples, estimating models, and performing translation, and it can easily be coupled with several other open source toolkits to yield a complete SMT pipeline. In this article, we review the main features of the toolkit and explain how to build a translation engine with Ncode. We also report a short comparison with the widely known Moses system. Results show that Ncode requires less memory and translates faster than Moses, while achieving slightly higher translation accuracy.
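To make the idea of a translation model estimated as an n-gram language model over tuples more concrete, the following is a minimal sketch, not Ncode's actual implementation. It assumes sentence pairs have already been segmented into (source, target) tuples, uses an unsmoothed bigram model for brevity, and scores a tuple sequence as the product of p(u_k | u_{k-1}). The toy corpus and the names `train_bigram` and `log_prob` are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math

# Toy training data (made up for illustration): each sentence pair is
# already segmented into a sequence of tuples (source_phrase, target_phrase).
corpus = [
    [("we", "nous"), ("want", "voulons"), ("translations", "des traductions")],
    [("we", "nous"), ("want", "voulons"), ("perfect", "parfaites")],
]

# Hypothetical sentence-start marker used as the history of the first tuple.
BOS = ("<s>", "<s>")

def train_bigram(sequences):
    """Count tuple bigrams and their histories (maximum-likelihood counts)."""
    bigrams, histories = Counter(), Counter()
    for seq in sequences:
        prev = BOS
        for tup in seq:
            bigrams[(prev, tup)] += 1
            histories[prev] += 1
            prev = tup
    return bigrams, histories

def log_prob(seq, bigrams, histories):
    """Joint log-probability of a tuple sequence under the bigram model."""
    total = 0.0
    prev = BOS
    for tup in seq:
        count = bigrams[(prev, tup)]
        if count == 0:
            # Unseen bigram: a real toolkit would apply smoothing here.
            return float("-inf")
        total += math.log(count / histories[prev])
        prev = tup
    return total

bigrams, histories = train_bigram(corpus)
score = log_prob(corpus[0], bigrams, histories)
print(score)
```

Because each tuple carries both a source and a target side, a single n-gram model of this kind captures the translation choice and its bilingual context together. A practical system would use higher-order n-grams with smoothing.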
Casacuberta, F. and E. Vidal. Machine translation with inferred stochastic finite-state transducers. Computational Linguistics 30(4):205-225, 2004.
Crego, Josep M. and José B. Mariño. Improving SMT by coupling reordering and decoding. Machine Translation, 20(3):199-215, 2007.
Crego, Josep M. and François Yvon. Improving reordering with linguistically informed bilingual n-grams, August 2009.
Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic, June, 2007.
Mariño, José B., Rafael E. Banchs, Josep M. Crego, Adrià de Gispert, Patrick Lambert, José A.R. Fonollosa, and Marta R. Costa-Jussà. N-gram-based machine translation. Computational Linguistics 32(4):527-549, 2006.
Och, Franz Josef and Hermann Ney. Discriminative training and maximum entropy models for statistical machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 295-302, Philadelphia, PA, July, 2002.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation, 2002.
Tillmann, Christoph. A unigram orientation model for statistical machine translation. Proceedings of the HLT-NAACL'04, 101-104, Boston, MA, USA, May, 2004.
Zens, Richard, Franz Josef Och, and Hermann Ney. Phrase-based statistical machine translation. Proceedings of the 25th Annual German Conference on AI: Advances in Artificial Intelligence, 18-32, 2002.