Ncode: an Open Source Bilingual N-gram SMT Toolkit
This paper describes Ncode, an open source statistical machine translation (SMT) toolkit for translation models estimated as n-gram language models over bilingual units (tuples). The toolkit includes tools for extracting tuples, estimating models, and performing translation, and it can be easily coupled with several other open source toolkits to yield a complete SMT pipeline. We review the main features of the toolkit and explain how to build a translation engine with Ncode. We also report a short comparison with the widely known Moses system: in this comparison, Ncode requires less memory and translates faster than Moses, while achieving slightly higher translation accuracy.
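For readers unfamiliar with the idea of a translation model expressed as an n-gram language model of tuples, the usual formulation in the n-gram-based SMT literature can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: s and t denote the source and target sentences, u_k = (s_k, t_k) is the k-th tuple in a segmentation of the sentence pair into K bilingual units, and n is the model order.

```latex
% Sketch only: symbols (s, t, u_k, K, n) are illustrative assumptions.
% The joint probability of a sentence pair is approximated by an
% n-gram model over its sequence of bilingual tuples.
\[
  p(s, t) \;\approx\; \prod_{k=1}^{K}
    p\bigl(u_k \mid u_{k-n+1}, \ldots, u_{k-1}\bigr),
  \qquad u_k = (s_k, t_k)
\]
```

Under this view, each tuple plays the role that a word plays in a monolingual language model, which is why standard n-gram estimation tools can be reused to train the translation model.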