A Guide to Jane, an Open Source Hierarchical Translation Toolkit

Daniel Stein¹, David Vilar¹, Stephan Peitz¹, Markus Freitag¹, Matthias Huck¹ and Hermann Ney¹
  • ¹ Chair for Computer Science 6, RWTH Aachen University

Jane is RWTH's hierarchical phrase-based translation toolkit. It includes tools for phrase extraction, translation, and scaling factor optimization; the programs are efficient and documented, and large parts of them can be parallelized. The decoder features syntactic enhancements, reorderings, triplet models, discriminative word lexica, and support for a variety of language model formats. In this article, we review the main features of Jane and explain the overall architecture. We also indicate where and how new models can be included.
