Hierarchical Phrase-Based Translation with Jane 2

Matthias Huck, Jan-Thorsten Peter, Markus Freitag, Stephan Peitz, and Hermann Ney
Human Language Technology and Pattern Recognition Group, RWTH Aachen University

In this paper, we survey several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH's open-source statistical machine translation toolkit. We focus on the following techniques: insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative lexicalized reordering model, and soft string-to-dependency hierarchical machine translation. We describe the fundamentals of each technique and present experimental results obtained with Jane 2 that confirm their usefulness in state-of-the-art hierarchical phrase-based translation (HPBT).

OPEN ACCESS
