Hierarchical Phrase-Based Translation with Jane 2

Open access

In this paper, we survey several recent extensions to hierarchical phrase-based machine translation that have been implemented in version 2 of Jane, RWTH's open-source statistical machine translation toolkit. We focus on the following techniques: insertion and deletion models, lexical scoring variants, reordering extensions with non-lexicalized reordering rules and with a discriminative lexicalized reordering model, and soft string-to-dependency hierarchical machine translation. We describe the fundamentals of each of these techniques and present experimental results obtained with Jane 2 to confirm their usefulness in state-of-the-art hierarchical phrase-based translation (HPBT).
