Source-Side Discontinuous Phrases for Machine Translation: A Comparative Study on Phrase Extraction and Search

Open access

Abstract

Standard phrase-based statistical machine translation systems generate translations based on an inventory of continuous bilingual phrases. In this work, we extend a phrase-based decoder with the ability to make use of phrases that are discontinuous in the source part. Our dynamic programming beam search algorithm supports separate pruning of coverage hypotheses per cardinality and of lexical hypotheses per coverage, as well as coverage constraints that impose restrictions on the possible reorderings. In addition to investigating these aspects, which are related to the decoding procedure, we also concentrate our attention on the question of how to obtain source-side discontinuous phrases from parallel training data. Two approaches (hierarchical and discontinuous extraction) are presented and compared. On a large-scale Chinese→English translation task, we conduct a thorough empirical evaluation in order to study a number of system configurations with source-side discontinuous phrases, and to compare them to setups which employ continuous phrases only.


