Scalable Reordering Models for SMT based on Multiclass SVM

Open access

Abstract

In state-of-the-art phrase-based statistical machine translation (SMT) systems, modelling phrase reorderings is important for improving the naturalness of the translated output, particularly when the grammatical structures of the two languages differ significantly. Posing phrase movements as a classification problem, we exploit recent developments in solving large-scale multiclass support vector machines. Using dual coordinate descent methods for learning, we provide a mechanism to shrink the amount of training data required at each iteration. This yields significant computational savings while preserving the accuracy of the models. Our approach is a couple of times faster than the maximum entropy approach and more memory-efficient (a 50% reduction). Experiments were carried out on an Arabic-English corpus with more than a quarter of a billion words. We achieve BLEU score improvements on top of a strong baseline system with sparse reordering features.
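The learning setup described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a one-vs-rest decomposition into binary linear SVMs, and the paper's exact multiclass formulation may differ. Each binary model is trained by dual coordinate descent on the hinge-loss dual without a bias term, and a LIBLINEAR-style shrinking heuristic drops examples whose dual variable appears stuck at a bound from later passes. This mirrors the idea of reducing the data touched per iteration. The reordering classes (`mono`, `swap`, `disc`), the feature names, and the helpers `train_binary_dcd`, `train_one_vs_rest` and `predict` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # feature name -> value (hypothetical sparse features)


def dot(w: SparseVec, x: SparseVec) -> float:
    """Inner product of a sparse weight vector and a sparse feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_dcd(xs: List[SparseVec], ys: List[int], C: float = 10.0,
                     eps: float = 1e-3, max_iter: int = 1000,
                     seed: int = 0) -> SparseVec:
    """Hinge-loss linear SVM (no bias term) trained by dual coordinate descent.

    Examples whose dual variable sits at a bound and looks likely to stay there
    are shrunk out of later passes; all are re-checked before stopping.
    """
    rng = random.Random(seed)
    n = len(xs)
    alpha = [0.0] * n                      # dual variables, 0 <= alpha_i <= C
    q_diag = [dot(x, x) for x in xs]       # Q_ii = x_i . x_i
    w: SparseVec = {}                      # w = sum_i alpha_i * y_i * x_i
    active = list(range(n))
    pg_max_old, pg_min_old = float("inf"), float("-inf")
    for _ in range(max_iter):
        pg_max, pg_min = float("-inf"), float("inf")
        rng.shuffle(active)
        kept = []
        for i in active:
            g = ys[i] * dot(w, xs[i]) - 1.0        # partial derivative of the dual
            if alpha[i] == 0.0:
                if g > pg_max_old:                 # shrink: likely stays at 0
                    continue
                pg = min(g, 0.0)
            elif alpha[i] == C:
                if g < pg_min_old:                 # shrink: likely stays at C
                    continue
                pg = max(g, 0.0)
            else:
                pg = g
            kept.append(i)
            pg_max, pg_min = max(pg_max, pg), min(pg_min, pg)
            if abs(pg) > 1e-12 and q_diag[i] > 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q_diag[i], 0.0), C)
                step = (alpha[i] - old) * ys[i]
                for f, v in xs[i].items():         # keep w in sync with alpha
                    w[f] = w.get(f, 0.0) + step * v
        active = kept
        if max(pg_max, -pg_min, 0.0) < eps:        # largest projected-gradient violation
            if len(active) == n:
                break                              # converged on all examples
            active = list(range(n))                # un-shrink and re-check
            pg_max_old, pg_min_old = float("inf"), float("-inf")
            continue
        pg_max_old = pg_max if pg_max > 0 else float("inf")
        pg_min_old = pg_min if pg_min < 0 else float("-inf")
    return w


def train_one_vs_rest(data: List[Tuple[SparseVec, str]],
                      C: float = 10.0) -> Dict[str, SparseVec]:
    """One binary SVM per reordering class (class vs. all other classes)."""
    xs = [x for x, _ in data]
    labels = sorted({y for _, y in data})
    return {c: train_binary_dcd(xs, [1 if y == c else -1 for _, y in data], C)
            for c in labels}


def predict(models: Dict[str, SparseVec], x: SparseVec) -> str:
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: dot(models[c], x))


# Toy data: hypothetical reordering classes; the constant "bias" feature acts as a bias.
data: List[Tuple[SparseVec, str]] = [
    ({"bias": 1.0, "w=le": 1.0}, "mono"),
    ({"bias": 1.0, "w=la": 1.0}, "mono"),
    ({"bias": 1.0, "w=de": 1.0}, "swap"),
    ({"bias": 1.0, "w=du": 1.0}, "swap"),
    ({"bias": 1.0, "w=que": 1.0}, "disc"),
    ({"bias": 1.0, "w=qui": 1.0}, "disc"),
]
models = train_one_vs_rest(data)
print(predict(models, {"bias": 1.0, "w=le": 1.0, "w=xyz": 1.0}))  # mono
```

Shrinking pays off when many examples end up with alpha_i = 0 (classified beyond the margin), as is typical of large training sets. On this tiny toy set every example is a support vector, so little is saved.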
