We present experiments on multi-task learning for discriminative training in statistical machine translation (SMT), extending standard minimum-error-rate training (MERT) with techniques that take advantage of the similarity of related tasks. We apply our techniques to German-to-English translation of patents from 8 tasks defined by the International Patent Classification (IPC) system. Our experiments show statistically significant gains over task-specific training for techniques that model commonalities through shared parameters. However, more fine-grained combinations of shared parameters with task-specific ones could not be brought to bear on models with a small number of dense features. The software used in the experiments is released as an open-source tool.