Continuous Learning from Human Post-Edits for Neural Machine Translation

Marco Turchi¹, Matteo Negri¹, M. Amin Farajian¹,² and Marcello Federico¹
  • 1 Fondazione Bruno Kessler, Trento, Italy
  • 2 University of Trento, Italy


Improving machine translation (MT) by learning from human post-edits is a powerful approach that remains largely unexplored in the neural machine translation (NMT) framework. In this scenario, too, effective techniques for continuously tuning an existing model on a stream of manual corrections would have several advantages over current batch methods. First, they would make it possible to adapt systems at run time to new users and domains; second, this adaptation would come at a lower computational cost than retraining the NMT system from scratch or in batch mode. To address the problem, we explore several online learning strategies that stepwise fine-tune an existing model on the incoming post-edits. Our evaluation on data from two language pairs and different target domains shows significant improvements over the use of static models.
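The continuous-learning loop the abstract describes can be illustrated with a toy sketch: after each segment is "translated", the human post-edit drives a single stochastic gradient step, so the model adapts to the incoming stream instead of staying static until the next batch retraining. This is only an illustration, not the paper's implementation: the linear model, the synthetic data, and the squared-error loss below are stand-ins for a real NMT network, its translation segments, and its training objective.

```python
import random

random.seed(0)
DIM = 5


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def online_adapt(w, stream, lr=0.1):
    """Apply one SGD step per incoming (source, post-edit) pair."""
    losses = []
    for x, y in stream:
        err = dot(w, x) - y  # system output vs. human correction
        losses.append(err * err)
        # Stepwise fine-tuning: gradient of the squared error, scaled by
        # the input dimension to keep the step size stable.
        w = [wi - lr * 2 * err * xi / DIM for wi, xi in zip(w, x)]
    return w, losses


# A "pretrained" model that is slightly off for the new user/domain.
w_true = [random.gauss(0, 1) for _ in range(DIM)]
w_init = [wt + 0.5 * random.gauss(0, 1) for wt in w_true]

# Simulated stream of segments, where the post-edit is the gold output.
stream = []
for _ in range(300):
    x = [random.gauss(0, 1) for _ in range(DIM)]
    stream.append((x, dot(w_true, x)))

w_adapted, losses = online_adapt(list(w_init), stream)

# Errors on later segments shrink as earlier corrections are absorbed,
# which is the core advantage over a static model.
early = sum(losses[:50]) / 50
late = sum(losses[-50:]) / 50
```

The key design point mirrored here is that each correction is used exactly once, immediately, at a per-segment cost far below batch retraining; the paper's actual strategies vary how (and how aggressively) that single update is applied to the NMT parameters.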


