In this paper, we report an analysis of the strengths and weaknesses of several Machine Translation (MT) engines implementing the three most widely used paradigms. The analysis is based on a manually built test suite that covers a wide range of linguistic phenomena. Two main observations emerge: first, a commercial online system improves strikingly when it switches from a phrase-based to a neural engine; second, the successful translations of neural MT systems sometimes resemble those of a rule-based MT system.