MT-ComparEval: Graphical evaluation interface for Machine Translation development

Open access

Abstract

The tool described in this article has been designed to help MT developers by providing a web-based graphical user interface that allows them to systematically compare and evaluate various MT engines/experiments using comparative analysis via automatic measures and statistics. The evaluation panel provides graphs, statistical significance tests, and n-gram statistics. We also present a demo server at http://wmt.ufal.cz with WMT14 and WMT15 translations.
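The standard significance test in MT evaluation is paired bootstrap resampling (Koehn, 2004): the test set is resampled with replacement many times, and the two systems are compared on each resample. The sketch below is a minimal illustration, not MT-ComparEval's actual implementation; it assumes an additive per-sentence metric (real corpus-level BLEU is not a simple sum of sentence scores), and all function names are illustrative.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resampled test sets on which system A beats system B.

    scores_a / scores_b: per-sentence metric scores (e.g. sentence-level
    BLEU) for the same test sentences under the two systems. A fraction
    of at least 0.95 is conventionally read as A being significantly
    better at p < 0.05.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = 0
    for _ in range(n_resamples):
        # Sample sentence indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins_a += 1
    return wins_a / n_resamples
```

For example, if system A outscores system B on every sentence, every resample favours A and the function returns 1.0; for two identical score lists it returns 0.0, since A never strictly wins.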
