Appraise: an Open-Source Toolkit for Manual Evaluation of MT Output

Open access


We describe Appraise, an open-source toolkit that supports manual evaluation of machine translation (MT) output. The system collects human judgments on translation output and implements annotation tasks such as 1) quality checking, 2) translation ranking, 3) error classification, and 4) manual post-editing. It features an extensible, XML-based format for import/export and can easily be adapted to new annotation tasks. The current version of Appraise also computes inter-annotator agreement automatically, giving quick access to evaluation results. Appraise is actively developed and used in several MT projects.
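To illustrate what an inter-annotator agreement score measures, the following is a minimal sketch of Cohen's kappa for two annotators who labelled the same items. It is not taken from the Appraise code base: the function name `cohen_kappa` and the sample judgments are hypothetical, and the abstract does not state which agreement coefficients the toolkit reports.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper for illustration; not part of Appraise.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up judgments: which of two systems ("A" or "B") each annotator preferred.
annotator_1 = ["A", "B", "A", "A", "B", "A"]
annotator_2 = ["A", "B", "B", "A", "B", "A"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A value of 1 means perfect agreement, while values near 0 mean the annotators agree about as often as chance would predict.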


