Qualitative: Python Tool for MT Quality Estimation Supporting Server Mode and Hybrid MT

Open access

Abstract

We present the development contributions of the last two years to our open-source Python Quality Estimation tool, which can run both in experiment mode and as an online web service. The latest version provides a new MT interface that communicates with SMT and rule-based translation engines and supports on-the-fly sentence selection. We also present an improved Machine Learning interface that allows more efficient communication with several state-of-the-art toolkits. Further additions include a more informative training process, a Python re-implementation of the QuEst baseline features, a new LM toolkit integration, an additional PCFG parser, and alignments of syntactic nodes.


