Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with Linguistic Evaluation

Vivien Macketanz 1, Eleftherios Avramidis 1, Aljoscha Burchardt 1, Jindrich Helcl 2 and Ankit Srivastava 1
  • 1 German Research Center for Artificial Intelligence (DFKI), Berlin, Germany
  • 2 Institute of Formal and Applied Linguistics

Abstract

In this article, we present a novel linguistically driven evaluation method and apply it to the three main approaches to Machine Translation (Rule-based, Phrase-based, and Neural) to gain insight into their strengths and weaknesses in far more detail than current evaluation schemes provide. Translating between two languages requires substantial modelling of knowledge about the two languages, about translation, and about the world. Using English-German IT-domain translation as a case study, we also enhance the Phrase-based system by exploiting parallel treebanks for syntax-aware phrase extraction and by interfacing with Linked Open Data (LOD) to extract named entity translations in a post-decoding framework.
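As a rough illustration of what a post-decoding step for entity translations could look like, the sketch below replaces output tokens that the decoder copied through untranslated with an entry from a lookup table. It is a simplified assumption, not the system described in the article: the function name `replace_untranslated`, the single-token matching, and the example dictionary (a stand-in for a Linked Open Data lookup) are all hypothetical.

```python
def replace_untranslated(source_tokens, output_tokens, entity_translations):
    """Replace tokens copied unchanged from the source with a known translation.

    A token counts as "copied through" if it appears verbatim in the source
    sentence. Tokens without an entry in the lookup table are left untouched.
    """
    source_vocab = set(source_tokens)
    return [
        entity_translations.get(tok, tok) if tok in source_vocab else tok
        for tok in output_tokens
    ]


# Hypothetical English-German entry standing in for a Linked Open Data lookup.
entity_translations = {"Trash": "Papierkorb"}

source = "Move the file to the Trash".split()
decoded = "Verschieben Sie die Datei in den Trash".split()

fixed = replace_untranslated(source, decoded, entity_translations)
print(" ".join(fixed))
```

Running the lookup after decoding leaves the translation model itself unchanged, so the table can be updated without retraining. A real system would also need to handle multi-word entities and inflection, which this sketch ignores.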


