Improving Machine Translation through Linked Data

Ankit Srivastava¹, Georg Rehm¹ and Felix Sasaki¹
¹ German Research Center for Artificial Intelligence (DFKI), Language Technology Lab, Berlin, Germany


With the ever-increasing availability of linked multilingual lexical resources, there is renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. Machine Translation (MT) systems in particular can potentially benefit from such resources. Unknown words and ambiguous translations are among the most common sources of error in MT output. In this paper, we attempt to minimise these types of errors by interfacing Statistical Machine Translation (SMT) models with Linked Open Data (LOD) resources such as DBpedia and BabelNet. We perform several experiments based on the SMT system Moses and evaluate multiple strategies for exploiting knowledge from multilingual linked data in automatically translating named entities. We conclude with an analysis of best practices for multilingual linked data sets in order to optimise their benefit to multilingual and cross-lingual applications.
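To make the idea of feeding linked-data translations into an SMT decoder more concrete, here is a minimal sketch of one possible strategy, not necessarily the one evaluated in the paper. It assumes the decoder is Moses run with its XML input option (`-xml-input`, e.g. `exclusive` or `inclusive`), which lets annotated source spans carry suggested translations in a `translation` attribute. The lexicon `ENTITY_TRANSLATIONS`, the tag name `ne` and the helpers `build_label_query` and `annotate` are hypothetical; the lexicon entries stand in for labels that would in practice be retrieved from DBpedia or BabelNet (for example with a SPARQL query like the one the first helper builds, which is not executed here).

```python
from xml.sax.saxutils import escape

# Hypothetical lexicon: tokenized English entity -> German translations.
# Illustrative stand-in for labels retrieved from a LOD resource.
ENTITY_TRANSLATIONS = {
    ("Cologne", "Cathedral"): ["Kölner Dom"],
    ("Munich",): ["München"],
}


def build_label_query(resource_uri: str, lang: str) -> str:
    """Build (but do not run) a SPARQL query for a resource's label in one language."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT ?label WHERE {{ <{resource_uri}> rdfs:label ?label . "
        f'FILTER (lang(?label) = "{lang}") }}'
    )


def annotate(tokens, lexicon, max_len=4):
    """Wrap known entities in Moses-style XML markup using greedy longest match."""
    out = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first so multi-word names win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in lexicon:
                match = (n, lexicon[span])
                break
        if match is None:
            # Plain token: escape XML special characters such as "&" and "<".
            out.append(escape(tokens[i]))
            i += 1
            continue
        n, targets = match
        # Alternative translations are joined with "||" (assumed Moses convention).
        attr = escape("||".join(targets), {'"': "&quot;"})
        source = escape(" ".join(tokens[i:i + n]))
        out.append(f'<ne translation="{attr}">{source}</ne>')
        i += n
    return " ".join(out)
```

For example, `annotate("I visited Munich last year".split(), ENTITY_TRANSLATIONS)` returns `I visited <ne translation="München">Munich</ne> last year`. Greedy longest match is a simple design choice that keeps a multi-word name such as "Cologne Cathedral" from being split into separately translated parts; a real system would also need entity recognition and disambiguation before the lookup.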



