Improving English-Czech Tectogrammatical MT

Open access

This paper summarizes our recent results on English-Czech machine translation implemented in the TectoMT framework. The system uses tectogrammatical trees as the transfer medium. A detailed analysis of the errors made by the previous version of the system, which serves as the baseline, is presented first. Several improvements to the system are then described that led to better translation quality in terms of BLEU and NIST scores. The biggest performance gain comes from applying a Hidden Tree Markov Model in the transfer phase, which is a novel technique in the field of machine translation.
