We describe a collection of open-source tools for learning tree-to-string and tree-to-tree transducers, along with the extensions to the cdec decoder that enable translation with them. Our modular, easy-to-extend tools extract rules from trees or forests aligned to strings and trees, subject to different structural constraints. A fast, multithreaded implementation of the Cohn and Blunsom (2009) model for extracting compact tree-to-string rules is also included. The implementation of the tree composition algorithm used by cdec is described, and translation quality and decoding time results are presented. Our experimental results add to the body of evidence suggesting that tree transducers are a compelling option for translation, particularly when decoding speed and translation model size are important.
Cohn, Trevor and Phil Blunsom. A Bayesian model of syntax-directed tree to string grammar induction. In Proc. of EMNLP, 2009.
Dyer, Chris, Adam Lopez, Juri Ganitkevitch, Johnathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proc. of ACL, 2010.
Dyer, Chris, Victor Chahuneau, and Noah A. Smith. A simple, fast, and effective reparameterization of IBM model 2. In Proc. of NAACL, 2013.
Galley, Michel, Mark Hopkins, Kevin Knight, and Daniel Marcu. What's in a translation rule? In Proc. of HLT-NAACL, 2004.
Galley, Michel, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. Scalable inference and training of context-rich syntactic translation models. In Proc. of NAACL, 2006.
Graehl, Jonathan, Kevin Knight, and Jonathan May. Training tree transducers. Computational Linguistics, 34(3), 2008.
Hanneman, Greg, Michelle Burroughs, and Alon Lavie. A general-purpose rule extractor for SCFG-based machine translation. In Proc. of SSST, 2011.
Huang, Liang, Kevin Knight, and Aravind Joshi. Statistical syntax-directed translation with extended domain of locality. In Proc. of AMTA, 2006.
Klein, Dan and Christopher D. Manning. Parsing and hypergraphs. In Proc. of IWPT, 2001.
Rounds, William C. Mappings and grammars on trees. Mathematical Systems Theory, 4(3):257–287, 1970.
Teh, Yee Whye. Dirichlet process. In Encyclopedia of Machine Learning, pages 280–287. 2010.
Thatcher, James W. Generalized sequential machine maps. Journal of Computer and System Sciences, 4:339–367, 1970.