pycdec: A Python Interface to cdec

This paper describes pycdec, a Python module for the cdec decoder. It enables Python code to use cdec's fast C++ implementation of core finite-state and context-free inference algorithms for decoding and alignment. The high-level interface allows developers to build integrated MT applications that take advantage of the rich Python ecosystem without sacrificing computational performance. We give examples of how to interact directly with the main cdec data structures (lattices, hypergraphs, sparse feature vectors), evaluate translation quality, and use the suffix-array grammar extraction code. This permits rapid prototyping of new algorithms for training, data visualization, and applications of MT and related structured prediction tasks.
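To make the data structures named above more concrete, here is a minimal pure-Python sketch of Viterbi (best-derivation) search over a tiny translation hypergraph whose edges carry sparse feature vectors. It does not use pycdec, whose purpose is to hand exactly this kind of inference to cdec's C++ code. The names `Edge`, `viterbi`, `yield_of`, and `derivation_features`, the dict representation of sparse vectors, and the toy German-English rules and weights are all illustrative assumptions, not pycdec's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical representation: a sparse feature vector is a dict from
# feature name to value; only non-zero features are stored.
SparseVector = Dict[str, float]


def dot(features: SparseVector, weights: SparseVector) -> float:
    """Linear model score: sum of value * weight (a missing weight counts as 0)."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())


@dataclass
class Edge:
    """A hyperedge: one rule application that builds `head` from its `tails`."""
    head: str
    tails: List[str]
    # Target side of the rule: strings are output words, ints index into `tails`.
    target: List[Union[str, int]]
    features: SparseVector = field(default_factory=dict)


def viterbi(order: List[str], edges: List[Edge], weights: SparseVector):
    """Keep the best-scoring incoming edge per node, visiting nodes in topological order."""
    incoming = defaultdict(list)
    for edge in edges:
        incoming[edge.head].append(edge)
    best: Dict[str, float] = {}
    back: Dict[str, Edge] = {}
    for node in order:
        for edge in incoming[node]:
            # Tails precede their heads in `order`, so their best scores are final.
            score = dot(edge.features, weights) + sum(best[t] for t in edge.tails)
            if node not in best or score > best[node]:
                best[node], back[node] = score, edge
    return best, back


def yield_of(node: str, back: Dict[str, Edge]) -> List[str]:
    """Read off the target string of the best derivation rooted at `node`."""
    words: List[str] = []
    for sym in back[node].target:
        if isinstance(sym, int):
            words.extend(yield_of(back[node].tails[sym], back))
        else:
            words.append(sym)
    return words


def derivation_features(node: str, back: Dict[str, Edge]) -> SparseVector:
    """Sum the sparse feature vectors of every edge in the best derivation."""
    total: Dict[str, float] = defaultdict(float)
    stack = [node]
    while stack:
        edge = back[stack.pop()]
        for name, value in edge.features.items():
            total[name] += value
        stack.extend(edge.tails)
    return dict(total)


# Hypothetical toy forest for the source sentence "das haus".
EDGES = [
    Edge("X_das", [], ["the"], {"lex": 1.0}),
    Edge("X_das", [], ["that"], {"lex": 0.5}),
    Edge("X_haus", [], ["house"], {"lex": 1.0}),
    Edge("X_haus", [], ["home"], {"lex": 0.8}),
    Edge("S", ["X_das", "X_haus"], [0, 1], {"glue": 1.0}),                  # monotone
    Edge("S", ["X_das", "X_haus"], [1, 0], {"glue": 1.0, "reorder": 1.0}),  # swapped
]
WEIGHTS = {"lex": 1.0, "glue": 0.1, "reorder": -0.5}

best, back = viterbi(["X_das", "X_haus", "S"], EDGES, WEIGHTS)
print(" ".join(yield_of("S", back)), round(best["S"], 2))  # → the house 2.1
print(derivation_features("S", back))                      # → {'glue': 1.0, 'lex': 2.0}
```

Visiting nodes in topological order guarantees that every tail's best score is final before any edge that uses it is scored, so a single pass suffices. The summed feature vector of the winning derivation is the quantity that discriminative tuning methods typically compare across competing hypotheses.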

