Better Splitting Algorithms for Parallel Corpus Processing

Open access

Each iteration of minimum error rate training involves re-translating a development set. Distributing this work across computational nodes can speed up translation, but in practice some parts of the set may take much longer to translate than others, so nodes that finish early sit idle while the slowest part completes. To address this slack time, we develop three novel algorithms for distributing translation tasks in a parallel computing environment, drawing on research in parallel machine scheduling. We present results showing a substantial speedup in overall decoding time.
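
The load-balancing problem described above can be illustrated with a minimal sketch. It does not reproduce any of the three algorithms developed in the paper, which the abstract does not specify. Instead it uses the classic longest-processing-time-first (LPT) heuristic from the parallel machine scheduling literature and compares it with a naive split into contiguous chunks. The function names and the per-sentence cost estimates are hypothetical; in practice a cost could be approximated by, for example, sentence length.

```python
import heapq


def contiguous_split(costs, num_nodes):
    """Baseline: cut the task list into contiguous chunks of equal size."""
    chunk = -(-len(costs) // num_nodes)  # ceiling division
    return [list(range(start, min(start + chunk, len(costs))))
            for start in range(0, chunk * num_nodes, chunk)]


def lpt_split(costs, num_nodes):
    """LPT heuristic: give the costliest remaining task to the least-loaded node."""
    # Min-heap of (current load, node id), so the least-loaded node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_nodes)]
    # Visit task indices in decreasing order of estimated cost.
    for idx in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(idx)
        heapq.heappush(heap, (load + costs[idx], node))
    return assignment


def makespan(costs, assignment):
    """Finishing time of the busiest node, i.e. the overall wall-clock time."""
    return max(sum(costs[i] for i in part) for part in assignment)


# Hypothetical estimated decoding costs for six sentences (arbitrary units).
costs = [40, 35, 30, 5, 5, 5]
print(makespan(costs, contiguous_split(costs, 2)))  # 105
print(makespan(costs, lpt_split(costs, 2)))         # 65
```

In this toy case the contiguous split puts all three expensive sentences on one node, which runs for 105 units while the other node finishes after 15. The LPT assignment brings the slowest node down to 65 units, which is the slack-time reduction that motivates scheduling-based splitting.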
