Learning Morphological Normalization for Translation from and into Morphologically Rich Languages

Franck Burlot 1 and François Yvon 1
  • 1 LIMSI, CNRS, Université Paris-Saclay, France


When translating between a morphologically rich language (MRL) and English, word forms in the MRL often encode grammatical information that is irrelevant to English, leading to data sparsity issues. This problem can be mitigated by normalizing the MRL, that is, by removing the irrelevant information from its word forms. Such preprocessing is usually performed deterministically with hand-crafted rules, which yields suboptimal representations. We introduce a simple way to automatically compute an appropriate normalization of the MRL and show that it can improve machine translation in both directions.

