Average Word Length from the Diachronic Perspective: The Case of Arabic

Open access


Previous studies based on English, Russian and Chinese corpora show that the average word length in texts grows steadily across centuries. These findings are in accordance with our results: the average word length in Arabic texts also grows during the analysed time span (8th century to the first half of the 20th century). Our paper shows the detailed statistics of the word length distribution century by century. The dynamics of the average word length correlates with the dynamics of the average word distribution entropy, which encourages an explanation of the phenomenon based on the Shannonian theory of communication.
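The two quantities compared in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: word length is counted in characters, and the entropy is the Shannon entropy of the empirical word frequency distribution. The function names and the toy token lists are hypothetical, and the paper's actual units, tokenisation and corpus processing may differ.

```python
import math
from collections import Counter


def average_word_length(tokens):
    """Mean token length, measured in characters (an assumed unit)."""
    return sum(len(t) for t in tokens) / len(tokens)


def word_entropy(tokens):
    """Shannon entropy in bits of the empirical word frequency distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy token lists standing in for two time slices (not real corpus data).
samples = {
    "slice A": "a bb a bb".split(),
    "slice B": "aaa bbbb cc dddd".split(),
}

for label, tokens in samples.items():
    print(label, average_word_length(tokens), word_entropy(tokens))
```

Computing both values per century and then correlating the two series would be one way to reproduce the kind of comparison the abstract describes.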


