This paper examines the interaction between the aspectual coding of verbs, their information content and their length, a relationship that Shannon’s source coding theorem states in general terms for the coding and length of a message. We hypothesize that, given this interaction, the lengths of aspectual verb forms can be predicted from both their aspectual coding and their information content. Our point of departure is the assumption that each verb has a default aspectual value and that this value can be estimated from frequency, which, according to Zipf’s law, correlates negatively with length. Using a linear mixed-effects model with a random effect for LEMMA, we relate three predictors to average aspectual verb form length: DEFAULT (the default aspectual value of a verb), the Zipfian predictor FREQUENCY, and the entropy-based predictor AVERAGE INFORMATION CONTENT. The data come from 18 Universal Dependencies (UD) treebanks. The predictors differ significantly in their effects on verb length across the languages tested, and the hypothesis of coding asymmetry does not hold for every language in the sample.
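The abstract names the predictors but not how they are computed. The Python sketch below is one possible operationalization, offered only as an illustration under stated assumptions and not as the authors' procedure. The token format and the function names `bigram_surprisals` and `verb_predictors` are hypothetical. DEFAULT is assumed to be the aspect in which a lemma occurs most often, FREQUENCY the token count of a (lemma, aspect) pair, and AVERAGE INFORMATION CONTENT (here `AVG_INFO`) the mean surprisal of a form given the preceding word, estimated from unsmoothed bigram counts, roughly in the spirit of Piantadosi, Tily and Gibson (2011).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical token format: (form, lemma, aspect); aspect is None for tokens
# without an aspect value (cf. the UD feature Aspect=..., e.g. Aspect=Perf).
Token = Tuple[str, str, Optional[str]]


def bigram_surprisals(sentences: List[List[Token]]) -> List[List[float]]:
    """Surprisal -log2 P(form | previous form) for every token.

    Uses unsmoothed maximum-likelihood bigram estimates from the same corpus,
    with "<s>" as the context of the first token in each sentence.
    """
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sent in sentences:
        prev = "<s>"
        for form, _, _ in sent:
            context_counts[prev] += 1
            bigram_counts[(prev, form)] += 1
            prev = form

    surprisals = []
    for sent in sentences:
        prev, row = "<s>", []
        for form, _, _ in sent:
            p = bigram_counts[(prev, form)] / context_counts[prev]
            row.append(-math.log2(p))
            prev = form
        surprisals.append(row)
    return surprisals


def verb_predictors(sentences: List[List[Token]]) -> List[Dict[str, object]]:
    """One row per (lemma, aspect) pair with LENGTH, FREQUENCY, DEFAULT, AVG_INFO."""
    surprisals = bigram_surprisals(sentences)
    aspect_counts: Dict[str, Counter] = defaultdict(Counter)  # lemma -> aspect counts
    lengths: Dict[Tuple[str, str], List[int]] = defaultdict(list)  # form lengths
    infos: Dict[Tuple[str, str], List[float]] = defaultdict(list)  # form surprisals

    for sent, sent_surprisals in zip(sentences, surprisals):
        for (form, lemma, aspect), s in zip(sent, sent_surprisals):
            if aspect is None:  # skip tokens not marked for aspect
                continue
            aspect_counts[lemma][aspect] += 1
            lengths[(lemma, aspect)].append(len(form))
            infos[(lemma, aspect)].append(s)

    rows = []
    for (lemma, aspect), lens in lengths.items():
        # Assumed DEFAULT: the aspect the lemma occurs in most often.
        default = aspect_counts[lemma].most_common(1)[0][0]
        rows.append({
            "LEMMA": lemma,
            "ASPECT": aspect,
            "LENGTH": sum(lens) / len(lens),        # mean form length in characters
            "FREQUENCY": len(lens),                 # token count of the pair
            "DEFAULT": int(aspect == default),      # 1 if default aspect, else 0
            "AVG_INFO": sum(infos[(lemma, aspect)]) / len(lens),  # mean surprisal
        })
    return rows


# Toy corpus (invented for illustration, not taken from a UD treebank).
sentences = [
    [("she", "she", None), ("is", "be", None), ("writing", "write", "Prog")],
    [("she", "she", None), ("has", "have", None), ("written", "write", "Perf")],
    [("he", "he", None), ("is", "be", None), ("writing", "write", "Prog")],
    [("it", "it", None), ("is", "be", None), ("done", "do", "Perf")],
]
for row in verb_predictors(sentences):
    print(row)
```

With real treebank data, rows of this shape could then be passed to a mixed-effects model. In R's lme4 (Bates et al. 2015), one possible specification is `lmer(LENGTH ~ DEFAULT + log(FREQUENCY) + AVG_INFO + (1 | LEMMA), data = rows)`, where `(1 | LEMMA)` is a random intercept per lemma; the exact formula the paper uses is not given here, so this is an assumption.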
Altmann G. T. M. and Kamide Y. 1999. Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition vol. 73 no. 3 pp. 247–264.
Aylett M. and Turk A. 2004. The Smooth Signal Redundancy Hypothesis: An explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech vol. 47 no. 1 pp. 31–56.
Bard E. G., Anderson A. H., Sotillo C., Aylett M., Doherty-Sneddon G. and Newlands A. 2000. Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language vol. 42 pp. 1–22.
Bates D., Mächler M., Bolker B. and Walker S. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software vol. 67 no. 1 pp. 1–48.
Bell A., Brenier J. M., Gregory M. L., Girand C. and Jurafsky D. 2009. Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language vol. 60 pp. 92–111.
Bentz C. and Ferrer-i-Cancho R. 2016. Zipf’s law of abbreviation as a language universal. Paper presented at the Leiden Workshop on Capturing Phylogenetic Algorithms for Linguistics. Leiden, The Netherlands.
Bickerton D. 1981. Roots of Language. Ann Arbor: Karoma.
Bohnemeyer J. and Swift M. 2004. Event realization and default aspect. Linguistics and Philosophy vol. 27 no. 3 pp. 263–296.
Bybee J. L. and Scheibman J. 1999. The effect of usage on degrees of constituency: The reduction of don’t in English. Linguistics vol. 37 no. 4 pp. 575–596.
Bybee J. L. 1985. Morphology: A Study of the Relation between Meaning and Form. Amsterdam: Benjamins.
Bybee J. L. 1994. The grammaticization of zero: Asymmetries in tense and aspect systems. In: W. Pagliuca (ed.) Perspectives on Grammaticalization vol. 109. Amsterdam: Benjamins pp. 235–254.
Celano G. G. A., Richter M., Voll R. and Heyer G. 2018. Aspect coding asymmetries of verbs: The case of Russian. In: A. Barbaresi, H. Biber, F. Neubarth and R. Osswald (eds.) KONVENS 2018: Proceedings of the 14th Conference on Natural Language Processing pp. 34–39.
Cohen Priva U. 2008. Using information content to predict phone deletion. Paper presented at the 27th West Coast Conference on Formal Linguistics. University of California, Los Angeles, May 16–18 2008.
Croft W. 2003. Typology and Universals. 2nd edition. Cambridge: Cambridge University Press.
Croft W. 2012. Verbs: Aspect and Causal Structure. Oxford: Oxford University Press.
Demberg V., Keller F. and Koller A. 2013. Incremental predictive parsing with psycholinguistically motivated tree-adjoining grammar. Computational Linguistics vol. 39 no. 4 pp. 1025–1066.
Fenk-Oczlon G. 1990. Ikonismus versus Ökonomieprinzip: Am Beispiel russischer Aspekt- und Kasusbildungen. Papiere zur Linguistik vol. 42 no. 1 pp. 49–69.
Fowler C. A. 1988. Differential shortening of repeated content words produced in various communicative contexts. Language and Speech vol. 31 pp. 307–319.
Fowler C. A. and Housum J. 1987. Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language vol. 26 pp. 489–504.
Greenberg J. H. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In: J. H. Greenberg (ed.) Universals of Language. Cambridge, MA: MIT Press pp. 73–113.
Greenberg J. H. 1966. Language Universals, with Special Reference to Feature Hierarchies. The Hague: Mouton.
Gregory M. L., Raymond W. D., Bell A., Fosler-Lussier E. and Jurafsky D. 1999. The effects of collocational strength and contextual predictability in lexical production. Chicago Linguistic Society (CLS-99) vol. 35 pp. 151–166.
Hale J. 2001. A probabilistic Earley parser as a psycholinguistic model. Paper presented at NAACL 2001. Carnegie Mellon University, Pittsburgh, 2–7 June 2001.
Haspelmath M. 2008. Creating economical patterns in language change. In: J. Good (ed.) Linguistic Universals and Language Change. Oxford: Oxford University Press pp. 185–214.
Haspelmath M., Calude A., Spagnol M., Narrog H. and Bamyacı E. 2014. Coding causal-noncausal verb alternations: A form-frequency correspondence explanation. Journal of Linguistics vol. 50 no. 3 pp. 587–625.
Haspelmath M. and Karjus A. 2017. Explaining asymmetries in number marking: Singulatives, pluratives, and usage frequency. Linguistics vol. 55 no. 6 pp. 1213–1235.
Hawkins S. and Warren P. 1994. Implications for lexical access of phonetic influences on the intelligibility of conversational speech. Journal of Phonetics vol. 22 pp. 493–511.
Jaeger T. F. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology vol. 61 no. 1 pp. 23–62.
Jakobson R. 1939. Signe zéro. In: Mélanges linguistiques offerts à Charles Bally. Genève pp. 143–152.
Johanson L. 2000. Viewpoint operators in European languages. In: Ö. Dahl (ed.) Tense and Aspect in the Languages of Europe. Berlin: Mouton de Gruyter pp. 27–187.
Levshina N. 2017. Communicative efficiency and syntactic predictability: A cross-linguistic study based on the Universal Dependencies corpora. Paper presented at the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017). Gothenburg, Sweden, May 22–24 2017.
Levy R. and Jaeger T. F. 2007. Speakers optimize information density through syntactic reduction. Proceedings of the 20th Conference on Neural Information Processing Systems (NIPS).
Levy R. 2008. Expectation-based syntactic comprehension. Cognition vol. 106 no. 3 pp. 1126–1177.
Levy R. 2013. Memory and surprisal in human sentence comprehension. In: R. van Gompel (ed.) Sentence Processing. Hove: Psychology Press pp. 78–114.
Li B., Cheng J., Liu Y. and Keller F. 2018. Dependency grammar induction with a neural variational transition-based parser. Available at https://arxiv.org/abs/1811.05889
Nivre J., Agić Ž., Ahrenberg L. et al. 2017. Universal Dependencies 2.0: CoNLL 2017 Shared Task Development and Test Data. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, Prague.
Piantadosi S. T., Tily H. and Gibson E. 2011. Word lengths are optimized for efficient communication. PNAS vol. 108 no. 9 pp. 3526–3529.
Pluymaekers M., Ernestus M. and Baayen H. 2005. Articulatory planning is continuous and sensitive to informational redundancy. Phonetica vol. 62 no. 2–4 pp. 146–159.
Ramm A., Loáiciga S., Friedrich A. and Fraser A. 2017. Annotating tense, mood and voice for English, French and German. Paper presented at the 55th Annual Meeting of the Association for Computational Linguistics (ACL), demo session. Vancouver, Canada, July 30–August 4 2017.
Sagae K. and Tsujii J. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. Proceedings of the CoNLL 2007 Shared Task in the Joint Conferences on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL’07 shared task) pp. 1044–1050. Prague, Czech Republic.
Sasse H.-J. 2006. Aspect and Aktionsart. In: E. K. Brown (ed.) Encyclopedia of Language and Linguistics. Boston: Elsevier pp. 535–538.
Shannon C. E. 1948. A mathematical theory of communication. The Bell System Technical Journal vol. 27 pp. 379–423, 623–656.
Velupillai V. 2012. Zero coding in tense-aspect systems of creole languages. Amsterdam: John Benjamins.
Zipf G. K. 1936. The Psychobiology of Language. London: Routledge.
Zipf G. K. 1949. Human Behavior and the Principle of Least Effort. New York: Addison-Wesley.