Early Modern Multiloquent Authors (EMMA): Designing a large-scale corpus of individuals’ languages

Open access


The present article provides a detailed description of the corpus of Early Modern Multiloquent Authors (EMMA), as well as two small case studies that illustrate its benefits. As a large-scale specialized corpus, EMMA tries to strike the right balance between big data and sociolinguistic coverage. It comprises the writings of 50 carefully selected authors across five generations, mostly drawn from 17th-century London society. EMMA enables the study of language as both a social and cognitive phenomenon and allows us to explore the interaction between the individual and aggregate levels.

The first part of the article describes EMMA’s first release as well as the sociolinguistic and methodological principles that underlie its design and compilation. We cover the conceptual decisions and practical implementations at various stages of the compilation process: from text markup, encoding and data preprocessing to metadata enrichment and verification.

In the second part, we present two small case studies to illustrate how rich contextualization can guide the interpretation of quantitative corpus-linguistic findings. The first case study compares the past tense formation of strong verbs in writers without access to higher education to that of writers with extensive training in Latin. The second case study relates s/th-variation in the language of a single writer, Margaret Cavendish, to major shifts in her personal life.
