The language of Early Modern texts can reveal much about Shakespeare’s language. In this paper I describe the creation of a genre classification scheme for a segment of Early English Books Online – Text Creation Partnership (EEBO-TCP), covering the period 1560–1640. This categorisation permits meaningful comparison of the language of Shakespeare with that of his contemporaries and makes an integral contribution to The Encyclopaedia of Shakespeare’s Language project at Lancaster University. I outline the rationale behind the scheme, describe preliminary automatic genre classification work and present the prototype approach adopted for this categorisation. I also provide specific examples of classification in practice and discuss internal and external factors which influenced genre selection. I finish by suggesting how a range of scholars might benefit from this research.
Tamás Fekete and Ádám Porkoláb
In this article we examine an area of onomastics that has not received much scholarly attention. We aim to provide a linguistic analysis of the place-names found in The Elder Scrolls (ES) video game series. For our analysis, we rely chiefly on the methods of linguistic statistics, which have not yet gained widespread use in onomastic research. Our goal is to give a boost to linguistic and onomastic research into video games and to develop related aspects of its research methodology. Two main methods of place-name formation can be observed in our results: one is when the fictional names are coined on the basis of the lexical elements of already existing non-fictional languages (we call these mimetic names), and the other is when the game developers create so-called speaking names. We demonstrate that the toponyms of the ES universe in part conform to the conventions of non-fictional place-name formation (e.g. they can be sorted into the two main categories of habitative names and topographical names), and in part contradict such conventions: around 14 percent of the names we analyzed are purposefully coined as semantically obscure toponyms, which does not happen in the case of non-fictional names.
Peter Petré, Lynn Anthonissen, Sara Budts, Enrique Manjavacas, Emma-Louise Silva, William Standing and Odile A.O. Strik
The present article provides a detailed description of the corpus of Early Modern Multiloquent Authors (EMMA), as well as two small case studies that illustrate its benefits. As a large-scale specialized corpus, EMMA tries to strike the right balance between big data and sociolinguistic coverage. It comprises the writings of 50 carefully selected authors across five generations, mostly drawn from 17th-century London society. EMMA enables the study of language as both a social and cognitive phenomenon and allows us to explore the interaction between the individual and aggregate levels.
The first part of the article is a detailed description of EMMA’s first release as well as the sociolinguistic and methodological principles that underlie its design and compilation. We cover the conceptual decisions and practical implementations at various stages of the compilation process: from text-markup, encoding and data preprocessing to metadata enrichment and verification.
In the second part, we present two small case studies to illustrate how rich contextualization can guide the interpretation of quantitative corpus-linguistic findings. The first case study compares the past tense formation of strong verbs in writers without access to higher education to that of writers with extensive training in Latin. The second case study relates s/th-variation in the language of a single writer, Margaret Cavendish, to major shifts in her personal life.
Peter Collins and Xinyue Yao
This paper presents a newly compiled diachronic corpus of Australian English (AusBrown). With four sampling time points (1931, 1961, 1991 and 2006), AusBrown is designed to match the current suite of British and American ‘Brown-family’ corpora in both sampling year and design. We provide details of the composition and compilation of AusBrown, and explore the broader context of its ‘Brown-family background’ and of complementary Australian corpora. We also survey research based on the Australian corpora presented, including several AusBrown-based papers.
Sebastian Hoffmann, Merja Kytö, Terttu Nevalainen and Irma Taavitsainen
In compiling and testing the diachronic part of the Helsinki Corpus of English Texts, our project group has come across three problems which arise from the use of computer corpora in studies of syntax and vocabulary. While these problems are mainly associated with work on diachronic corpora, they may be universal enough to deserve somewhat more general consideration. They could be called “The philologist’s dilemma”, “God’s truth fallacy”, and “The mystery of vanishing reliability”. The first could be described as pedagogical, the second methodological and the third pragmatic.
Paula Rautionaho, Sandra C. Deshors and Lea Meriläinen
This study focuses on the progressive vs. non-progressive alternation to revisit the debate on the ENL-ESL-EFL continuum (i.e. whether native (ENL) and nonnative (ESL/EFL) Englishes are dichotomous types of English or form a gradient continuum). While progressive marking is traditionally studied independently of its unmarked counterpart, we examine (i) how the grammatical contexts of both constructions systematically affect speakers’ constructional choices in ENL (American, British), ESL (Indian, Nigerian and Singaporean) and EFL (Finnish, French and Polish learner Englishes) and (ii) what light speakers’ varying constructional choices shed on the continuum debate. Methodologically, we use a clustering technique to group together individual varieties of English (i.e. to identify similarities and differences between those varieties) based on linguistic contextual features such as AKTIONSART, ANIMACY, SEMANTIC DOMAIN (of aspect-bearing lexical verb), TENSE, MODALITY and VOICE to assess the validity of the ENL-ESL-EFL classification for our data. Then, we conduct a logistic regression analysis (based on lemmas observed in both progressive and non-progressive constructions) to explore how grammatical contexts influence speakers’ constructional choices differently across English types. While, overall, our cluster analysis supports the ENL-ESL-EFL classification as a useful theoretical framework to explore cross-variety variation, the regression shows that, once we examine the specific linguistic contexts of (non-)progressive constructions, this classification does not emerge uniformly in the data. Ultimately, by including more than one statistical technique in their exploration of the continuum, scholars could avoid potential methodological biases.
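The clustering step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the clustering technique, so the average-linkage agglomerative procedure below is an assumption, and the variety labels and feature values are invented placeholders rather than data from the study. Each hypothetical variety is represented as a vector of progressive-marking proportions across a few contexts, and the most similar groups are merged until a chosen number of clusters remains.

```python
from itertools import combinations
from math import dist

# Invented placeholder profiles (NOT results from the study): each vector
# stands for the share of progressive forms in three hypothetical contexts.
profiles = {
    "ENL_a": [0.10, 0.20, 0.30],
    "ENL_b": [0.12, 0.21, 0.29],
    "ESL_a": [0.40, 0.50, 0.60],
    "ESL_b": [0.42, 0.48, 0.61],
    "EFL_a": [0.80, 0.85, 0.90],
    "EFL_b": [0.79, 0.86, 0.88],
}


def average_linkage(c1, c2, data):
    """Mean Euclidean distance between all cross-cluster pairs of members."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(dist(data[a], data[b]) for a, b in pairs) / len(pairs)


def agglomerate(data, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [frozenset([name]) for name in data]
    while len(clusters) > k:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


groups = agglomerate(profiles, 3)
for group in sorted(groups, key=sorted):
    print(sorted(group))
```

With real data, the question of interest would be whether the resulting groups line up with the ENL, ESL and EFL labels or cut across them; the toy profiles here are constructed so that they line up trivially.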