This paper presents a newly compiled diachronic corpus of Australian English (AusBrown). With four sampling time points (1931, 1961, 1991 and 2006), AusBrown is designed to match the current suite of British and American ‘Brown-family’ corpora in both sampling year and design. We provide details of the composition and compilation of AusBrown, and explore the broader context of its ‘Brown-family background’ and of complementary Australian corpora. We also survey research based on these Australian corpora, including several AusBrown-based papers.
Peter Petré, Lynn Anthonissen, Sara Budts, Enrique Manjavacas, Emma-Louise Silva, William Standing and Odile A.O. Strik
The present article provides a detailed description of the corpus of Early Modern Multiloquent Authors (EMMA), as well as two small case studies that illustrate its benefits. As a large-scale specialized corpus, EMMA tries to strike the right balance between big data and sociolinguistic coverage. It comprises the writings of 50 carefully selected authors across five generations, mostly drawn from 17th-century London society. EMMA enables the study of language as both a social and cognitive phenomenon and allows us to explore the interaction between the individual and aggregate levels.
The first part of the article is a detailed description of EMMA’s first release as well as the sociolinguistic and methodological principles that underlie its design and compilation. We cover the conceptual decisions and practical implementations at various stages of the compilation process: from text markup, encoding and data preprocessing to metadata enrichment and verification.
In the second part, we present two small case studies to illustrate how rich contextualization can guide the interpretation of quantitative corpus-linguistic findings. The first case study compares the past tense formation of strong verbs in writers without access to higher education to that of writers with extensive training in Latin. The second case study relates s/th-variation in the language of a single writer, Margaret Cavendish, to major shifts in her personal life.
In this article we examine an area of onomastics that has not received much scholarly attention. We aim to provide an adequate linguistic analysis of the place-names found in The Elder Scrolls (ES) video game series. For our analysis, we rely chiefly on the methods of linguistic statistics, which have not yet gained widespread use in onomastic research. Our goal is to give a boost to linguistic and onomastic research into video games and to develop related aspects of the research methodology for this field. Two main methods of place-name formation can be observed in our results: in the first, the fictional names are coined on the basis of the lexical elements of existing non-fictional languages (we call these mimetic names); in the second, the game developers create so-called speaking names. We demonstrate that the toponyms of the ES universe in part conform to the conventions of non-fictional place-name formation (e.g. they can be sorted into the two main categories of habitative names and topographical names), and in part contradict such conventions, because around 14 percent of the names we analyzed are purposefully coined as semantically obscure toponyms, which does not happen in the case of non-fictional names.
The language of Early Modern texts can potentially reveal a lot about Shakespeare’s language. In this paper I describe the creation of a genre classification scheme for a segment of Early English Books Online – Text Creation Partnership (EEBO-TCP), covering the period 1560–1640. This categorisation permits meaningful comparison of the language of Shakespeare with that of his contemporaries and makes an integral contribution to The Encyclopaedia of Shakespeare’s Language project at Lancaster University. I outline the rationale behind the scheme, describe preliminary automatic genre classification work and present the prototype approach adopted for this categorisation. I also provide specific examples of classification in practice and discuss internal and external factors which influenced genre selection. I finish by suggesting how a range of scholars might benefit from this research.
Anni Sairio, Samuli Kaislaniemi, Anna Merikallio and Terttu Nevalainen
Research into orthography in the history of English is not a simple venture. The history of English spelling is primarily based on printed texts, which fail to capture the range of variation inherent in the language; many manuscript phenomena are simply not found in printed texts. Manuscript-based corpora would be the ideal research data, but as compiling them is resource-intensive, linguists use editions that have been produced by non-linguists. Many editions claim to retain original spellings, but in practice text is always normalized at the graph level and possibly more so. This does not preclude using an edition-based corpus for orthographical research, but there has been no systematic way to determine the philological reliability of an edited text. In this paper we present a typological methodology we are developing for the evaluation of the orthographical quality of edition-based corpora, with the aim of making the best use of bad data in the context of editions and manuscript practices. As a case study, we apply this methodology to the Early Modern and Late Modern English sections of the Corpus of Early English Correspondence.