Sebastian Hoffmann, Merja Kytö, Terttu Nevalainen and Irma Taavitsainen
In compiling and testing the diachronic part of the Helsinki Corpus of English Texts, our project group has come across three problems which arise from the use of computer corpora in studies of syntax and vocabulary. While these problems are mainly associated with work on diachronic corpora, they may be universal enough to deserve somewhat more general consideration. They could be called “The philologist’s dilemma”, “God’s truth fallacy”, and “The mystery of vanishing reliability”. The first could be described as pedagogical, the second methodological and the third pragmatic.
Paula Rautionaho, Sandra C. Deshors and Lea Meriläinen
This study focuses on the progressive vs. non-progressive alternation to revisit the debate on the ENL-ESL-EFL continuum (i.e. whether native (ENL) and non-native (ESL/EFL) Englishes are dichotomous types of English or form a gradient continuum). While progressive marking is traditionally studied independently of its unmarked counterpart, we examine (i) how the grammatical contexts of both constructions systematically affect speakers’ constructional choices in ENL (American, British), ESL (Indian, Nigerian and Singaporean) and EFL (Finnish, French and Polish learner Englishes) and (ii) what light speakers’ varying constructional choices shed on the continuum debate. Methodologically, we use a clustering technique to group individual varieties of English (i.e. to identify similarities and differences between those varieties) based on linguistic contextual features such as AKTIONSART, ANIMACY, SEMANTIC DOMAIN (of the aspect-bearing lexical verb), TENSE, MODALITY and VOICE, in order to assess the validity of the ENL-ESL-EFL classification for our data. We then conduct a logistic regression analysis (based on lemmas observed in both progressive and non-progressive constructions) to explore how grammatical contexts influence speakers’ constructional choices differently across English types. While, overall, our cluster analysis supports the ENL-ESL-EFL classification as a useful theoretical framework for exploring cross-variety variation, the regression shows that, once we examine the specific linguistic contexts of (non-)progressive constructions, this classification is not uniformly reflected in the data. Ultimately, by including more than one statistical technique in their exploration of the continuum, scholars could avoid potential methodological biases.
The introductory it pattern, as in ‘It is important to note that information was added’, is a tool used by academic writers for a range of rhetorical and information-structural purposes. It is thus an important pattern for students to learn. Since previous research on student writing has indicated a possible correlation between the form and function of the pattern, the present study investigates this relationship more systematically in non-native-speaker and native-speaker student writing in two disciplines (linguistics and literature). In doing so, the study extends previous research on factors such as native-speaker (NS) status and discipline. It uses data from three corpora: ALEC, BAWE and MICUSP. The results confirm a correlation between form and function: each of the most common syntactic types of the pattern displays a preferred function, and vice versa. While very few differences across NS status were found, there were certain discipline-specific disparities. The findings, which could be useful for teaching students about the use of the introductory it pattern, also have implications for the automated functional tagging of parsed corpora.
Nicholas Smith and Cathleen Waters
The aims of this paper are twofold: i) to present the motivation and design of a sociohistorical corpus derived from the popular BBC radio show Desert Island Discs (DID); and ii) to illustrate the potential of the DID corpus (DIDC) with a case study. In an era of ever-increasing digital resources and scholarly interest in recent language change, there remains an enormous disparity between the available written and spoken corpora. We describe how a corpus derived from DID helps to redress the balance. Treating DID as an example of a specialized register, namely a ‘biographical chat show’, we review its attendant situational characteristics and explain the affordances and design features of a sociolinguistic corpus sampled from the show. Finally, to illustrate the potential of DIDC for the linguistic exploration of recent change, we conduct a case study of two pronouns with generic, impersonal reference, namely you and one.