In this article I discuss the issues and challenges of compiling a corpus of historical plays by a range of playwrights that is suitable for comparative, corpus-based research into language style in Shakespeare’s plays. In discussing sources of digitised historical play-texts and the criteria for selecting texts for the present study, I argue that not just any set of Early Modern English plays constitutes a suitable basis for reliable claims about language style in Shakespeare’s plays relative to those of his peers. I point out factors outside authorial choice that may bear on language style, such as sub-genre features and change over time. I also highlight particular difficulties in compiling a corpus of historical texts, notably dating and spelling variation, and explain how these were addressed. The corpus detailed in this article extends the prospects for investigating Shakespeare’s language style by providing a context in which it can be set and, as I indicate, is a valuable new publicly accessible resource for future research.
Corpus-based studies of learner language and (especially) English varieties have become more quantitative and increasingly use regression-based methods and classifiers such as classification trees and random forests. One recent development that is becoming more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches by, first, training a model on the reference speakers (often native speakers (NS) in learner corpus studies or British English speakers in variety studies) and, second, using this model to predict what such a reference speaker would produce in the situation the target speaker is in (often non-native speakers (NNS) or indigenized-variety speakers). Crucially, the third step consists of determining whether or not the target speakers made the canonical choice and exploring that variability with a second regression model or classifier.
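The three MuPDAR steps can be sketched in plain Python as follows. This is a minimal illustration only, assuming a binary choice and a hand-rolled, unregularized logistic regression in place of whatever regression model or classifier a given study actually uses. The function names (`fit_logistic`, `predict_proba`, `mupdar`), the single toy predictor and the data are all hypothetical and invented for illustration.

```python
import math


def predict_proba(w, b, x):
    """Probability of outcome 1 under a logistic model with weights w and bias b."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a plain (unregularized) logistic regression by batch gradient descent."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. z
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mupdar(X_ref, y_ref, X_tgt, y_tgt, threshold=0.5):
    """Simplified three-step MuPDAR-style procedure (hypothetical sketch)."""
    # Step 1: train a model on the reference speakers only.
    ref_model = fit_logistic(X_ref, y_ref)
    # Step 2: predict what a reference speaker would choose in each target context.
    predicted = [int(predict_proba(*ref_model, x) >= threshold) for x in X_tgt]
    # Step 3: code whether each target choice deviates from that prediction ...
    deviated = [int(obs != pred) for obs, pred in zip(y_tgt, predicted)]
    # ... and explore the deviations with a second regression.
    dev_model = fit_logistic(X_tgt, deviated)
    return predicted, deviated, dev_model


# Invented toy data: one hypothetical predictor per case (e.g. recipient length
# minus theme length); 1 = prepositional dative, 0 = double-object construction.
X_ref, y_ref = [[-2], [-1], [1], [2]], [0, 0, 1, 1]
X_tgt, y_tgt = [[-2], [-1], [1], [2]], [0, 1, 1, 0]
predicted, deviated, dev_model = mupdar(X_ref, y_ref, X_tgt, y_tgt)
print(predicted)  # [0, 0, 1, 1]
print(deviated)   # [0, 1, 0, 1]
```

In a real study, the second model would typically include further target-speaker variables (for example, proficiency or first language) rather than only the predictors reused from step 1.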
Both regression-based modeling in general and MuPDAR in particular have led to many interesting results, but we want to propose two changes in perspective on the results they produce. First, we want to focus attention on the middle ground of the prediction space, i.e. the predictions of a regression/classifier that are made non-confidently and essentially translate into a statement such as ‘in this context, both/all alternants would be fine’. Second, we argue for greater attention to misclassifications/mispredictions, propose a method to identify them, and discuss what we can learn from studying them. We exemplify both suggestions with a brief case study, namely the dative alternation in native and learner corpus data.
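One simple way to operationalize both suggestions, assuming predicted probabilities for one alternant from any binary model, is to triage each case into a non-confident middle ground and confidently right or wrong predictions. The function name and the ±0.1 band around the 0.5 cutoff are hypothetical choices for illustration, and this sketch is not necessarily the identification method the authors propose.

```python
def triage_predictions(probs, observed, threshold=0.5, band=0.1):
    """Split cases into middle-ground, confidently wrong and confidently right.

    probs: predicted probability of alternant 1 for each case.
    observed: the alternant actually produced (0 or 1).
    band: hypothetical half-width of the 'both alternants would be fine' zone.
    """
    result = {"middle": [], "confident_wrong": [], "confident_right": []}
    for i, (p, y) in enumerate(zip(probs, observed)):
        if abs(p - threshold) <= band:
            # Non-confident prediction: the model treats both alternants as viable.
            result["middle"].append(i)
        elif (p >= threshold) != bool(y):
            # Confident prediction that the speaker did not follow.
            result["confident_wrong"].append(i)
        else:
            result["confident_right"].append(i)
    return result


print(triage_predictions([0.95, 0.55, 0.1, 0.48, 0.8], [1, 0, 1, 1, 0]))
# {'middle': [1, 3], 'confident_wrong': [2, 4], 'confident_right': [0]}
```

The confidently wrong cases are the natural candidates for qualitative inspection, while the middle-ground cases show where the model regards the alternation as genuinely open.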
This paper presents a use-case study of annotating mobile app reviews from Google Play and the Apple Store. The sentiment-polarity annotations were created for later use in machine-learning-based automatic processing. This should address some of the problems encountered in previous analyses of Czech, in which, owing to financial constraints, assumptions about the data played a greater role than the annotation itself. Our proposal shows that some of the assumptions used for English do not apply to Czech and that it is possible to annotate such data without extensive funding.
The paper presents the results of an analysis of the lemma mateřství ‘motherhood’. The authors applied methods of corpus linguistics and discourse analysis (the corpus-assisted discourse studies approach) to survey representations of the lemma in Czech journalistic texts published from 2010 to 2014 and sorted the results into discourse categories on the basis of collocation and concordance analysis. They found that the chief referential categories of the discourse of motherhood were surrogate motherhood, the relationship between motherhood and career, delight in motherhood, family relationships, the financial and time aspects of motherhood, changes brought by motherhood, and active motherhood. Surrogate motherhood was presented as a solution for women who cannot have a baby themselves, but also as a complicated issue, with emphasis placed on the relevant legislation. Motherhood was presented as a threat to a woman’s career, but also as a source of joy, an essential relationship within a family, a right to financial support from the state, a life change, an activity, and an entity closely connected to time factors.
The article presents empirical research on verbal prepositional “of” structures, i.e. grammatical collocations of a verb and the preposition OF. OF is among the most frequent prepositions in English. The study is based on comparisons of English and Czech sentences containing verbs and prepositions followed by an object. The material was taken from the Prague Czech-English Dependency Treebank 2.0. The structures were examined and analyzed from morphological, syntactic and semantic points of view. The aims of the study are to establish English-Czech verbal prepositional counterparts; to group verbal prepositional structures by shared semantic and syntactic features; to identify and generalize the features common to each verb group; and to identify tendencies of verbs when they collocate with a given preposition. The findings are presented in several charts and tables.
The article presents a forthcoming acquisition corpus of texts written by students learning Slovak as a foreign language and focuses on the annotation of the texts, which includes information about each text as well as social and linguistic details about its author. The article also discusses the tags that identify individual errors in the texts and the concept behind the tagset itself.