In this article, we examine the similarity between epigenetic marks in DNA and so-called hapaxes in language, and we design a grammar description based on hapax legomena. We take into account the hapax analysis of Czech provided by Novotná (2013) and avoid random selection of the corpus. For this reason, we analyze a corpus of 12 authentic books by 12 authors who elaborated on the theme “What’s new in…” for their respective fields of science, as assigned by the publisher Nová beseda. By analyzing a medium-sized corpus, we expected results similar to those obtained from a large-scale national corpus (see Novotná 2013). We chose to classify hapaxes into categories different from Novotná’s, yet the results show similar productive language categories. This kind of language potentiality seems to be analogous to epigenetic processes in biology, which we briefly introduce.
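For readers unfamiliar with the term, a hapax legomenon is a word that occurs exactly once in a corpus. The following minimal Python sketch shows how such items can be extracted. The function name `hapax_legomena`, the lowercasing, and the regular-expression tokenizer are illustrative assumptions, not the procedure used in the study. The sketch also counts raw word forms, whereas an analysis of an inflected language such as Czech would probably lemmatize first.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return the word forms that occur exactly once in the text, sorted."""
    # Assumption: a token is any run of Unicode word characters, lowercased.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)

# Invented sample sentence, used only to show the counting step.
sample = "the cat saw the dog and the dog saw a bird"
print(hapax_legomena(sample))
```

In the sample sentence, "the", "saw" and "dog" recur, so only the remaining four word forms are returned as hapaxes.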
In this article I discuss the issues and challenges of compiling a corpus of historical plays by a range of playwrights that is highly suitable for use in comparative, corpus-based research into language style in Shakespeare’s plays. In discussing sources for digitised historical play-texts and criteria for making a selection for the present study, I argue that not just any set of Early Modern English plays constitutes a suitable basis upon which to make reliable claims about language style in Shakespeare’s plays relative to those of his peers. I point out factors outside of authorial choice which potentially have bearing on language style, such as sub-genre features and change over time. I also highlight some particular difficulties in compiling a corpus of historical texts, notably dating and spelling variation, and I explain how these were addressed. The corpus detailed in this article extends the prospects for investigating Shakespeare’s language style by providing a context into which it can be set and, as I indicate, is a valuable new publicly accessible resource for future research.
Corpus-based studies of learner language and (especially) English varieties have become more quantitative in nature and increasingly use regression-based methods and classifiers such as classification trees and random forests. One recent development that is now more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches by, firstly, training a model on the reference speakers (often native speakers (NS) in learner corpus studies or British English speakers in variety studies) and then, secondly, using this model to predict what such a reference speaker would produce in the situation the target speaker (often a non-native speaker (NNS) or an indigenized-variety speaker) is in. Crucially, the third step then consists of determining whether the target speakers made a canonical choice or not and exploring that variability with a second regression model or classifier.
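The steps just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the published implementation: the data are invented, the two binary predictors (`recipient_is_pronoun`, `theme_is_long`) are hypothetical, and a hand-written logistic regression stands in for whatever regression or classifier a real study would use. The sketch covers the first two steps and the labelling part of the third; the resulting flags would then serve as the response variable of the second model.

```python
import math

def predict_proba(weights, x):
    """Probability of outcome 1; the last weight is the intercept."""
    z = weights[-1] + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=5000, lr=0.5):
    """Fit a logistic regression by batch gradient descent (toy-sized data only)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            for j, value in enumerate(x):
                grads[j] += error * value
            grads[-1] += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

# Invented toy data. Predictors: (recipient_is_pronoun, theme_is_long).
# Outcome: 1 = ditransitive, 0 = prepositional dative.
ns_rows = [(1, 1)] * 6 + [(0, 0)] * 6 + [(1, 0)] * 4 + [(0, 1)] * 4
ns_choices = [1, 1, 1, 1, 1, 0] + [0, 0, 0, 0, 0, 1] + [1, 1, 1, 0] + [1, 0, 0, 0]

nns_rows = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1)]
nns_choices = [1, 0, 0, 1, 1, 1]

# Step 1: train the model on the reference speakers only.
ns_model = fit_logistic(ns_rows, ns_choices)

# Step 2: predict what a reference speaker would produce in each NNS context.
ns_probabilities = [predict_proba(ns_model, x) for x in nns_rows]
ns_predicted = [1 if p >= 0.5 else 0 for p in ns_probabilities]

# Step 3 (first half): did each target speaker make the predicted choice?
made_ns_choice = [int(pred == actual)
                  for pred, actual in zip(ns_predicted, nns_choices)]

for row, p, flag in zip(nns_rows, ns_probabilities, made_ns_choice):
    print(row, round(p, 2), "canonical" if flag else "non-canonical")
```

With these invented data, every second learner choice is flagged as non-canonical. In a real application, the flag (or a graded deviation score) would be modelled as a function of the same or additional predictors.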
Both regression-based modeling in general and MuPDAR in particular have led to many interesting results, but we want to propose two changes in perspective on the results they produce. First, we want to focus attention on the middle ground of the prediction space, i.e. the predictions of a regression/classifier that, essentially, are made non-confidently and translate into a statement such as ‘in this context, both/all alternants would be fine’. Second, we want to argue for greater attention to misclassifications and mispredictions, propose a method to identify them, and discuss what we can learn from studying them. We exemplify our two suggestions with a brief case study, namely the dative alternation in native and learner corpus data.
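To make the two perspectives concrete, the following Python sketch sorts the predictions of a binary model into three groups: non-confident middle-ground cases, confident correct predictions, and confident misclassifications. It is only one possible operationalization. The function name `triage_predictions`, the probability band of 0.35 to 0.65, and the example values are assumptions made for illustration and are not the identification method proposed in the paper.

```python
def triage_predictions(probabilities, observed, band=(0.35, 0.65)):
    """Group case indices by how confident and how correct each prediction is.

    probabilities: predicted probability of alternant 1 for each case
    observed:      the alternant actually produced (0 or 1)
    band:          assumed bounds of the non-confident middle ground
    """
    low, high = band
    groups = {"middle_ground": [], "correct": [], "misclassified": []}
    for index, (p, y) in enumerate(zip(probabilities, observed)):
        if low <= p <= high:
            # The model is essentially saying that both alternants would be fine.
            groups["middle_ground"].append(index)
        elif (p > high) == bool(y):
            # Confident prediction that matches the observed choice.
            groups["correct"].append(index)
        else:
            # Confident prediction that contradicts the observed choice.
            groups["misclassified"].append(index)
    return groups

# Invented example values.
probabilities = [0.92, 0.55, 0.10, 0.81, 0.40, 0.05]
observed = [1, 0, 0, 0, 1, 1]
print(triage_predictions(probabilities, observed))
```

The width of the band is a free parameter. A narrower band treats more predictions as confident and therefore moves more cases into the correct or misclassified groups.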
Since its inception in the second part of the 20th century, the science of language evolution has been exerting a growing and formative pressure on linguistics. More obviously, given its interdisciplinary character, the science of language evolution provides a platform on which linguists can meet and discuss a variety of problems pertaining to the nature of language and ways of investigating it with representatives of other disciplines and research traditions. It was largely in this way that the attention of linguists was attracted to the study of emerging sign languages and gestures, as well as to the resultant reflection on the way different modalities impact communicative systems that use them. But linguistics also benefits from the findings made by language evolution researchers in the context of their own research questions and methodologies. The most important of these findings come out of the experimental research on bootstrapping communication systems and the evolution of communicative structure, and from mass comparison studies that correlate linguistic data with a wide range of environmental variables.
By considering a specific scenario of early language evolution, here I advocate taking into account one of the most obvious players in the evolution of human language capacity: (sexual) selection. The proposal is based both on an internal reconstruction using syntactic theory and on comparative typological evidence, directly bringing together formal, typological, and evolutionary considerations. As one possible test case, transitivity is decomposed into evolutionary primitives of syntactic structure, revealing a common denominator and the building blocks for crosslinguistic variation in transitivity. The approximations of this early grammar, identified by such a reconstruction, while not identical constructs, are at least as good proxies of the earliest stages of grammar as one can find among tools, cave paintings, or bird song. One subtype of such “living fossils” interacts directly with biological considerations of survival, aggression, and mate choice, while others clearly distinguish themselves in fMRI experiments. The fMRI findings are consistent with the proposal that the pressures to master ever more complex syntax were at least partly responsible for driving the selection processes which gradually increased the connectivity of the Broca’s-basal ganglia network, crucial for syntactic processing, among other important functions.
In January 2018, the President of the Czech Republic was elected. Before that, each of the candidates communicated their intention to run for office in a different kind of speech. Using selected characteristics, we evaluate and compare these candidate speeches. Subsequently, we reflect on the possibilities of correlating the results of the election with the data collected during the analysis.