Corpus-based studies of learner language and (especially) English varieties have become more quantitative and increasingly use regression-based methods and classifiers such as classification trees and random forests. One recent development that is becoming more widely used is the MuPDAR (Multifactorial Prediction and Deviation Analysis using Regressions) approach of Gries and Deshors (2014) and Gries and Adelman (2014). This approach attempts to improve on traditional regression- or tree-based approaches in three steps. First, a model is trained on the reference speakers (often native speakers [NS] in learner corpus studies or British English speakers in variety studies). Second, this model is used to predict what such a reference speaker would produce in the situation the target speaker (often a non-native speaker [NNS] or an indigenized-variety speaker) is in. Crucially, the third step then consists of determining whether the target speakers made the canonical choice, i.e. the one predicted for a reference speaker, and exploring that variability with a second regression model or classifier.
Both regression-based modeling in general and MuPDAR in particular have led to many interesting results, but we want to propose two changes in perspective on the results they produce. First, we want to focus attention on the middle ground of the prediction space, i.e. the predictions of a regression or classifier that are made with little confidence and essentially translate into a statement such as ‘in this context, both/all alternants would be fine’. Second, we want to argue for greater attention to misclassifications and mispredictions, propose a method to identify them, and discuss what we can learn from studying them. We exemplify our two suggestions with a brief case study, namely the dative alternation in native and learner corpus data.
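The first two MuPDAR steps and the notion of a 'middle ground' of predictions can be sketched in a few lines of Python. This is only a minimal illustration under several assumptions that are not part of the study: a plain logistic regression fitted by gradient descent stands in for the first-step model, the single numeric predictor and all data points are invented toy values, the function names are hypothetical, and the margin of 0.2 around a predicted probability of 0.5 is an arbitrary threshold for 'non-confident' predictions. The third step (a second regression or classifier on the deviations) is not included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(w, x):
    """Predicted probability of alternant 1; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, epochs=2000, lr=0.1):
    """Fit a plain logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_prob(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def mupdar_steps(ref_X, ref_y, tgt_X, tgt_y, margin=0.2):
    """Steps 1 and 2 of a MuPDAR-style analysis plus deviation flags.

    margin is an assumed, arbitrary cut-off for the 'middle ground'.
    """
    # Step 1: train the model on the reference speakers only.
    w = fit_logistic(ref_X, ref_y)
    rows = []
    for x, choice in zip(tgt_X, tgt_y):
        # Step 2: predict what a reference speaker would produce
        # in the target speaker's situation.
        p = predict_prob(w, x)
        predicted = 1 if p >= 0.5 else 0
        rows.append({
            "p_ref": p,
            # Did the target speaker make the predicted (canonical) choice?
            "canonical": choice == predicted,
            # Is the prediction close to 0.5, i.e. made non-confidently?
            "middle_ground": abs(p - 0.5) < margin,
        })
    return rows

# Invented toy data: one numeric predictor, binary choice of alternant.
ref_X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
ref_y = [0, 0, 0, 1, 0, 1, 1, 1]
tgt_X = [[-2.0], [0.1], [2.0]]
tgt_y = [0, 1, 0]

rows = mupdar_steps(ref_X, ref_y, tgt_X, tgt_y)
for row in rows:
    print(row)
```

In this toy setup, the flag `canonical` would serve as the response variable of the second model, while `middle_ground` marks the cases in which the reference model is compatible with either alternant and a 'deviation' is therefore of limited interpretive weight.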
Deshors, Sandra C. 2014. A case for a unified treatment of EFL and ESL: A multifactorial approach. English World-Wide 35(3): 279–307.
Deshors, Sandra C. 2018. Simple Past meets Present Perfect meets Passé Composé: A semantic exploration of the Present Perfect in French-English inter-language. International Journal of Learner Corpus Research 4(1): 23–53.
Deshors, Sandra C. and Stefan Th. Gries. 2016. Profiling verb complementation constructions across New Englishes: A two-step random forests analysis of ing vs. to complements. International Journal of Corpus Linguistics 21(2): 192–218.
Fox, John and Sanford Weisberg. 2019. An R companion to applied regression. 3rd ed. Los Angeles, London, etc.: Sage.
Gries, Stefan Th. 2003a. Multifactorial analysis in corpus linguistics: A study of Particle Placement. London and New York: Continuum Press.
Gries, Stefan Th. 2003b. Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1: 1–27.
Gries, Stefan Th. To appear. On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement. Corpus Linguistics and Linguistic Theory.
Gries, Stefan Th. and Allison S. Adelman. 2014. Subject realization in Japanese conversation by native and non-native speakers: Exemplifying a new paradigm for learner corpus research. In Jesús Romero-Trillo (ed.). Yearbook of corpus linguistics and pragmatics 2014: New empirical and theoretical paradigms, 35–54. Cham: Springer.
Gries, Stefan Th. and Sandra C. Deshors. 2014. Using regressions to explore deviations between corpus data and a standard/target: Two suggestions. Corpora 9(1): 109–136.
Gries, Stefan Th. and Anatol Stefanowitsch. 2004. Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9(1): 97–129.
Heller, Benedikt, Tobias Bernaisch and Stefan Th. Gries. 2017. Empirical perspectives on two potential epicenters: The genitive alternation in Asian Englishes. ICAME Journal 41: 111–144.
Heller, Benedikt, Benedikt Szmrecsanyi and Jason Grafmiller. 2017. Stability and fluidity in syntactic variation world-wide: The genitive alternation across varieties of English. Journal of English Linguistics 45(1): 3–27.
Kolbe-Hanna, Daniela and Lina Baldus. 2018. The choice between -ing and to complement clauses in English as first, second and foreign language. Paper presented at ICAME 39, University of Tampere.
Kruger, Haidee and Gert De Sutter. 2018. Alternation in contact and non-contact varieties: Reconceptualising that-omission in translated and non-translated English using the MuPDAR approach. Translation, Cognition and Behavior 1(2): 251–290.
Lester, Nicholas A. 2019. That’s hard: Relativizer use in spontaneous L2 speech. International Journal of Learner Corpus Research 5(1): 1–32.
Szmrecsanyi, Benedikt, Jason Grafmiller, Joan Bresnan, Anette Rosenbach, Sali Tagliamonte and Simon Todd. 2017. Spoken syntax in a comparative perspective: The dative and genitive alternation in varieties of English. Glossa 2(1): 1–27.
Werner, Valentin, Robert Fuchs and Sandra Götz. To appear. L1 influence vs. universal mechanisms: An SLA-driven corpus study on temporal expression. In Bert Le Bruyn and Magali Paquot (eds.). Learner corpora and second language acquisition research. Cambridge: Cambridge University Press.
Wulff, Stefanie, Nicholas Lester and Maria T. Martinez-Garcia. 2014. That-variation in German and Spanish L2 English. Language and Cognition 6: 271–299.
Wulff, Stefanie and Stefan Th. Gries. 2015. Prenominal adjective order preferences in Chinese and German L2 English: A multifactorial corpus study. Linguistic Approaches to Bilingualism 5(1): 122–150.
Wulff, Stefanie and Stefan Th. Gries. 2019. Particle placement in learner English: Measuring effects of context, first language, and individual variation. Language Learning 69(4): 873–910.
Wulff, Stefanie and Stefan Th. Gries. To appear. Explaining individual variation in learner corpus research: Some methodological suggestions. In Bert Le Bruyn and Magali Paquot (eds.). Learner corpora and second language acquisition research. Cambridge: Cambridge University Press.