Charting orthographical reliability in a corpus of English historical letters

Open access


Research into orthography in the history of English is not a simple venture. The history of English spelling is primarily based on printed texts, which fail to capture the range of variation inherent in the language; many manuscript phenomena are simply not found in print. Manuscript-based corpora would be the ideal research data, but as compiling them is resource-intensive, linguists use editions that have been produced by non-linguists. Many editions claim to retain original spellings, but in practice the text is always normalized at the graph level, and possibly beyond it. This does not preclude using edition-based corpora for orthographical research, but there has been no systematic way to determine the philological reliability of an edited text. In this paper we present a typological methodology we are developing for evaluating the orthographical quality of edition-based corpora, with the aim of making the best use of bad data in the context of editions and manuscript practices. As a case study, we apply this methodology to the Early Modern and Late Modern English sections of the Corpus of Early English Correspondence.

