Anni Sairio, Samuli Kaislaniemi, Anna Merikallio and Terttu Nevalainen
Research into orthography in the history of English is not a simple venture. Histories of English spelling are primarily based on printed texts, which fail to capture the range of variation inherent in the language; many manuscript phenomena are simply not found in print. Manuscript-based corpora would be the ideal research data, but because compiling them is resource-intensive, linguists rely on editions produced by non-linguists. Many editions claim to retain original spellings, but in practice the text is always normalized at the graph level, and possibly beyond it. This does not preclude using an edition-based corpus for orthographical research, but there has been no systematic way to determine the philological reliability of an edited text. In this paper we present a typological methodology, currently under development, for evaluating the orthographical quality of edition-based corpora, with the aim of making the best use of bad data in the context of editions and manuscript practices. As a case study, we apply this methodology to the Early Modern and Late Modern English sections of the Corpus of Early English Correspondence.