The Corpus of Late Modern English Texts, version 3.1

Please use the following text to cite this item:
De Smet, Hendrik; Flach, Susanne; Diller, Hans-Jürgen and Tyrkkö, Jukka, 2015, The Corpus of Late Modern English Texts, version 3.1, CLARIN DSpace, http://hdl.handle.net/20.500.14106/2574.
Date issued
2015-10
Size
34,386,225 tokens,
333 texts,
212 other,
687 MB
Language(s)
Description
The Corpus of Late Modern English Texts (CLMET) is a corpus of roughly 35 million words of British English from 1710–1920, grouped into three 70-year periods. The history, versions and details of corpus composition are documented on the CLMET3.0 website. CLMET3.0 is currently distributed in three formats: (i) plain text, (ii) plain text with one sentence per line, and (iii) a tagged version (one sentence per line). CLMET3.1 makes the corpus available in CQP format for use in CWB and CQPweb-based corpus environments. The selection of texts is unchanged, but CLMET3.1 adds to and changes the linguistic annotation. The changes are of three general types: (a) retokenization and retagging, (b) fixes for some systematic issues that come with historical data, and (c) enhanced annotation through the addition of lemmas and simplified part-of-speech class tags.
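For readers unfamiliar with CWB input, the following is a minimal sketch of how a verticalized, CQP-style file could be read in Python. It assumes the common CWB layout of one token per line with tab-separated attributes and structural tags such as <s> on their own lines. The column order (word, part-of-speech tag, lemma), the names Token and read_vertical, and the sample data are hypothetical and are not taken from the CLMET3.1 distribution; check the corpus documentation for the actual attribute layout.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    # Assumed column order; the real CLMET3.1 attributes may differ.
    word: str
    pos: str
    lemma: str


def read_vertical(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per <s>...</s> region.

    Simplification: any line starting with "<" is treated as a
    structural tag, so a literal "<" token must be escaped (e.g. &lt;).
    """
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<"):
            # A closing sentence tag ends the current sentence.
            if line.strip() == "</s>" and sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        # Pad short lines so that missing attributes become empty strings.
        fields += [""] * (3 - len(fields))
        sentence.append(Token(fields[0], fields[1], fields[2]))
    if sentence:
        # Tokens left over without a closing </s> tag.
        yield sentence


# Invented sample for illustration only, not actual corpus content.
SAMPLE = (
    '<text id="example">\n'
    "<s>\n"
    "The\tDT\tthe\n"
    "corpus\tNN\tcorpus\n"
    "grows\tVBZ\tgrow\n"
    "</s>\n"
    "</text>\n"
)

sentences = list(read_vertical(SAMPLE.splitlines()))
lemmas = [tok.lemma for tok in sentences[0]]
```

In this sketch, sentences holds one list of Token tuples per sentence region, so lemma- or tag-based counts can be computed without a CWB installation. Queries over the full corpus are better run in CQP or CQPweb.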
Publisher
 Files in this item
This item contains no files.