Please use the following text to cite this item:
University of Oxford, 2003,
The Emille Corpus (Beta Release Version), CLARIN DSpace,
http://hdl.handle.net/20.500.14106/2460.
| dc.contributor | McEnery, A.M., Department of Linguistics and Modern English Language, Lancaster University, Lancaster |
| dc.contributor.editor | McEnery, A.M. |
| dc.contributor.editor | Baker, Paul |
| dc.contributor.editor | Hardie, Andrew |
| dc.date.accessioned | 2018-07-27 |
| dc.date.accessioned | 2022-08-19T15:53:13Z |
| dc.date.available | 2022-08-19T15:53:13Z |
| dc.date.created | 2003 |
| dc.date.issued | 2003-05-02 |
| dc.description.abstract | The collection consists of: thirty million words of monolingual written data (Gujarati, Tamil, Hindi and Punjabi; news website articles); 600,000 words of monolingual spoken data (Hindi, Urdu, Punjabi, Bengali and Gujarati; radio broadcasts); and 120,000 words of parallel data in each of English, Hindi, Urdu, Punjabi, Bengali and Gujarati (U.K. government leaflets). Further information is available at: http://www.emille.lancs.ac.uk/home.htm |
| dc.description.sponsorship | Engineering and Physical Sciences Research Council (EPSRC) |
| dc.format.extent | Text data: 6551 files, ca. 482 MB |
| dc.format.medium | Digital bitstream |
| dc.identifier | ota:2460 |
| dc.identifier.uri | http://hdl.handle.net/20.500.14106/2460 |
| dc.language | English |
| dc.language | Gujarati |
| dc.language | Tamil |
| dc.language | Hindi |
| dc.language | Panjabi |
| dc.language | Urdu |
| dc.language | Bengali |
| dc.language.iso | eng |
| dc.language.iso | guj |
| dc.language.iso | tam |
| dc.language.iso | hin |
| dc.language.iso | pan |
| dc.language.iso | urd |
| dc.language.iso | ben |
| dc.publisher | University of Oxford |
| dc.relation.ispartof | Oxford Text Archive Core Collection |
| dc.rights | Distributed by the University of Oxford under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. |
| dc.rights.label | PUB |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
| dc.subject.lcsh | South Asia--Languages |
| dc.subject.lcsh | Indo-Aryan languages, Modern |
| dc.subject.lcsh | Linguistic analysis (Linguistics) |
| dc.subject.other | Linguistic corpora |
| dc.title | The Emille Corpus (Beta Release Version) |
| dc.type | Corpus |
| local.branding | Oxford Text Archive |
| local.files.count | 9 |
| local.files.size | 113513930 |
| local.has.files | yes |
| local.language.name | English |
| local.language.name | Gujarati |
| local.language.name | Tamil |
| local.language.name | Hindi |
| local.language.name | Panjabi |
| local.language.name | Urdu |
| local.language.name | Bengali |
| otaterms.date.range | 2000-present |
This item is publicly available and licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Files in this item
This item contains no files.

