Corpus Description

The Primary Data

The primary data used for this corpus is the written project work produced by a class of 37 school children in the UK. These projects were selected and researched independently by the same children over a period of three years. They cover such themes as animals, favourite hobbies, and countries of the world.

The age of the children when they wrote their projects - 8-11 years - is one in which literacy is still extending and developing. They are just starting to "spread their wings" as writers, exploring an ever-increasing range of genres and discourse types. Their writing is, moreover, closely bound up with extensive use of visual material - drawings, photographs, cuttings from magazines and computerized encyclopedias - making it a challenge for corpus encoding.

The Computer Corpus

The LCPW is a computerized representation of the primary data and other related material. It attempts to capture as much useful information as possible about the original projects, such as their appearance, their textual content and grammatical characteristics, and what the children and others said about them. (See also Features and How to Use the Corpus.)

Organisation of the Corpus

For the purposes of the electronic corpus we have

  1. divided the children into two sets:

    • a "Core Sample" of twelve children who span the full range of school achievement: six girls and six boys. For each child we have a complete set of data, and permission to use their material on the Internet. This data has been processed to a reasonably high degree of accuracy, each project having been proofread twice.

    • a "Non-Core Sample" of other children whose projects are available. This data is plentiful but not as comprehensive or evenly balanced as the Core Sample. We have carried out less thorough proofreading of the Non-Core Sample projects.

  2. selected three longitudinal series of projects so far, created when the children were between 9 and 11 years old. In due course we hope to add other series, including those produced between 8 and 9 years old.

Features of the Corpus

The corpus can be explored in the following ways:

  • viewing scanned images of the children's project pages.

  • browsing an SGML-based transcription of the children's original texts.

  • accessing grammatically annotated (for Part-of-Speech) versions of the transcriptions.

  • finding out about the material characteristics of the original document, from detailed descriptive notes.

  • downloading the text transcriptions of the children's work, for easy loading into concordancing or other programs.

  • navigating between these layers of data, as well as to other projects by the same child, and other children in the same class.

Please refer to How to Use the Corpus for further information.
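
As an illustration of what "loading into concordancing programmes" involves, the following is a minimal keyword-in-context (KWIC) sketch in Python. It assumes only that a downloaded transcription can be read as plain text. The function name kwic, the width parameter and the sample sentence are invented for this example and are not part of the corpus or its tools.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`.

    `width` is the number of characters of context shown on each side.
    """
    # Collapse line breaks and repeated spaces so context windows read cleanly.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample text, standing in for a downloaded transcription.
sample = "The owl hunts at night. An owl can turn its head a long way. Owls eat mice."

for line in kwic(sample, "owl", width=20):
    print(line)
```

The search matches whole words only, so "Owls" is not returned for the keyword "owl". A dedicated concordancer would normally also offer lemma-based or tag-based searching.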

Current availability of material

(as at 10 April 2001) The most complete data is for the Core Sample children (see above) in the series 5.1 Animals, 5.3 Birds and 6.2 Free Choice. For each of these projects there should be page scans, a transcription, a POS-tagged version, and notes on physical characteristics. We are completing the final stages of proofreading before making the transcriptions and POS-tagged versions available from the download button.

We plan to put the other projects (Non-Core Sample 5.1, 5.3 and 6.2, plus all the remaining material for 4.1 Free Choice and 4.2 Free Choice) on the site in the next few weeks.

Contacting us

If you have any general queries regarding the LCPW, please contact Roz Ivanic (r.ivanic@lancaster.ac.uk) or Tony McEnery (a.mcenery@lancaster.ac.uk), the project directors. For questions relating to the content and design of the website, please contact Nick Smith (n.i.smith@lancaster.ac.uk).

References

Ivanic, R. (1999) Literacies and epistemologies in primary education. In Tosi, A. and Leung, C. (eds) Rethinking Language Education. London: Centre for Information on Language Teaching and Research.

Ormerod, F. and Ivanic, R. (1999) Texts in Practices: Interpreting the physical characteristics of texts. In Barton, D., Hamilton, M. and Ivanic, R. (eds) Situated Literacies. London: Routledge.

Smith, N., McEnery, A. and Ivanic, R. (1998) Issues in Transcribing a Corpus of Children's Handwritten Projects. Literary and Linguistic Computing, Vol.13, No.4. Oxford: OUP.

Full documentation on the corpus and associated research is contained in the Final Report to the Leverhulme Trust.

To order a copy of the report, please contact Elaine Heron, e.heron@lancaster.ac.uk

(Price = UK£3.00, as at 11 May 2000)
