Lancaster University Department of Linguistics and Modern English Language

Text Collection:
Text Archives

 

Several well-known electronic text archives are available online. Review the list below to see which sites suit your needs, then visit a couple of them and download at least three texts that interest you.
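If you would rather fetch the texts with a script than save them by hand from the browser, the download step can be sketched roughly as follows. This is only a minimal illustration in Python using the standard library: the function names `filename_for` and `download_texts`, the folder name `corpus`, and the file-naming scheme are all assumptions made for this example, not part of any archive's own tools. Check each site's terms of use before downloading in bulk.

```python
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url: str) -> str:
    """Derive a safe local file name from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    # Join host and path, then replace every run of characters that is not
    # a letter, digit, dot or hyphen with a single underscore.
    raw = parsed.netloc + parsed.path
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    # Make sure the saved file carries a .txt extension.
    return safe if safe.endswith(".txt") else safe + ".txt"


def download_texts(urls, corpus_dir="corpus"):
    """Fetch each URL and save the raw response body under corpus_dir.

    Returns the list of paths that were written.
    """
    out_dir = Path(corpus_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        with urllib.request.urlopen(url, timeout=30) as response:
            data = response.read()
        target = out_dir / filename_for(url)
        # Bytes are written unchanged; character encoding is not converted.
        target.write_bytes(data)
        saved.append(target)
    return saved
```

To use it, pass `download_texts` a list of the three or more text addresses you have chosen from the archives below; it returns the paths of the saved files. Pages served as HTML are saved as they arrive, so they will still need their markup removed before they can go into a corpus.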

Famous Text Archives:

(a) Mainly old literary texts (before 1920)
 
  The Oxford Text Archive
  (http://ota.ahds.ac.uk/)
  - multilingual archive
 
  Project Gutenberg
  (http://www.gutenberg.net/)
  - includes copyright-free government documents
 
  The On-Line Books Page
  (http://digital.library.upenn.edu/books/)
 
 
(b) Newspapers
 
  Newsbank
  (http://infoweb.newsbank.com)
  - An extremely large repository of British tabloid and broadsheet newspapers

  News Resources
  (http://www.newo.com)
  - Links to news resources around the world
 
  CNN Plus: Transcripts
  (http://www.cnn.com/TRANSCRIPTS/)
  - CNN news transcripts classified into several categories
  - "Interview & Debate" could be a good spoken resource
 
  The Guardian
  (http://www.guardian.co.uk)
 

(c) Movie, drama, TV scripts
 
  The Daily Script
  (http://www.dailyscript.com/)
 
  Drew’s Scripts-O-Rama
  (http://www.script-o-rama.com/table.shtml)
  - Links to hundreds of movie and TV scripts.
  - Format varies.
  - Could be good resources for teaching spoken English
 
 
(d) E-text centers in the United States
 
  Directory of Electronic Text Centers in the U.S.
  (http://scc01.rutgers.edu/ceth/infosrv/ectrdir.html)


(e) Netnews, discussion lists

  Use Outlook Express or Netscape to connect to the news server (news.lancs.ac.uk). You can subscribe to hundreds of newsgroups on various topics.