Corpora in Lexical Studies

Empirical data was used in lexicography long before the discipline of corpus linguistics was invented. Samuel Johnson, for example, illustrated his dictionary with examples from literature, and in the 19th century the Oxford English Dictionary used citation slips to study and illustrate word usage. Corpora, however, have changed the way in which linguists can look at language.

A linguist who has access to a corpus, or to another (non-representative) collection of machine-readable text, can call up all the examples of a word or phrase from many millions of words of text in a few seconds. Dictionaries can be produced and revised much more quickly than before, thus providing up-to-date information about language. Definitions can also be more complete and precise, since a larger number of natural examples are examined.

Follow this link for an example of the benefits of corpus linguistics in lexicography

Examples extracted from corpora can be easily organised into more meaningful groups for analysis, for example by sorting the right-hand context of the word alphabetically so that all instances of a particular collocate appear together. Furthermore, because corpus data contains a rich amount of textual information (regional variety, author, date, genre, part-of-speech tags, etc.), it is easier to tie down usages of particular words or phrases as being typical of particular regional varieties, genres and so on.
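The sorting described above can be illustrated with a small keyword-in-context (KWIC) concordance. This is only a minimal sketch: the sample text, the function names kwic and sort_by_right_context, and the three-word context window are assumptions made for illustration, and the simple tokeniser ignores sentence boundaries.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) token lists for each hit of keyword."""
    # Crude tokeniser (an assumption): lowercase alphabetic words only.
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def sort_by_right_context(lines):
    """Order concordance lines alphabetically by the words to the right."""
    return sorted(lines, key=lambda line: line[2])

# Hypothetical sample text, not taken from any real corpus.
sample = (
    "She made a strong case for reform. "
    "He drank strong tea every morning. "
    "They raised strong objections to the plan. "
    "A strong case can be built on this evidence."
)

concordance = sort_by_right_context(kwic(sample, "strong"))
for left, key, right in concordance:
    print(f"{' '.join(left):>25}  {key}  {' '.join(right)}")
```

After sorting, the two lines in which "strong" is followed by "case" sit next to each other, which is the grouping of collocates that the paragraph describes.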

The open-ended (constantly growing) monitor corpus has its greatest role in dictionary building, as it enables lexicographers to keep on top of new words entering the language, existing words changing their meanings, and shifts in the balance of their use according to genre, etc. However, finite corpora also have an important role in lexical studies, in the area of quantification. It is possible to rapidly produce reliable frequency counts and to subdivide these counts across various dimensions according to the varieties of language in which a word is used.
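As a rough illustration of subdividing frequency counts, the sketch below counts words separately for each genre in a tiny, invented collection of texts. The genre labels, the example sentences and the helper names are all hypothetical. The rate per thousand words is an addition not discussed above, included because raw counts are hard to compare when the genre sections differ in size.

```python
import re
from collections import Counter, defaultdict

# Hypothetical miniature corpus: (genre, text) pairs invented for illustration.
documents = [
    ("news", "The minister said the bill would pass."),
    ("news", "The bill was said to be costly."),
    ("fiction", "She said nothing and paid the bill."),
]

def frequencies_by_genre(docs):
    """Build one word-frequency Counter per genre."""
    counts = defaultdict(Counter)
    for genre, text in docs:
        counts[genre].update(re.findall(r"[a-z']+", text.lower()))
    return counts

def per_thousand(counts, genre, word):
    """Frequency of word per 1,000 tokens within one genre."""
    total = sum(counts[genre].values())
    return 1000 * counts[genre][word] / total if total else 0.0

counts = frequencies_by_genre(documents)
for genre in sorted(counts):
    rate = per_thousand(counts, genre, "bill")
    print(f"{genre}: bill occurs {counts[genre]['bill']} times ({rate:.1f} per 1,000 words)")
```

The same pattern extends to any other dimension recorded with the texts, such as date or regional variety, by grouping on that attribute instead of genre.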

Finally, the ability to call up word combinations rather than individual words, and the existence of mutual information tools which establish relationships between co-occurring words (see Session 3), mean that we can treat phrases and collocations more systematically than was previously possible. A phraseological unit may constitute a piece of technical terminology or an idiom, and collocations are important clues to specific word senses.
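One common mutual information measure is pointwise mutual information (PMI), which compares how often two words occur together with how often they would be expected to co-occur if they were independent: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below is one possible way to compute it for adjacent word pairs and is not necessarily how any particular tool does so. The function name pmi_scores and the sample text are assumptions, and on so small a sample the scores are only illustrative, since PMI is known to be unreliable for rare pairs.

```python
import math
import re
from collections import Counter

def pmi_scores(text):
    """Return a PMI score (log base 2) for every adjacent word pair in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), joint in bigrams.items():
        p_xy = joint / n_bigrams        # observed probability of the pair
        p_x = unigrams[x] / n_tokens    # probability of the first word
        p_y = unigrams[y] / n_tokens    # probability of the second word
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical sample text, invented for illustration.
sample_text = (
    "strong tea and strong coffee but powerful computers and powerful engines"
)

scores = pmi_scores(sample_text)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{pair[0]} {pair[1]}: {score:.2f}")
```

A higher score suggests that the pair occurs together more often than chance would predict. Here "strong tea" scores above "and strong" because "and" is more frequent overall.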

Read about corpus-based work on morphology in Corpus Linguistics, Chapter 4, page 92.