In recent years, however, some historical linguists have changed their approach. The result has been an upsurge in strictly corpus-based historical linguistics and in the building of corpora for this purpose. The most widely known English historical corpus is the Helsinki corpus.
The Helsinki corpus contains approximately 1.6 million words of English, dating from the earliest Old English period (before AD 850) to the end of the Early Modern English period (1710). It is divided into three main periods (Old English, Middle English and Early Modern English), and each period is subdivided into a number of 100-year subperiods (or 70-year subperiods in some cases). The Helsinki corpus is representative in that it covers a range of genres, regional varieties and sociolinguistic variables such as gender, age, education and social class. The Helsinki team have also produced "satellite" corpora of early Scots and early American English.
Other examples of English historical corpora in development are the Zürich Corpus of English Newspapers (ZEN), the Lampeter Corpus of Early Modern English Tracts (a sample of English pamphlets from between 1640 and 1740) and the ARCHER corpus (a corpus of British and American English from 1650-1990).
The work carried out on historical corpora is qualitatively similar to that carried out on modern language corpora, with the added possibility of studying how language evolves over time. For example, Peitsara (1993) used four subperiods from the Helsinki corpus and calculated the frequencies of the different prepositions introducing agent phrases. She found that, throughout the period, the most common prepositions of this type were "of" and "by". At the beginning of the period the two were of almost equal frequency, but by the fifteenth century "by" was three times more common than "of", and by 1640 "by" was eight times as common.
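Once the agent phrases have been identified, the comparison described above reduces to counting prepositions per subperiod and dividing one count by the other. The following is a minimal sketch under stated assumptions: it takes already-extracted (subperiod, preposition) pairs as input and does not attempt the harder task of finding agent phrases in the text. The function name by_of_ratios, the period labels and the toy data are all hypothetical and invented for illustration; they are not Peitsara's method or figures.

```python
from collections import Counter, defaultdict


def by_of_ratios(occurrences):
    """Return {subperiod: count of "by" / count of "of"} for agent-phrase prepositions.

    Assumption: `occurrences` is an iterable of (subperiod, preposition) pairs
    that have already been extracted and checked; this function only counts.
    A subperiod with no "of" tokens gets None, since the ratio is undefined.
    """
    counts = defaultdict(Counter)
    for subperiod, preposition in occurrences:
        counts[subperiod][preposition.lower()] += 1

    ratios = {}
    for subperiod, tally in counts.items():
        # Counter returns 0 for missing keys, so absent prepositions count as zero.
        ratios[subperiod] = tally["by"] / tally["of"] if tally["of"] else None
    return ratios


# Invented toy input with hypothetical period labels (not real corpus data).
toy = [
    ("period_a", "of"), ("period_a", "by"),
    ("period_b", "by"), ("period_b", "by"), ("period_b", "by"), ("period_b", "of"),
]
print(by_of_ratios(toy))  # {'period_a': 1.0, 'period_b': 3.0}
```

Because both counts in each ratio come from the same subperiod sample, the ratio does not need to be normalized for the differing sizes of the subperiods; comparing raw counts of a single preposition across subperiods, by contrast, would call for normalized frequencies (for example per 10,000 words).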
Studies like this have particular importance in the context of Halliday's (1991) conception of language evolution as a motivated change in the probabilities of the grammar. However, as Rissanen (1989) pointed out, it is important to be aware of the limitations of corpus linguistics in this area. Rissanen identifies three main problems associated with using historical corpora: