Conclusion
In this session we have seen how a number of areas of language study have benefited from exploiting corpus data. To summarise, the main advantages of corpora are:
- Sampling and quantification. Because a corpus is sampled to represent the population as fully as possible, findings from the corpus can be generalised to the larger population. Hence quantification in corpus linguistics is more meaningful than other forms of linguistic quantification, because it can tell us about a variety of language as a whole, not just about the particular sample being analysed.
- Ease of access. As all of the data collection has been dealt with by someone else, the researcher does not have to go through the issues of sampling, collection and encoding. The majority of corpora are readily available, either free or at low cost. Once a corpus has been obtained, it is usually easy to access the data within it, e.g. by using a concordance program.
- Enriched data. Many corpora have already been enriched with additional linguistic information such as part-of-speech annotation, parsing and prosodic transcription. Hence data retrieval from annotated corpora can be easier and more specific than from unannotated data: with part-of-speech tags, for example, one can retrieve only the noun uses of a word such as "light".
- Naturalistic data. Corpus data is not always completely unmonitored: the people producing the spoken or written texts are not always unaware, until after the fact, that they are being asked to participate in the building of a corpus. But for the most part, the data is naturalistic, unmonitored and the product of real social contexts. Thus the corpus provides one of the most reliable sources of naturally occurring data that can be examined.
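
To make two of the points above more concrete (access through a concordance program, and more specific retrieval from annotated data), here is a minimal sketch in Python. It is an illustration only, not a description of any particular concordancer: the function names `concordance` and `find_tagged`, the sample sentence and the part-of-speech tags are all invented for the example, and a real corpus would be far larger and would use its own tagset.

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context (KWIC) lines for each whole-word match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

def find_tagged(tagged_tokens, word, tag):
    """Return the positions where `word` occurs with the given tag."""
    return [
        i for i, (token, token_tag) in enumerate(tagged_tokens)
        if token.lower() == word.lower() and token_tag == tag
    ]

# Hypothetical sample data, invented for this illustration.
sample_text = "The light was dim. She decided to light a candle. A light breeze blew."

tagged_sample = [
    ("The", "DET"), ("light", "NOUN"), ("was", "AUX"), ("dim", "ADJ"),
    ("She", "PRON"), ("decided", "VERB"), ("to", "PART"), ("light", "VERB"),
    ("a", "DET"), ("candle", "NOUN"),
    ("A", "DET"), ("light", "ADJ"), ("breeze", "NOUN"), ("blew", "VERB"),
]

if __name__ == "__main__":
    # Unannotated search: every occurrence of "light", whatever its word class.
    for line in concordance(sample_text, "light", width=20):
        print(line)
    # Annotated search: only the occurrences tagged as nouns.
    print(find_tagged(tagged_sample, "light", "NOUN"))
```

The plain-text search returns all three occurrences of "light" and leaves the analyst to sort them by hand, whereas the search over the tagged version can be restricted to a single word class. That is the sense in which retrieval from annotated data is more specific.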