Statistical inference is concerned primarily with understanding the quality of parameter estimates. For example, a classic inferential question is, “How sure are we that the estimated mean, $\bar{x}$, is near the true population mean, $\mu$?” While the equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics. We introduce these common themes in Sections 2.7-2.10 by discussing inference about the population mean, $\mu$, and set the stage for other parameters and scenarios in Section 2.11. Some advanced considerations are deferred to Math235. Understanding these sections will make the rest of this course, and indeed the rest of statistics, seem much more familiar.
Throughout the next few sections we consider a data set called LonMar13, which contains the results for all 34,280 runners who finished the 2013 London Marathon, a run of just over 26 miles (see https://www.virginmoneylondonmarathon.com/en-gb/). Part of this data set is shown in Table 2.3, and the variables are described in Table 2.4.
place | age | time | gender |
---|---|---|---|
1 | 18-39 | 137.1667 | M |
2 | 18-39 | 137.7167 | M |
3 | 18-39 | 139.3667 | M |
4 | 18-39 | 141.6500 | M |
99 | 40-44 | 156.8667 | F |
100 | 40-44 | 156.8833 | M |
variable | description |
---|---|
place | Overall finishing place |
age | Age category, in years |
time | London Marathon run time, in minutes |
gender | Gender (M for male, F for female) |
These data are special because they include the results for the entire population of runners who finished the 2013 London Marathon. We took a simple random sample of this population, which is represented in Table 2.5. We will use this sample, which we refer to as the LonMar13Samp data set, to draw conclusions about the entire population. This is the practice of statistical inference in the broadest sense. A histogram of the time variable and a bar plot of the age variable from the LonMar13Samp data set can be produced with the R commands below.
R> hist(LonMar13Samp[, 3], breaks = 10)   # histogram of run time (column 3)
R> barplot(table(LonMar13Samp[, 2]))      # bar plot of counts per age category (column 2)
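The sampling step described above can be sketched in R. This is only an illustration under stated assumptions: the full LonMar13 data are not reproduced here, so the six rows shown in Table 2.3 stand in for the population. The object names `LonMar13Excerpt` and `samp`, the sample size `n`, and the seed are hypothetical choices, not those used to create LonMar13Samp.

```r
# Stand-in "population": the six rows shown in Table 2.3
# (the real LonMar13 data set has 34,280 rows).
LonMar13Excerpt <- data.frame(
  place  = c(1, 2, 3, 4, 99, 100),
  age    = c("18-39", "18-39", "18-39", "18-39", "40-44", "40-44"),
  time   = c(137.1667, 137.7167, 139.3667, 141.6500, 156.8667, 156.8833),
  gender = c("M", "M", "M", "M", "F", "M")
)

set.seed(1)  # hypothetical seed, only to make the draw reproducible
n <- 3       # hypothetical sample size

# Simple random sample: every set of n rows is equally likely,
# and rows are drawn without replacement.
rows <- sample(nrow(LonMar13Excerpt), size = n)
samp <- LonMar13Excerpt[rows, ]

# The sample mean estimates the population mean of the stand-in data.
x_bar <- mean(samp$time)
mu    <- mean(LonMar13Excerpt$time)
print(c(sample_mean = x_bar, population_mean = mu))
```

With the real data, the same `sample()` call applied to the rows of LonMar13 would give a simple random sample of any chosen size. The gap between `x_bar` and `mu` is the quantity that the inferential question in this section asks about.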
place | age | time | gender |
---|---|---|---|
9102 | 18-39 | 233.0500 | M |
12757 | 18-39 | 248.6333 | M |
19637 | 18-39 | 277.8833 | M |
20678 | 45-49 | 282.5333 | M |