2 Distributions and Inference

2.6 Foundations for inference

Statistical inference is concerned primarily with understanding the quality of parameter estimates. For example, a classic inferential question is, “How sure are we that the estimated mean, x̄, is near the true population mean, μ?” While the equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics. We introduce these common themes in Sections 2.7-2.10 by discussing inference about the population mean, μ, and set the stage for other parameters and scenarios in Section 2.11. Some advanced considerations are deferred to Math235. Understanding these sections will make the rest of this course, and indeed the rest of statistics, seem much more familiar.

Throughout the next few sections we consider a data set called LonMar13, which contains all 34,280 runners who finished the 2013 London Marathon, a run of just over 26 miles (source: https://www.virginmoneylondonmarathon.com/en-gb/). Part of this data set is shown in Table 2.3, and the variables are described in Table 2.4.

Place  age    time      gender
1      18-39  137.1667  M
2      18-39  137.7167  M
3      18-39  139.3667  M
4      18-39  141.6500  M
99     40-44  156.8667  F
100    40-44  156.8833  M
Table 2.3: Six observations from the LonMar13 data set.

variable  description
place     Overall finishing position
age       Age category, in years
time      London Marathon run time, in minutes
gender    Gender (M for male, F for female)
Table 2.4: Variables and their descriptions for the LonMar13 data set.
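
Because the time variable is recorded in decimal minutes, it can help to see a value as a familiar clock time. The following is a minimal sketch, not part of the course material: it types in the six rows of Table 2.3 by hand and converts each time to hours, minutes and seconds. The object name LonMar13Head and the helper minutes_to_hms are hypothetical names introduced only for this illustration.

```r
# The six rows of Table 2.3, entered by hand.
# LonMar13Head is an illustrative name, not an object supplied with the course.
LonMar13Head <- data.frame(
  Place  = c(1, 2, 3, 4, 99, 100),
  age    = c("18-39", "18-39", "18-39", "18-39", "40-44", "40-44"),
  time   = c(137.1667, 137.7167, 139.3667, 141.6500, 156.8667, 156.8833),
  gender = c("M", "M", "M", "M", "F", "M")
)

# Hypothetical helper: convert a time in minutes to an "h:mm:ss" string,
# rounding to the nearest whole second.
minutes_to_hms <- function(minutes) {
  total_seconds <- as.integer(round(minutes * 60))
  hours <- total_seconds %/% 3600L
  mins  <- (total_seconds %% 3600L) %/% 60L
  secs  <- total_seconds %% 60L
  sprintf("%d:%02d:%02d", hours, mins, secs)
}

LonMar13Head$hms <- minutes_to_hms(LonMar13Head$time)
print(LonMar13Head)
```

For example, the first row's time of 137.1667 minutes corresponds to 2:17:10.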

These data are special because they include the results for the entire population of runners who finished the 2013 London Marathon. We took a simple random sample of this population, part of which is shown in Table 2.5. We will use this sample, which we refer to as the LonMar13Samp data set, to draw conclusions about the entire population. This is the practice of statistical inference in the broadest sense. A histogram of the time variable and a bar plot of the age variable from the LonMar13Samp data set are shown in the accompanying figure, which can be reproduced with the R commands below.

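R> hist(LonMar13Samp[, 3], breaks = 10)   # histogram of time (column 3), in minutes
R> barplot(table(LonMar13Samp[, 2]))      # bar plot of counts in each age category (column 2)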

Place  age    time      gender
9102   18-39  233.0500  M
12757  18-39  248.6333  M
19637  18-39  277.8833  M
20678  45-49  282.5333  M
Table 2.5: Four observations for the LonMar13Samp data set, which represents a simple random sample of 100 runners from the 2013 London Marathon.
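
To make the idea of a simple random sample concrete, here is a minimal sketch of how such a sample of 100 runners might be drawn in R and how the sample mean, x̄, compares with the population mean, μ. It does not use the real LonMar13 data: the population below is simulated, and the seed, the normal distribution and its mean and standard deviation are assumptions made up purely for illustration. The object names population and samp are likewise hypothetical.

```r
set.seed(2013)  # arbitrary seed, only so that the sketch is reproducible

# Simulated stand-in for LonMar13: made-up run times, NOT the real results.
N <- 34280
population <- data.frame(
  Place = seq_len(N),
  time  = sort(rnorm(N, mean = 270, sd = 60))  # assumed values, in minutes
)

# Simple random sample: n distinct rows, every row equally likely to be chosen.
n    <- 100
rows <- sample(nrow(population), size = n, replace = FALSE)
samp <- population[sort(rows), ]  # ordered by Place, as in Table 2.5

mu   <- mean(population$time)  # population mean, known here because we hold everyone
xbar <- mean(samp$time)        # sample mean, our estimate of mu
print(c(mu = mu, xbar = xbar))
```

Running the sampling step again with a different seed gives a different x̄ while μ stays fixed. That variation from sample to sample is what the inferential question "how near is x̄ to μ?" is about.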