We want to estimate the population mean based on the sample. The most intuitive way to do this is to take the sample mean. That is, to estimate the average 26-mile run time of all participants, take the average time for the sample:
R>mean(LonMar13Samp[,3])
The sample mean run time, in minutes, is called a point estimate of the population mean: if we can only choose one value to estimate the population mean, this is our best guess. Suppose we take a new sample of 100 people and recompute the mean; we will probably not get exactly the same answer that we got using the LonMar13Samp data set. Estimates generally vary from one sample to another, and this sampling variation suggests that our estimate may be close to the parameter but will not be exactly equal to it.
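The sampling variation described above can be seen in a small simulation. This is a minimal sketch under stated assumptions: `pop_time` is a hypothetical stand-in population of simulated run times (roughly normal, which the real run times need not be), not the LonMar13 data, and the sample size of 100 follows the text.

```r
# Hypothetical stand-in population: 30,000 simulated run times in minutes.
# (An assumption for illustration; this is NOT the LonMar13 data.)
set.seed(1)
pop_time <- rnorm(30000, mean = 272, sd = 58)

# Draw two independent samples of 100 runners and compute each point estimate.
samp_a <- sample(pop_time, 100)
samp_b <- sample(pop_time, 100)
mean(samp_a)
mean(samp_b)

# Repeat the sampling many times to see how far the estimate moves
# from one sample to the next.
est <- replicate(1000, mean(sample(pop_time, 100)))
range(est)

# The parameter that the estimates scatter around.
mean(pop_time)
```

The two sample means will typically differ from each other and from `mean(pop_time)`, while the 1000 repeated estimates cluster around the population value.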
We can also estimate the average age of participants by examining the sample mean of age:
Notice here that, because we do not have the exact age of each participant, we have assumed that everyone in the 40-44 age bracket is aged 42.5, the middle value of the bracket. We use the bottom of the next bracket (45) to calculate the middle value, (40 + 45)/2 = 42.5, because a participant in this bracket can be as old as 44 years and 364 days. Using the middle of each bracket is the usual way to approximate a mean from grouped data.
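The midpoint calculation for grouped data can be sketched as follows. The brackets and counts below are hypothetical values chosen for illustration, not figures from the marathon data; only the 40-44 bracket and its midpoint of 42.5 come from the text.

```r
# Hypothetical age brackets and counts (NOT taken from the marathon data).
lower      <- c(18, 40, 45, 50)   # youngest age in each bracket
upper_next <- c(40, 45, 50, 55)   # bottom of the next bracket
count      <- c(50, 20, 18, 12)   # number of sampled runners per bracket

# Midpoint of each bracket, e.g. (40 + 45) / 2 = 42.5 for the 40-44 bracket.
midpoint <- (lower + upper_next) / 2

# Grouped-data estimate of the mean age: midpoints weighted by bracket counts.
weighted.mean(midpoint, count)
```

Each runner is treated as if aged exactly at the midpoint of their bracket, so the result is an approximation to the sample mean age rather than its exact value.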
What about generating point estimates of other population parameters, such as the population median or population standard deviation? Once again we might estimate parameters based on sample statistics, as shown in Table 2.6. For example, we estimate the population standard deviation for the running time using the sample standard deviation, 49.87 minutes.
R>mean(LonMar13Samp[,3]);median(LonMar13Samp[,3]); sd(LonMar13Samp[,3])
R>mean(LonMar13[,3]); median(LonMar13[,3]); sd(LonMar13[,3])
time (minutes) | estimate | parameter |
---|---|---|
mean | 273.4978 | 272.1001 |
median | 265.6833 | 267.4167 |
st. dev. | 49.87072 | 57.82621 |
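Suppose we want to estimate the difference in run times for men and women. If $\bar{x}_{\text{men}}$ and $\bar{x}_{\text{women}}$ are the sample mean run times for men and women, then what would be a good point estimate for the population difference?
Answer. We could take the difference of the two sample means: $\bar{x}_{\text{women}} - \bar{x}_{\text{men}} = 18.84$ minutes. Based on the sample, we estimate that men ran about 18.84 minutes faster on average in the 2013 London Marathon.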
R>x=by(LonMar13Samp[,3],LonMar13Samp[,4],mean); abs(diff(x))
If you had to provide a point estimate of the population IQR for the run time of participants, how might you make such an estimate using a sample?
Answer. To obtain a point estimate of the IQR for the population, we could take the IQR of the sample.
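As a minimal sketch of this answer, the sample IQR can be computed with R's built-in `IQR()` function. The population below is simulated and hypothetical, not the LonMar13 data. With the real data, the analogous call would presumably be `IQR(LonMar13Samp[,3])`, assuming column 3 holds the run time as in the earlier commands.

```r
# Hypothetical stand-in population of run times in minutes (an assumption).
set.seed(2)
pop_time <- rnorm(30000, mean = 272, sd = 58)
samp <- sample(pop_time, 100)

# Point estimate of the population IQR: the IQR of the sample.
IQR(samp)

# The same quantity computed directly from the sample quartiles.
diff(quantile(samp, c(0.25, 0.75)))

# The parameter being estimated (known here only because we simulated it).
IQR(pop_time)
```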