2.8.2 An approximate 95% confidence interval

Our point estimate is the most plausible value of the parameter, so it makes sense to build the confidence interval around the point estimate. The standard error, which is a measure of the uncertainty associated with the point estimate, provides a guide for how large we should make the confidence interval.

The standard error represents the standard deviation associated with the estimate, and roughly 95% of the time the estimate will be within 1.96 standard errors of the parameter. If the interval spreads out 1.96 standard errors from the point estimate, we can be roughly 95% confident that we have captured the true parameter:

$$\text{point estimate}\ \pm\ 1.96\times SE \tag{2.8}$$
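Equation (2.8) is simple enough to express as a small helper. The following is a minimal sketch in Python; the function name `approx_ci_95` and the numbers passed to it are hypothetical and chosen only to illustrate the arithmetic.

```python
def approx_ci_95(point_estimate, standard_error, z=1.96):
    """Return (lower, upper) for point_estimate +/- z * standard_error."""
    margin = z * standard_error  # half-width of the interval
    return point_estimate - margin, point_estimate + margin


# Hypothetical values, not taken from the marathon data.
lower, upper = approx_ci_95(10.0, 2.0)
print(f"({lower:.2f}, {upper:.2f})")  # prints "(6.08, 13.92)"
```

The multiplier `z` is left as a parameter so that the same helper could be reused with a different confidence level, although only the 95% case is discussed here.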

But what does "95% confident" mean? Suppose we took many samples and built a confidence interval from each sample using Equation (2.8). Then about 95% of those intervals would contain the actual mean, $\mu$. The accompanying figure shows this process with 25 samples, where 24 of the resulting confidence intervals contain the average time for all the runners, $\mu=272.1001$ minutes, and one does not.

See the Moodle file for the code for the simulation.
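For readers without access to that file, the repeated-sampling idea can be sketched as follows. This is not the course's simulation code: the population below is synthetic (normally distributed times with a mean near 272 minutes and a standard deviation of 50, both assumptions), and the function name `coverage` is hypothetical.

```python
import random
import statistics


def coverage(population, n=100, n_samples=1000, z=1.96, seed=1):
    """Proportion of approximate 95% intervals that contain the population mean."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(population, n)        # simple random sample
        xbar = statistics.fmean(sample)           # point estimate
        se = statistics.stdev(sample) / n ** 0.5  # estimated standard error
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / n_samples


# Synthetic stand-in for the runners' times (an assumption, not the real data).
pop_rng = random.Random(0)
population = [pop_rng.gauss(272, 50) for _ in range(20000)]

rate = coverage(population)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

With many repeated samples the printed proportion should be close to 0.95, which is the sense in which each individual interval is called a 95% confidence interval.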

Example 2.8.2

In the figure showing the 25 confidence intervals, one interval does not contain 272.1001 minutes. Does this imply that the mean cannot be 272.1001 minutes?

Answer. Just as some observations occur more than 1.96 standard deviations from the mean, some point estimates will be more than 1.96 standard errors from the parameter. A confidence interval only provides a plausible range of values for a parameter. While we might say other values are implausible based on the data, this does not mean they are impossible. The rule that about 95% of observations lie within 1.96 standard deviations of the mean is only approximately true in general. For the normal distribution, however, it holds almost exactly, since 1.96 is the rounded 97.5th percentile of the standard normal. As we will soon see, the sample mean tends to be normally distributed when the sample size is sufficiently large.
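The claim about the normal distribution can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import NormalDist

z = 1.96
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a standard normal value falls within 1.96 of the mean.
prob_within = std_normal.cdf(z) - std_normal.cdf(-z)
print(f"P(-1.96 < Z < 1.96) = {prob_within:.4f}")

# The multiplier that leaves 2.5% in each tail.
z_exact = std_normal.inv_cdf(0.975)
print(f"z for a central 95% region = {z_exact:.4f}")
```

The probability comes out at 0.95 to four decimal places, and the multiplier at 1.96 once rounded, which is why 1.96 is used in Equation (2.8).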

Example 2.8.3

If the sample mean of times from LonMar13Samp is 273.4978 minutes and the standard error, as estimated using the sample standard deviation, is 4.987072 minutes, what would be an approximate 95% confidence interval for the average 26-mile time of all runners in the race? Use the standard error calculated from the sample standard deviation ($SE=\frac{49.87072}{\sqrt{100}}=4.987072$), which is how we usually proceed since the population standard deviation is generally unknown.

Answer. We apply Equation (2.8):

$$273.4978\ \pm\ 1.96\times 4.987072\quad\rightarrow\quad(263.7231,\ 283.2725)$$

Based on these data, we are about 95% confident that the average 26-mile time for all runners in the race was larger than 263.7231 but less than 283.2725 minutes. Our interval extends out 1.96 standard errors from the point estimate, $\bar{x}$.
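The calculation in this example, including the step from the sample standard deviation to the standard error, can be reproduced with a few lines of Python. This is a sketch using only the summary figures quoted in the example; the variable names are illustrative.

```python
import math

sample_mean = 273.4978  # minutes, from the example
sample_sd = 49.87072    # sample standard deviation, from the example
n = 100                 # sample size, from the example

se = sample_sd / math.sqrt(n)  # estimated standard error
margin = 1.96 * se             # half-width of the interval
lower = sample_mean - margin
upper = sample_mean + margin

print(f"SE = {se:.6f}")
print(f"Approximate 95% CI: ({lower:.4f}, {upper:.4f})")
```

The endpoints agree with the displayed calculation, 263.7231 and 283.2725 minutes.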

Example 2.8.4

The sample data suggest the average runner’s age is about 35.37 years with a standard error of 1.036423 years (estimated using the sample standard deviation, 10.36423). What is an approximate 95% confidence interval for the average age of all of the runners?

Answer. Again apply Equation (2.8): $35.37\ \pm\ 1.96\times 1.036423\rightarrow(33.33861,\ 37.40139)$. We interpret this interval as follows: we are about 95% confident the average age of all participants in the 2013 London Marathon was between 33.33861 and 37.40139 years.
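As a quick check of the arithmetic, the standard error, the margin of error and the two endpoints can be written out step by step. The sample size $n=100$ is an assumption here: it is not stated in this example, but it is implied by the ratio of the quoted standard deviation to the quoted standard error.

$$SE=\frac{10.36423}{\sqrt{100}}=1.036423,\qquad 1.96\times 1.036423\approx 2.03139$$

$$35.37-2.03139=33.33861,\qquad 35.37+2.03139=37.40139$$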