Suppose we want a confidence level somewhat higher than 95%, say 99%. Think back to the analogy about trying to catch a fish: if we want to be more sure that we will catch the fish, we should use a wider net. To reach a 99% confidence level, we must likewise widen our 95% interval. On the other hand, if we want an interval with lower confidence, such as 90%, we could make our original 95% interval slightly narrower.
The structure of the 95% confidence interval provides guidance on how to make intervals with new confidence levels. Below is a general 95% confidence interval for a point estimate that comes from a nearly normal distribution:
$$\text{point estimate} \pm 1.96 \times SE \tag{2.10}$$
There are three components to this interval: the point estimate, "1.96", and the standard error. The choice of $1.96 \times SE$ was based on capturing 95% of the sampling distribution, since the estimate is within 1.96 standard errors of the parameter about 95% of the time. The choice of 1.96 corresponds to a 95% confidence level.
If $X$ is a normally distributed random variable, how often will $X$ be within 2.58 standard deviations of the mean?
Answer. This is equivalent to asking how often the Z score will be larger than -2.58 but less than 2.58. (For a picture, see the figure labeled choosingZForCI.) To determine this probability, look up -2.58 and 2.58 using R: pnorm(-2.58) gives 0.0049 and pnorm(2.58) gives 0.9951. Thus, there is a $0.9951 - 0.0049 \approx 0.99$ probability that the unobserved random variable $X$ will be within 2.58 standard deviations of $\mu$.

To create a 99% confidence interval, change 1.96 in the 95% confidence interval formula to 2.58. Exercise 2.8.5 highlights that 99% of the time a normal random variable will be within 2.58 standard deviations of the mean. This approach, using the Z scores in the normal model to compute confidence levels, is appropriate when $\bar{x}$ is associated with a normal distribution with mean $\mu$ and standard deviation $SE_{\bar{x}}$. Thus, the formula for a 99% confidence interval is
$$\bar{x} \pm 2.58 \times SE_{\bar{x}} \tag{2.11}$$
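The lookup and the 99% interval can be reproduced in R. The following is a minimal sketch: `x_bar` and `se` are made-up placeholder values chosen only for illustration, not numbers from the text.

```r
# Probability that a standard normal variable falls within 2.58 SDs of its mean
lower_tail <- pnorm(-2.58)           # area below -2.58
upper_cdf  <- pnorm(2.58)            # area below  2.58
coverage   <- upper_cdf - lower_tail # area between -2.58 and 2.58

# Hypothetical point estimate and standard error (illustrative values only)
x_bar <- 50
se    <- 2

# 99% confidence interval: point estimate +/- 2.58 * SE
ci_99 <- c(lower = x_bar - 2.58 * se, upper = x_bar + 2.58 * se)

print(round(c(lower_tail, upper_cdf, coverage), 4))
print(ci_99)
```

Replacing 2.58 with 1.96 in the last step gives the narrower 95% interval for the same estimate and standard error.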
The normal approximation is crucial to the precision of these confidence intervals. Section 2.10 provides a more detailed discussion about when the normal model can safely be applied. When the normal model is not a good fit, we will use alternative distributions that better characterize the sampling distribution.
Verifying independence is often the most difficult of the conditions to check, and the way to check for independence varies from one situation to another. However, we can provide simple rules for the most common scenarios.
TIP: How to verify sample observations are independent
Observations in a simple random sample consisting of less than 10% of the population are independent.
Conditions for $\bar{x}$ being nearly normal and $SE$ being accurate
Important conditions to help ensure the sampling distribution of $\bar{x}$ is nearly normal and the estimate of SE is sufficiently accurate:
• The sample observations are independent.
• The sample size is large: $n \ge 30$ is a good rule of thumb.
• The distribution of sample observations is not strongly skewed.
Additionally, the larger the sample size, the more lenient we can be with the sample's skew.
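These checks can be roughed out in R. The sketch below is only an aid to judgment, under stated assumptions: the function name `check_conditions` is hypothetical, the sample-size cutoff of 30 is the common rule of thumb, the 10% rule applies only when the data are a simple random sample, and the skewness value is reported without any cutoff because the text gives none. The data are simulated.

```r
# Rough diagnostics for the conditions listed above (hypothetical helper).
# x: numeric sample; population_size: size of the population sampled from.
check_conditions <- function(x, population_size) {
  n <- length(x)
  centered <- x - mean(x)
  # Moment-based sample skewness (0 for a perfectly symmetric sample)
  skewness <- mean(centered^3) / mean(centered^2)^1.5
  list(
    under_10_percent = n < 0.10 * population_size, # independence rule of thumb
    n_at_least_30    = n >= 30,                    # sample-size rule of thumb
    skewness         = skewness                    # inspect alongside a histogram
  )
}

set.seed(1)
sample_times <- rnorm(100, mean = 270, sd = 50) # simulated data, illustration only
print(check_conditions(sample_times, population_size = 30000))
```

A histogram of the sample remains the more direct way to judge skew; the numeric summary is a supplement, not a replacement.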
Caution: Independence for random processes and experiments
If a sample is from a random process or experiment, it is important to verify the observations
from the process or subjects in the experiment are nearly independent and maintain their
independence throughout the process or experiment. Usually subjects are considered independent if
they undergo random assignment in an experiment.
Confidence interval for any confidence level
If the point estimate follows the normal model with standard error $SE$, then a confidence interval for the population parameter is
$$\text{point estimate} \pm z^{\star} \times SE$$
where $z^{\star}$ corresponds to the confidence level selected. We can calculate $z^{\star}$ using qnorm.
The figure labeled choosingZForCI provides a picture of how to identify $z^{\star}$ based on a confidence level. We select $z^{\star}$ so that the area between $-z^{\star}$ and $z^{\star}$ in the normal model corresponds to the confidence level.
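The selection of the critical value can be sketched in R with `qnorm`, the inverse of `pnorm`. The helper names `z_star` and `normal_ci` are hypothetical, introduced here only for illustration.

```r
# Critical value for a given confidence level under the normal model.
z_star <- function(conf_level) {
  stopifnot(conf_level > 0, conf_level < 1)
  # Leave (1 - conf_level) / 2 in each tail of the standard normal
  qnorm(1 - (1 - conf_level) / 2)
}

# Confidence interval: point estimate +/- z_star * SE
normal_ci <- function(estimate, se, conf_level = 0.95) {
  half_width <- z_star(conf_level) * se
  c(lower = estimate - half_width, upper = estimate + half_width)
}

# Critical values for 90%, 95%, and 99% confidence
z_values <- sapply(c(0.90, 0.95, 0.99), z_star)
print(round(z_values, 3))

# Check: the area between -z and z recovers the confidence level
print(pnorm(z_values) - pnorm(-z_values))
```

The 1.96 and 2.58 used earlier are rounded versions of the 95% and 99% values this returns.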
Margin of error
In a confidence interval, $z^{\star} \times SE$ is called the margin of error.
Create a 90% confidence interval for the average time for all runners in the 2013 London Marathon. The point estimate is $\bar{x} \approx 273.50$ minutes and the standard error is $SE_{\bar{x}} \approx 5.00$ minutes.
Answer. We first find $z^{\star}$ such that 90% of the distribution falls between $-z^{\star}$ and $z^{\star}$ in the standard normal model, $N(\mu = 0, \sigma = 1)$. We can find $-z^{\star}$ in R by looking for a lower tail of 5% (the other 5% is in the upper tail): qnorm(0.05) = -1.644854, or equivalently qnorm(0.95) = 1.644854 ≈ 1.645, so $-z^{\star} = -1.645$ implies $z^{\star} = 1.645$. The 90% confidence interval can then be computed as $\bar{x} \pm 1.645 \times SE_{\bar{x}}$. (We had already verified conditions for normality and the standard error.) That is, we are 90% confident the average time is larger than 265.2691 but less than 281.7265 minutes.
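The calculation can be checked in R. This is a sketch under an explicit assumption: the values of `x_bar` and `se` below are back-calculated from the interval reported in the answer (its midpoint and half-width), not taken from the original data.

```r
# Assumed inputs, back-calculated from the reported interval (not original data)
x_bar <- 273.4978 # sample mean finishing time in minutes (assumption)
se    <- 5.0027   # standard error in minutes (assumption)

z      <- qnorm(0.95) # cutoff leaving 5% in the upper tail, about 1.645
margin <- z * se      # margin of error
ci_90  <- c(lower = x_bar - margin, upper = x_bar + margin)

print(round(z, 3))
print(round(ci_90, 2))
```

Using the rounded cutoff 1.645 in place of `qnorm(0.95)` changes the endpoints only in the third or fourth decimal place.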