4 Confidence intervals

4.1 Bootstrap confidence intervals

A very simple way to estimate a confidence interval for a parameter θ is to resample the data with replacement. The procedure is iterative. First choose a large number B, for example B = 100 or B = 500. Then, for $i = 1, \ldots, B$:

  1. Sample, with replacement, from your original sample to obtain a new sample of the same size as the original. ‘With replacement’ means that once an individual is resampled it is placed back into the sampling pool and may be sampled again.

  2. Calculate an estimate $\hat{\theta}^{(i)}$ of the parameter from your new sample.

The sample of parameter estimates $\hat{\theta}^{(1)}, \ldots, \hat{\theta}^{(B)}$ now represents an approximation to the sampling distribution of the estimator $\hat{\theta}$. An approximate 95% confidence interval is therefore given by the 2.5% and 97.5% quantiles of this distribution, i.e. by taking the empirical 2.5% and 97.5% quantiles of $\hat{\theta}^{(1)}, \ldots, \hat{\theta}^{(B)}$.
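The procedure above can be wrapped in a small function. The following is only a sketch: the name bootstrapCI, its arguments and the data vector x are hypothetical and chosen for illustration, and the 95% level is fixed to match the description above.

```r
# Hypothetical helper: approximate 95% bootstrap confidence interval.
# x: numeric sample; statistic: function returning a single estimate;
# B: number of bootstrap samples.
bootstrapCI <- function(x, statistic = mean, B = 500) {
  n <- length(x)
  # Each replicate draws n indices with replacement and re-estimates
  estimates <- replicate(B, statistic(x[sample.int(n, n, replace = TRUE)]))
  # Empirical 2.5% and 97.5% quantiles of the bootstrap estimates
  quantile(estimates, c(0.025, 0.975))
}

# Illustrative (made-up) data, here with the median as the parameter
x <- c(2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5)
bootstrapCI(x, statistic = median, B = 500)
```

Passing the estimator as an argument means the same resampling code can be reused for a mean, a median or any other statistic that returns a single number.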

Remark.

Can you think why we need to sample with replacement in the above? What would happen if we sampled without replacement?

Remark.

How would you calculate a 90% confidence interval for θ using the method above?

Example 4.1.1 Arctic sea ice

Example 3.2.1 contained the following sample of the annual minima of the Arctic sea ice extent, 4.55, 5.05, 6.48, 5.62, 6.89, 7.52, 6.40, 6.16, 5.32, 6.61.

Using this sample, estimate a 95% confidence interval for the mean of the minimum sea ice extent in the Arctic using the bootstrap approach. Recall that the data can be found in the file arctic.Rdata. We will use 100 bootstrap samples to construct the interval:

> seaIceSample <- c(4.55,5.05,6.48,5.62,6.89,7.52,6.40,6.16,5.32,6.61)
> B <- 100   #set up a variable B to give number of bootstrap samples
> n <- length(seaIceSample)   #size of each bootstrap sample
> meanBS <- rep(0,B)    #set up an empty vector to collect the bootstrap estimates
> for (i in 1:B){
  dataB <- sample(seaIceSample,n,replace=TRUE) #sample with replacement
  meanBS[i] <- mean(dataB) #store the mean of the i-th bootstrap sample
}
> confInt <- quantile(meanBS,c(0.025,0.975))
> confInt
2.5%    97.5%
5.523075 6.55000

So an estimate of the 95% confidence interval for the mean of the minimum sea ice extent is (5.52,6.55).

Remark.

When you run this code in R, you will get a slightly different confidence interval to the one above. This is because the technique is based on random sampling, and the computer generates a different set of random samples each time you run the code. If you want to exactly replicate your results then you should use the set.seed function: before running the for loop, type the command

> set.seed(4)

Now, each time you run the code you will get the same estimate. You would get a different estimate if you typed

> set.seed(5)
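
To see the effect of the seed directly, the bootstrap loop can be wrapped in a function and run several times. This is a sketch only: the function name bootstrapMeans and its arguments are hypothetical, introduced here for illustration.

```r
seaIceSample <- c(4.55,5.05,6.48,5.62,6.89,7.52,6.40,6.16,5.32,6.61)

# Hypothetical wrapper: fix the seed, then run the bootstrap loop
bootstrapMeans <- function(seed, B = 100) {
  set.seed(seed)
  n <- length(seaIceSample)
  meanBS <- rep(0, B)
  for (i in 1:B) {
    meanBS[i] <- mean(sample(seaIceSample, n, replace = TRUE))
  }
  meanBS
}

run1 <- bootstrapMeans(seed = 4)
run2 <- bootstrapMeans(seed = 4)
run3 <- bootstrapMeans(seed = 5)

identical(run1, run2)   # compares two runs that used the same seed
identical(run1, run3)   # compares runs that used different seeds
```

Runs that share a seed draw exactly the same bootstrap samples, so their estimates and any quantiles computed from them agree exactly.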

Things to think about…

  1. How sensitive is the confidence interval estimate to the number of bootstrapped samples B? Using the above code see whether you can obtain confidence interval estimates using 500 and 1000 bootstrapped samples.

  2. Can you adjust the above code to calculate a 90% confidence interval for the mean of the minimum Arctic sea ice extent? What is your estimate? How does it compare to the 95% confidence interval? Can you explain any differences?

  3. Without calculating it, would the 99% confidence interval for the mean minimum sea ice extent be wider or narrower than the 95% confidence interval?