A simple way to estimate a confidence interval for a parameter is the bootstrap, which is based on sampling with replacement. This is an iterative procedure. First choose a large number of bootstrap samples to generate. Then, for each bootstrap sample in turn:
Sample, with replacement, from your original sample to obtain a new sample of the same size as the original. ‘With replacement’ means that once an individual is resampled it is placed back into the sampling pool and may be sampled again.
Calculate an estimate of the parameter for your new sample.
The resulting collection of parameter estimates approximates the sampling distribution of the estimator. An approximate 95% confidence interval is therefore given by the empirical 2.5% and 97.5% quantiles of these bootstrap estimates.
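The procedure above can be sketched as a short R function. This is a minimal illustration rather than code from the notes: the names `bootstrap_ci`, `n_boot` and `level`, the default of 1000 bootstrap samples, and the simulated data in the usage lines are all assumptions made for the example.

```r
# Percentile bootstrap confidence interval for a generic statistic.
# bootstrap_ci, n_boot and level are illustrative names, not from the notes.
# x is assumed to be a numeric vector with at least two observations.
bootstrap_ci <- function(x, statistic, n_boot = 1000, level = 0.95) {
  # Repeat n_boot times: resample with replacement (same size as x),
  # then re-estimate the parameter on the new sample.
  estimates <- replicate(
    n_boot,
    statistic(sample(x, size = length(x), replace = TRUE))
  )
  # The empirical quantiles of the estimates give the interval.
  alpha <- 1 - level
  quantile(estimates, probs = c(alpha / 2, 1 - alpha / 2))
}

# Usage on simulated data (assumed purely for illustration)
x <- rnorm(50, mean = 10, sd = 2)
bootstrap_ci(x, mean)
```

Passing the statistic as a function means the same sketch works for the mean, the median, or any other estimator that takes a numeric vector.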
Can you think why we need to sample with replacement in the above? What would happen if we sampled without replacement?
How would you calculate a 90% confidence interval for the parameter using the method above?
Example 3.2.1 contained the following sample of the annual minima of the Arctic sea ice extent: 4.55, 5.05, 6.48, 5.62, 6.89, 7.52, 6.40, 6.16, 5.32, 6.61.
Using this sample, estimate a 95% confidence interval for the mean of the minimum sea ice extent in the Arctic using the bootstrap approach. Recall that the data can be found in the file arctic.Rdata. We will use 100 bootstrap samples.
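A sketch of how this calculation might look in R is given below. It is not the original code from the notes: the data are typed in directly from the values listed in the example, because the name of the object stored in arctic.Rdata is not shown here, and the variable names `ice`, `n_boot` and `boot_means` are assumptions.

```r
# Annual minima of the Arctic sea ice extent, typed in from Example 3.2.1.
# (The values could instead be loaded from arctic.Rdata; the object name
# used in that file is not shown here, so they are entered directly.)
ice <- c(4.55, 5.05, 6.48, 5.62, 6.89, 7.52, 6.40, 6.16, 5.32, 6.61)

n_boot <- 100                   # number of bootstrap samples
boot_means <- numeric(n_boot)   # storage for the bootstrap estimates

for (b in 1:n_boot) {
  # Resample with replacement, same size as the original sample
  resample <- sample(ice, size = length(ice), replace = TRUE)
  # Estimate the parameter (here the mean) from the new sample
  boot_means[b] <- mean(resample)
}

# Empirical 2.5% and 97.5% quantiles: approximate 95% confidence interval
quantile(boot_means, probs = c(0.025, 0.975))
```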
The 2.5% and 97.5% quantiles of the bootstrap means give an estimate of the 95% confidence interval for the mean of the minimum sea ice extent.
Each time you run this code in R, you will get a slightly different confidence interval. This is because the technique is based on random sampling, and the computer generates a different set of random samples each time you run the code. If you want to replicate your results exactly, use the set.seed function: before running the for loop, call set.seed with an integer seed of your choice.
Now, each time you run the code with that seed you will get the same estimate. You would get a different estimate if you used a different seed.
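The effect of the seed can be checked with a short sketch. The seed values 123 and 456 are arbitrary choices for illustration, not the ones used in the notes, and `boot_ci` is a hypothetical helper that simply wraps the bootstrap for loop so it can be rerun.

```r
ice <- c(4.55, 5.05, 6.48, 5.62, 6.89, 7.52, 6.40, 6.16, 5.32, 6.61)

# Hypothetical helper wrapping the bootstrap loop so it can be rerun easily
boot_ci <- function(x, n_boot = 100) {
  boot_means <- numeric(n_boot)
  for (b in 1:n_boot) {
    boot_means[b] <- mean(sample(x, size = length(x), replace = TRUE))
  }
  quantile(boot_means, probs = c(0.025, 0.975))
}

set.seed(123)   # arbitrary seed, chosen for illustration only
ci_first <- boot_ci(ice)

set.seed(123)   # same seed again: the same resamples are drawn
ci_second <- boot_ci(ice)

set.seed(456)   # a different seed draws different resamples
ci_other <- boot_ci(ice)

identical(ci_first, ci_second)   # TRUE
ci_other                         # will typically differ from ci_first
```

The seed must be set immediately before the loop each time; setting it once and then running the loop twice in the same session would give two different intervals.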
Things to think about…
How sensitive is the confidence interval estimate to the number of bootstrapped samples? Using the above code, see whether you can obtain confidence interval estimates using 500 and 1000 bootstrapped samples.
Can you adjust the above code to calculate a 90% confidence interval for the mean of the minimum Arctic sea ice extent? What is your estimate? How does it compare to the 95% confidence interval? Can you explain any differences?
Without calculating it, would the 99% confidence interval for the mean minimum sea ice extent be wider or narrower than the 95% confidence interval?