MATH235 Week 2 - Workshop problems

If there is not enough time to discuss all of the problems below in the workshop, please attempt the remaining problems on your own.

WS2.1 

Bootstrap confidence intervals. Let X1, ..., Xn be an IID sequence of Bernoulli(p) random variables. The following R code describes a non-parametric bootstrap algorithm for obtaining a confidence interval for p, using a vector named data that contains a sample x1, ..., xn:

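n <- length(data)   # sample size
propBS <- c()       # bootstrap estimates of p
for (i in 1:500) {
  # resample the data and record the sample proportion
  dataBS <- sample(data, n, replace = TRUE)
  propBS[i] <- mean(dataBS)
}
# percentile bootstrap confidence interval for p
ci <- quantile(propBS, c(0.05, 0.95))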

  (a) How many bootstrapped samples are being taken? How would you change this to take 1000 samples?

  (b) What size of confidence interval is being calculated? How would you change this to create a 99% confidence interval?

  (c) Two confidence intervals were created, (0.84,0.88) and (0.81,0.91). One of these corresponds to a 90% confidence interval and the other to a 95% confidence interval. Which is which?

  (d) Explain why non-parametric bootstrapping uses ‘sampling with replacement’.
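
To experiment with the algorithm above, a data vector is needed. The sketch below is one way to set that up: it simulates a Bernoulli sample and then runs the same resampling steps, written with replicate instead of a for loop. The seed, the sample size 200 and the value p = 0.85 are assumptions chosen purely for illustration; they are not values from this sheet.

```r
# Assumption: simulated data stand in for a real sample x1, ..., xn.
set.seed(1)                                 # arbitrary seed, for reproducibility only
data <- rbinom(200, size = 1, prob = 0.85)  # hypothetical Bernoulli(0.85) sample

n <- length(data)

# Same resampling steps as the loop in the question, written with replicate():
# each replicate resamples the data with replacement and stores the proportion.
propBS <- replicate(500, mean(sample(data, n, replace = TRUE)))

# Percentile interval with the same quantiles as in the question.
ci <- quantile(propBS, c(0.05, 0.95))

print(mean(data))  # sample proportion from the simulated data
print(ci)          # bootstrap interval around it
```

Rerunning with a different seed gives a slightly different interval, because the bootstrap procedure is itself random.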

WS2.2 

This question looks at the annual forecast prices for tea given in Table 1 of Question Sheet 1. We wish to test the hypothesis that the mean price of tea is greater than 1.8 dollars per kg. Assume that tea prices are an IID sample from a Normal(μ, σ²) distribution, with unknown population variance. We conduct the test using the confidence interval for μ.

  (a) To carry out this test at the 5% level, what confidence interval should we calculate?

  (b) Obtain the appropriate confidence interval.

  (c) Can you conclude that the mean price of tea is greater than 1.8 dollars?

  (d) How does this answer relate to your answer to CW1.2?
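
As a reminder of how a confidence interval for μ is computed when the variance is unknown, the sketch below builds a two-sided t-interval by hand and checks it against R's t.test. The vector x holds hypothetical placeholder prices, not the values in Table 1 of Question Sheet 1, and the function name t_interval and the level 0.95 are illustrative assumptions; choosing the level that matches the test is part (a).

```r
# Hypothetical prices (dollars per kg): placeholders only,
# NOT the data from Table 1 of Question Sheet 1.
x <- c(1.9, 2.1, 1.7, 2.0, 2.2, 1.8)

# Two-sided t-interval for the mean of an IID Normal sample
# with unknown variance (t_interval is an illustrative helper name).
t_interval <- function(x, level) {
  n <- length(x)
  alpha <- 1 - level
  half_width <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

level <- 0.95  # arbitrary illustrative level, not necessarily the answer to (a)
ci <- t_interval(x, level)

# Cross-check against the built-in one-sample t procedure.
check <- t.test(x, conf.level = level)$conf.int

print(ci)
print(check)
```

Replacing x with the tea prices and level with the value chosen in part (a) gives the interval needed for part (b).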

WS2.3 

As part of the Childhood Respiratory Disease Study in 1980 in East Boston, Massachusetts, the FEV (forced expiratory volume) of 654 children with the disease was measured. FEV is the volume of air expelled after one second of concentrated effort and is measured in litres. Also collected were the child’s age (years), height (inches), gender and whether or not they smoked. The aim was to see whether or not these four variables have an effect on FEV.

We will conduct an exploratory analysis to see whether or not gender has an effect on FEV, using a subset of 20 of the children (all of whom were non-smokers). The aim is to test whether the mean FEV is the same for males and females. The data are shown in Table 0.4. Let μ1 and μ2 denote the mean FEV for males and females respectively.

  (a) Are these data paired or unpaired? Explain your answer.

  (b) Calculate the pooled sample variance for the data.

  (c) Using the critical value approach, test, at the 5% level, whether or not there is evidence that the mean FEV for males is the same as the mean FEV for females. You should clearly state any assumptions made about the data, your hypotheses and conclusions.

  (d) Calculate a 95% confidence interval for the difference in means μ1 − μ2. How could you use this confidence interval to carry out the test in part (c)?

Individual  Age  FEV    log FEV  Height  Gender
1           8    2.382  0.868    62.0    0
2           11   2.988  1.095    70.0    1
3           10   3.111  1.135    66.0    1
4           7    1.726  0.546    53.0    0
5           9    2.076  0.730    60.5    1
6           9    2.798  1.029    62.0    1
7           9    2.056  0.721    63.0    0
8           12   2.751  1.012    63.0    0
9           9    2.352  0.855    59.0    1
10          9    3.042  1.113    66.0    0
11          10   3.350  1.209    69.0    1
12          9    2.571  0.944    60.5    1
13          11   4.324  1.464    67.5    1
14          15   2.635  0.969    64.0    0
15          7    1.611  0.477    57.5    1
16          8    1.991  0.689    59.5    1
17          10   1.873  0.628    52.5    1
18          12   3.001  1.099    63.5    0
19          9    1.895  0.639    57.0    1
20          11   3.977  1.381    70.5    1
Table 0.4: Age (years), FEV (litres), log FEV, height (inches) and gender (1 = male; 0 = female) for a subset of 20 children in the Childhood Respiratory Disease Study.
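
The hand calculations in WS2.3 can be checked in R. The sketch below enters the FEV and gender columns from the table, computes the pooled sample variance, the two-sample t statistic and a 95% interval for μ1 − μ2, and compares them with t.test. It assumes the analysis is done on raw FEV rather than log FEV, and that the two groups are independent Normal samples with a common variance; the variable names are illustrative.

```r
# FEV (litres) and gender (1 = male, 0 = female), copied from the table above.
fev <- c(2.382, 2.988, 3.111, 1.726, 2.076, 2.798, 2.056, 2.751, 2.352, 3.042,
         3.350, 2.571, 4.324, 2.635, 1.611, 1.991, 1.873, 3.001, 1.895, 3.977)
gender <- c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1)

male   <- fev[gender == 1]
female <- fev[gender == 0]
n1 <- length(male)
n2 <- length(female)

# Pooled sample variance (assumes a common population variance).
sp2 <- ((n1 - 1) * var(male) + (n2 - 1) * var(female)) / (n1 + n2 - 2)

# Two-sample t statistic for H0: mu1 = mu2, and the two-sided 5% critical value.
diff_means <- mean(male) - mean(female)
se <- sqrt(sp2 * (1 / n1 + 1 / n2))
tstat <- diff_means / se
crit <- qt(0.975, df = n1 + n2 - 2)

# 95% confidence interval for mu1 - mu2.
ci <- diff_means + c(-1, 1) * crit * se

# Cross-check with the built-in equal-variance t test.
check <- t.test(male, female, var.equal = TRUE)

print(c(pooled_variance = sp2, t_statistic = tstat, critical_value = crit))
print(ci)
print(check)
```

Comparing abs(tstat) with crit gives the critical value decision for part (c), and ci is the interval asked for in part (d).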