MATH235 Week 2 - Workshop problems

If not all of the problems below are discussed in the workshop for lack of time, then please have a go at the problems on your own.

WS2.1

Bootstrap confidence intervals. Let $X_{1},\ldots,X_{n}$ be an IID sequence of Bernoulli$(p)$ random variables. The following R code describes a non-parametric bootstrap algorithm for obtaining a confidence interval for $p$, using a vector called data that contains a sample $x_{1},\ldots,x_{n}$:

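    n <- length(data)                           # sample size
    propBS <- c()                               # bootstrap sample proportions
    for (i in 1:500) {
      dataBS <- sample(data, n, replace = TRUE) # resample the data with replacement
      propBS[i] <- mean(dataBS)                 # proportion of successes in the resample
    }
    ci <- quantile(propBS, c(0.05, 0.95))       # empirical quantiles of the proportions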

(a) How many bootstrapped samples are being taken? How would you change this to be 1000 samples?

(b) What size of confidence interval is being calculated? How would you change this to create a 99% confidence interval?

(c) Two confidence intervals were created, $(0.84,0.88)$ and $(0.81,0.91)$. One of these corresponds to a 90% confidence interval and the other to a 95% confidence interval. Which is which?

(d) Explain why non-parametric bootstrapping uses ‘sampling with replacement’.
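
To experiment with the algorithm above, it can be run on simulated data. The following is only an illustrative sketch: the sample size 200, the success probability 0.85 and the seed are assumptions chosen for the example and do not come from the worksheet.

```r
# Illustrative run of the bootstrap algorithm on simulated data.
# Assumptions (not from the worksheet): n = 200, p = 0.85, seed 235.
set.seed(235)
data <- rbinom(200, size = 1, prob = 0.85)  # simulated Bernoulli(p) sample

n <- length(data)
propBS <- c()
for (i in 1:500) {
  dataBS <- sample(data, n, replace = TRUE) # resample the data with replacement
  propBS[i] <- mean(dataBS)                 # proportion of successes in the resample
}
ci <- quantile(propBS, c(0.05, 0.95))       # same quantiles as in the code above

mean(data)  # point estimate of p from the simulated sample
ci          # bootstrap interval; it changes with the seed and the simulated data
```

Re-running with a different seed gives a feel for how much the interval endpoints vary from one set of resamples to the next.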

WS2.2

This question looks at the annual forecast prices for tea given in Table 1 of Question Sheet 1. We wish to test the hypothesis that the mean price of tea is greater than 1.8 dollars per kg. Assume that tea prices are an IID sample from a Normal$(\mu,\sigma^{2})$ distribution, with unknown population variance. We conduct the test using the confidence interval for $\mu$.

(a) To carry out this test at the 5% level, what confidence interval should we calculate?

(b) Obtain the appropriate confidence interval.

(c) Can you conclude that the mean price of tea is greater than 1.8 dollars?

(d) How does this answer relate to your answer to CW1.2?
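
Under the assumptions above, a confidence interval for $\mu$ is based on the t distribution with $n-1$ degrees of freedom. The sketch below computes a two-sided interval at a chosen level and compares it with R's built-in t.test. The function name t_interval and the vector prices are hypothetical: prices holds made-up placeholder values, not the tea data from Table 1 of Question Sheet 1, and the level 0.80 is arbitrary rather than the answer to part (a).

```r
# Two-sided t-based confidence interval for a Normal mean, variance unknown.
# `t_interval` is a hypothetical helper name, not part of base R.
t_interval <- function(x, level) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                         # estimated standard error of the mean
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)  # upper critical value of t with n - 1 df
  mean(x) + c(-1, 1) * tcrit * se
}

# Placeholder values for illustration only; replace with the tea prices.
prices <- c(1.7, 2.1, 1.9, 2.4, 2.0, 1.8, 2.2)

t_interval(prices, level = 0.80)
t.test(prices, conf.level = 0.80)$conf.int      # should agree with the line above
```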

WS2.3

As part of the Childhood Respiratory Disease Study in 1980 in East Boston, Massachusetts, the FEV (forced expiratory volume) of 654 children with the disease was measured. FEV is the volume of air expelled after one second of concentrated effort and is measured in litres. Also collected were the child’s age (years), height (inches), gender and whether or not they smoked. The aim was to see whether or not these four variables have an effect on FEV.

We will conduct an exploratory analysis to see whether or not gender has an effect on FEV, using a subset of 20 of the children (all of whom were non-smokers). The aim is to test whether the mean FEV is the same for males and females. The data are shown in Table 0.7. Let $\mu_{1}$ and $\mu_{2}$ denote mean FEV for males and females respectively.

(a) Are these data paired or unpaired? Explain your answer.

(b) Calculate the pooled sample variance for the data.

(c) Using the critical value approach, test, at the 5% level, whether or not there is evidence that the mean FEV for males is the same as the mean FEV for females. You should clearly state any assumptions made about the data, your hypotheses and conclusions.

(d) Calculate a 95% confidence interval for the difference in means $\mu_{1}-\mu_{2}$. How could you use this confidence interval to carry out the test in part (c)?

Individual	Age	FEV	log FEV	Height	Gender
1	8	2.382	0.868	62.0	0
2	11	2.988	1.095	70.0	1
3	10	3.111	1.135	66.0	1
4	7	1.726	0.546	53.0	0
5	9	2.076	0.730	60.5	1
6	9	2.798	1.029	62.0	1
7	9	2.056	0.721	63.0	0
8	12	2.751	1.012	63.0	0
9	9	2.352	0.855	59.0	1
10	9	3.042	1.113	66.0	0
11	10	3.350	1.209	69.0	1
12	9	2.571	0.944	60.5	1
13	11	4.324	1.464	67.5	1
14	15	2.635	0.969	64.0	0
15	7	1.611	0.477	57.5	1
16	8	1.991	0.689	59.5	1
17	10	1.873	0.628	52.5	1
18	12	3.001	1.099	63.5	0
19	9	1.895	0.639	57.0	1
20	11	3.977	1.381	70.5	1

Table 0.4: Age (years), FEV (litres), log FEV, height (inches) and gender (1 = male; 0 = female) for a subset of 20 children in the Childhood Respiratory Disease Study.
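
Hand calculations for WS2.3 can be checked in R. The sketch below enters the FEV and gender columns from the table above and compares a by-hand pooled-variance t statistic with the built-in equal-variance t.test. The variable names are illustrative choices, and the equal-variance Normal model is an assumption of the method that should be stated as part of the answer.

```r
# FEV (litres) and gender (1 = male, 0 = female), copied from the table above.
fev <- c(2.382, 2.988, 3.111, 1.726, 2.076, 2.798, 2.056, 2.751, 2.352, 3.042,
         3.350, 2.571, 4.324, 2.635, 1.611, 1.991, 1.873, 3.001, 1.895, 3.977)
gender <- c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
            1, 1, 1, 0, 1, 1, 1, 0, 1, 1)

male   <- fev[gender == 1]
female <- fev[gender == 0]
n1 <- length(male)
n2 <- length(female)

# Pooled sample variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(male) + (n2 - 1) * var(female)) / (n1 + n2 - 2)

# Test statistic for H0: mu1 = mu2, assuming equal variances.
tstat <- (mean(male) - mean(female)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Cross-check with the built-in equal-variance two-sample t-test.
check <- t.test(male, female, var.equal = TRUE)
c(by_hand = tstat, built_in = unname(check$statistic))
check$conf.int   # interval for mu1 - mu2 at the default 95% level
```

The critical value for part (c) can be obtained with qt, using $n_{1}+n_{2}-2$ degrees of freedom.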