
MATH235 Week 3 - Workshop problems

If not all of the problems below are discussed in the workshop for lack of time, then please have a go at the problems on your own.

WS3.1

Consider an unpaired $t$ -test comparing $H_{0}:\mu_{X}=\mu_{Y}$ against $H_{1}:\mu_{X}\neq\mu_{Y}$ , in which the pooled variance $s^{2}_{p}$ and the sample size $n$ are both known and fixed. Suppose that the true difference between the means is $\Delta$ . We will investigate how the power of the test varies with both $\Delta$ and the significance level $\alpha$ .

(a)

Explain what is meant by the power of a hypothesis test. Which type of error does this relate to, and how?
(b)

Assuming a significance level of 5%, the left panel of Figure 0.1 shows a plot of the power of the test as a function of the true difference $\Delta$ . What do you notice about the power as $\Delta$ increases? Can you explain why this result makes sense?
(c)

For a fixed difference $\Delta$ , the right panel of Figure 0.1 shows the power against the significance level of the test. What happens to the power as the significance level increases? Can you explain why this is? Hint: you might want to use the relationship that you described in part $(a)$ to help you answer this part of the question.

Figure 0.1: Power vs. $\Delta$ , the true difference between $\mu_{X}$ and $\mu_{Y}$ (left) and power vs. significance level $\alpha$ (right).
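
If you would like to explore how the two curves in Figure 0.1 behave, the following is a minimal sketch rather than the method used to draw the figure. It assumes equal group sizes of $n$ observations each and replaces the $t$ distribution with a normal approximation; the function name `approx_power` and the example values of $s_{p}$ and $n$ are hypothetical.

```python
from statistics import NormalDist


def approx_power(delta, s_p, n, alpha=0.05):
    """Approximate power of a two-sided unpaired test.

    Assumptions (not from the problem sheet): n observations in EACH group,
    and a normal approximation in place of the t distribution.
    """
    z = NormalDist()
    se = s_p * (2.0 / n) ** 0.5            # standard error of the difference in means
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = delta / se                     # standardised true difference
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


# Hypothetical illustration: s_p = 1 and n = 20 per group.
for delta in (0.0, 0.25, 0.5, 1.0):
    print(delta, round(approx_power(delta, s_p=1.0, n=20), 3))
```

Calling `approx_power` with a fixed `delta` and several values of `alpha` gives the analogue of the right-hand panel.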

WS3.2

To assess the effect of different kinds of protein on weight gain, a group of rats were each randomly assigned to one of four groups. Each group was then assigned a different diet (low protein beef, high protein beef, low protein cereal and high protein cereal). Table 0.9 contains the weight gained by each rat in each group. The data are a subset of a data set found in A Handbook of Small Data Sets http://www.stat.ncsu.edu/research/sas/sicl/data/.

Beef Low	Beef High	Cereal Low	Cereal High
90	73	107	98
76	102	95	74
90	118	97	56
64	104	80	111
86	81	98	95
51	107	74	88
72	100	74	82
90	87	67	77

Table 0.9: Weight gain in rats fed four different types of diet.

Let $\mu_{LB}$ , $\mu_{HB}$ , $\mu_{LC}$ and $\mu_{HC}$ denote the mean weight gains for the low beef, high beef, low cereal and high cereal groups respectively.

(a)

Write down appropriate null and alternative hypotheses to test whether or not the group means are the same.
(b)

Given that the overall mean is 86.375 and the group means are 77.375, 96.5, 86.5 and 85.125, carry out a one-way ANOVA to test your hypotheses in part $(a)$ . Test at the 5% significance level. What are your conclusions?
(c)

What assumptions are made by the one-way ANOVA?
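
Once you have worked through part $(b)$ by hand, a short script can be used to check the arithmetic. The sketch below computes the between-group and within-group sums of squares from the data in Table 0.9 and forms the $F$ statistic; the function name `one_way_anova` is hypothetical, and the critical value should still be read from tables.

```python
# Weight gains from Table 0.9, one list per diet group.
groups = {
    "Beef Low": [90, 76, 90, 64, 86, 51, 72, 90],
    "Beef High": [73, 102, 118, 104, 81, 107, 100, 87],
    "Cereal Low": [107, 95, 97, 80, 98, 74, 74, 67],
    "Cereal High": [98, 74, 56, 111, 95, 88, 82, 77],
}


def one_way_anova(samples):
    """Return (F statistic, between-group df, within-group df)."""
    all_obs = [x for s in samples for x in s]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(s) / len(s) for s in samples]
    # Between-group sum of squares: group size times squared deviation of each group mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between = len(samples) - 1
    df_within = len(all_obs) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = one_way_anova(list(groups.values()))
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```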

WS3.3

Table 0.10 contains the winning Olympic high jump heights, long jump distances and discus throws for a sample of 10 Olympic years (the Olympics are held every four years). All measurements are in inches. We are interested in whether or not we can use winning long jump distances and discus throws to predict the winning high jump height.

Observation	Year	High Jump	Long Jump	Discus Throw
1	1896	71.25	249.750	1147.50
2	1900	74.80	282.875	1418.90
3	1904	71.00	289.000	1546.50
4	1952	80.32	298.000	2166.85
5	1960	85.00	319.750	2330.00
6	1964	85.75	317.750	2401.50
7	1972	87.75	324.500	2535.00
8	1976	88.50	328.500	2657.40
9	1980	92.75	336.250	2624.00
10	1984	92.50	336.250	2622.00

Table 0.10: Winning Olympic high jump heights and long jump and discus distances for a sample of 10 years. All measurements are in inches.

(a)

What is the response variable in this problem? What are the explanatory variables?
(b)
The following linear regression model is fitted to the data,

$Y_{i}=\beta_{1}+\beta_{2}x_{i,1}+\beta_{3}x_{i,2}+\epsilon_{i},\quad i=1,\ldots,n,$

where $Y_{i}$ is high jump height, $x_{i,1}$ is long jump distance, $x_{i,2}$ is discus throw and $\epsilon_{1},\ldots,\epsilon_{n}$ are residuals.
- (i)
  
  What is the interpretation of $\beta_{1}$ ?
- (ii)
  
  What is the interpretation of $\beta_{2}$ ?
- (iii)
  
  What is the interpretation of $\beta_{3}$ ?
(c)

The least squares estimate for $\beta$ is $\hat{\beta}=(39.4,0.0665,0.0108)^{\prime}$ . Use this to estimate the expected winning high jump height in 1972.
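
A fitted value is obtained by substituting the explanatory variables into the fitted model. As a minimal sketch for checking a hand calculation, the code below evaluates $\hat{\beta}_{1}+\hat{\beta}_{2}x_{1}+\hat{\beta}_{3}x_{2}$ using the 1972 row of the table; the function name `predict_high_jump` is hypothetical.

```python
# Least squares estimates given in part (c).
beta_hat = (39.4, 0.0665, 0.0108)


def predict_high_jump(long_jump, discus, beta=beta_hat):
    """Fitted high jump height (inches) for given long jump and discus distances (inches)."""
    return beta[0] + beta[1] * long_jump + beta[2] * discus


# 1972 row of the table: long jump 324.500, discus throw 2535.00.
pred_1972 = predict_high_jump(324.500, 2535.00)
print(round(pred_1972, 2))
```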

WS3.4

(a)
For the linear regression model,
- (i)
  
  Define the least squares function $S(\beta)$ .
- (ii)
  
  Write down the general form of the least squares estimates of the regression coefficients $\beta$ .
- (iii)
  
  What condition is satisfied by $S(\hat{\beta})$ ?
(b)

Table 0.11 shows the sample of 10 mean June temperatures in Durham, previously seen in Part 1. The question of interest is whether this sample provides evidence of an increase in the mean June temperature over time. We can assess this by fitting the linear regression model

$\mathbb{E}[Y_{i}]=\beta_{1}+\beta_{2}(t_{i}-1879),$

where $Y_{i}$ is temperature and $t_{i}$ is the year of observation. We centre times at 1879 as recording began in 1880 (although this year is not in our sample).

Year	1885	1911	1936	1950	1951	1961	1963	1990	1994	2004
Mean Temp. (${}^{\circ}$C)	12.70	12.95	12.85	15.20	12.10	13.50	13.55	12.70	13.35	14.40

Table 0.11: Mean June temperatures in Durham, sample of size 10
- (i)
  
  What is the design matrix for this model?
- (ii)
  
  Find the least squares estimates $\hat{\beta}$ for $\beta=(\beta_{1},\beta_{2})^{\prime}$ .
- (iii)
  
  Interpret your estimates.
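
To check a hand calculation for part $(b)$, the following sketch builds the design matrix with a column of ones and a column of centred years $t_{i}-1879$, then solves the normal equations $X^{\prime}X\hat{\beta}=X^{\prime}y$ directly for this two-column case. The function name `least_squares_2col` is hypothetical, and the explicit $2\times 2$ inverse is used only because the model has two coefficients.

```python
# Data from Table 0.11.
years = [1885, 1911, 1936, 1950, 1951, 1961, 1963, 1990, 1994, 2004]
temps = [12.70, 12.95, 12.85, 15.20, 12.10, 13.50, 13.55, 12.70, 13.35, 14.40]

# Design matrix: intercept column and centred year t_i - 1879.
X = [[1.0, float(t - 1879)] for t in years]


def least_squares_2col(X, y):
    """Solve the normal equations X'X beta = X'y for a two-column design matrix."""
    # Entries of the symmetric matrix X'X = [[a, b], [b, d]].
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    # Entries of the vector X'y = (u, v).
    u = sum(r[0] * yi for r, yi in zip(X, y))
    v = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b
    # Apply the explicit inverse of a 2x2 matrix.
    return (d * u - b * v) / det, (a * v - b * u) / det


beta1_hat, beta2_hat = least_squares_2col(X, temps)
print(round(beta1_hat, 3), round(beta2_hat, 5))
```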
