If time runs out before all of the problems below have been discussed in the workshop, please attempt the remaining problems on your own.
Consider an unpaired $t$-test comparing $H_0$ against $H_1$, in which the pooled variance $s_p^2$ and the sample size $n$ are both known and fixed. Suppose that the true difference between the means is $\delta$. We will investigate how the power of the test varies with both $\delta$ and the significance level $\alpha$.
Explain what is meant by the power of a hypothesis test. Which type of error does this relate to, and how?
Assuming a significance level of 5%, Figure 0.1 shows a plot of the power of the test as a function of the true difference $\delta$. What do you notice about the power as $\delta$ increases? Can you explain why this result makes sense?
For a fixed difference $\delta$, Figure 0.1 shows the power against the significance level $\alpha$ of the test. What happens to the power as the significance level increases? Can you explain why this is? Hint: you might want to use the relationship between power and error type that you described earlier to help you answer this part of the question.
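The behaviour of the power can also be explored numerically. The following is a minimal sketch, not the calculation behind the figure: it assumes a two-sided alternative, treats the common standard deviation as known so that a normal approximation applies, and uses hypothetical values (`sigma = 1.0`, `n = 20` per group) chosen purely for illustration. The function name `power_two_sided` is likewise an invention for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Assumes a known common standard deviation `sigma` and `n`
    observations per group (both hypothetical inputs).
    """
    se = sigma * sqrt(2.0 / n)                    # s.e. of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2)  # two-sided critical value
    shift = delta / se                            # standardised true difference
    # Probability that the test statistic lands in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical settings, chosen only for illustration
sigma, n = 1.0, 20

# Power against the true difference at the 5% significance level
deltas = (0.0, 0.25, 0.5, 0.75, 1.0)
power_by_delta = [power_two_sided(d, sigma, n) for d in deltas]

# Power against the significance level for a fixed difference of 0.5
alphas = (0.01, 0.05, 0.10, 0.20)
power_by_alpha = [power_two_sided(0.5, sigma, n, a) for a in alphas]

for d, p in zip(deltas, power_by_delta):
    print(f"delta = {d:.2f}: power = {p:.3f}")
for a, p in zip(alphas, power_by_alpha):
    print(f"alpha = {a:.2f}: power = {p:.3f}")
```

Under these assumptions the power at a true difference of zero equals the significance level, which is a useful sanity check when experimenting with other values of `sigma` and `n`.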
To assess the effect of different kinds of protein on weight gain, a group of rats were each randomly assigned to one of four groups. Each group was then assigned a different diet (low protein beef, high protein beef, low protein cereal and high protein cereal). Table 0.9 contains the weight gained by each rat in each group. The data are a subset of a data set found in A Handbook of Small Data Sets http://www.stat.ncsu.edu/research/sas/sicl/data/.
Beef Low | Beef High | Cereal Low | Cereal High |
---|---|---|---|
90 | 73 | 107 | 98 |
76 | 102 | 95 | 74 |
90 | 118 | 97 | 56 |
64 | 104 | 80 | 111 |
86 | 81 | 98 | 95 |
51 | 107 | 74 | 88 |
72 | 100 | 74 | 82 |
90 | 87 | 67 | 77 |
Let $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ denote the mean weight gains for the low protein beef, high protein beef, low protein cereal and high protein cereal groups respectively.
Write down appropriate null and alternative hypotheses to test whether or not the group means are the same.
Given that the overall mean is 86.375 and the group means are 77.375, 96.5, 86.5 and 85.125, carry out a one-way ANOVA to test your hypotheses from the previous part at the 5% significance level. What are your conclusions?
What assumptions are made by the one-way ANOVA?
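The sums of squares for the rat data can be checked with a few lines of code. This is a sketch of the standard one-way ANOVA decomposition applied to the table above; the variable names are illustrative, and the critical value is deliberately left to statistical tables.

```python
# Weight gains from the table, one list per diet group
groups = {
    "Beef Low":    [90, 76, 90, 64, 86, 51, 72, 90],
    "Beef High":   [73, 102, 118, 104, 81, 107, 100, 87],
    "Cereal Low":  [107, 95, 97, 80, 98, 74, 74, 67],
    "Cereal High": [98, 74, 56, 111, 95, 88, 82, 77],
}

k = len(groups)                                  # number of groups
N = sum(len(v) for v in groups.values())         # total number of rats
grand_mean = sum(sum(v) for v in groups.values()) / N
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Between-group sum of squares: group size times squared deviation
# of each group mean from the overall mean
ss_between = sum(
    len(v) * (group_means[name] - grand_mean) ** 2
    for name, v in groups.items()
)

# Within-group sum of squares: squared deviations from each group's own mean
ss_within = sum(
    (x - group_means[name]) ** 2
    for name, v in groups.items()
    for x in v
)

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between = {ss_between:.3f} on {df_between} df")
print(f"SS within  = {ss_within:.3f} on {df_within} df")
print(f"F = {f_stat:.3f}")
```

The resulting statistic is compared with the upper 5% point of the $F$ distribution on `df_between` and `df_within` degrees of freedom.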
Table 0.12 contains the winning Olympic high jump heights, long jump distances and discus throws for a sample of 10 Olympic years (the Olympics are held every four years). All measurements are in inches. We are interested in whether or not we can use winning long jump distances and discus throws to predict the winning high jump height.
Observation | Year | High Jump | Long Jump | Discus Throw |
---|---|---|---|---|
1 | 1896 | 71.25 | 249.750 | 1147.50 |
2 | 1900 | 74.80 | 282.875 | 1418.90 |
3 | 1904 | 71.00 | 289.000 | 1546.50 |
4 | 1952 | 80.32 | 298.000 | 2166.85 |
5 | 1960 | 85.00 | 319.750 | 2330.00 |
6 | 1964 | 85.75 | 317.750 | 2401.50 |
7 | 1972 | 87.75 | 324.500 | 2535.00 |
8 | 1976 | 88.50 | 328.500 | 2657.40 |
9 | 1980 | 92.75 | 336.250 | 2624.00 |
10 | 1984 | 92.50 | 336.250 | 2622.00 |
What is the response variable in this problem? What are the explanatory variables?
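The following linear regression model is fitted to the data,

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i,$$

where $y_i$ is high jump height, $x_{i1}$ is long jump distance, $x_{i2}$ is discus throw and $\epsilon_i$ are residuals.
What is the interpretation of $\beta_0$?
What is the interpretation of $\beta_1$?
What is the interpretation of $\beta_2$?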
The least squares estimate for $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2)^T$ is $\hat{\boldsymbol{\beta}}$. Use this to estimate the expected winning high jump height in 1972.
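The least squares fit for the Olympic data can be reproduced directly from the table. The sketch below assumes the model has an intercept plus the two explanatory variables, and solves the normal equations $X^T X \boldsymbol{\beta} = X^T \mathbf{y}$ with a small hand-written elimination routine (`solve_linear_system` is a helper invented for this example, not a library function). It is an illustration under those assumptions rather than the workshop's own working.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Data from the table (all measurements in inches)
high_jump = [71.25, 74.80, 71.00, 80.32, 85.00, 85.75, 87.75, 88.50, 92.75, 92.50]
long_jump = [249.750, 282.875, 289.000, 298.000, 319.750,
             317.750, 324.500, 328.500, 336.250, 336.250]
discus = [1147.50, 1418.90, 1546.50, 2166.85, 2330.00,
          2401.50, 2535.00, 2657.40, 2624.00, 2622.00]

# Design matrix: a column of ones, then long jump, then discus
X = [[1.0, lj, d] for lj, d in zip(long_jump, discus)]
p = 3

# Normal equations: (X^T X) beta = X^T y
XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
Xty = [sum(row[a] * y for row, y in zip(X, high_jump)) for a in range(p)]
beta_hat = solve_linear_system(XtX, Xty)

# Residuals should be orthogonal to every column of X
residuals = [y - sum(b * x for b, x in zip(beta_hat, row))
             for row, y in zip(X, high_jump)]
orthogonality = [sum(row[a] * r for row, r in zip(X, residuals)) for a in range(p)]

# Fitted value at the 1972 long jump and discus results (observation 7)
pred_1972 = sum(b * x for b, x in zip(beta_hat, [1.0, 324.500, 2535.00]))

print("beta_hat:", beta_hat)
print("predicted 1972 high jump:", pred_1972)
```

Solving the normal equations by hand keeps the example dependency-free; for larger problems a QR-based routine from a numerical library would usually be the better-conditioned choice.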
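For the linear regression model,

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

Define the least squares function $Q(\boldsymbol{\beta})$.
Write down the general form for the least squares estimates of the regression coefficients, $\hat{\boldsymbol{\beta}}$.
What condition is satisfied by $\hat{\boldsymbol{\beta}}$?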
Table 0.11 shows the sample of 10 mean June temperatures in Durham, previously seen in Part 1. The question of interest is whether this sample provides evidence of an increase in the mean June temperature over time. We can assess this by fitting the linear regression model

$$y_i = \beta_0 + \beta_1 (t_i - 1879) + \epsilon_i,$$

where $y_i$ is temperature and $t_i$ is the year of observation. We centre times at 1879 as recording began in 1880 (although this year is not in our sample).
Year | 1885 | 1911 | 1936 | 1950 | 1951 | 1961 | 1963 | 1990 | 1994 | 2004 |
---|---|---|---|---|---|---|---|---|---|---|
Mean Temp. (°C) | 12.70 | 12.95 | 12.85 | 15.20 | 12.10 | 13.50 | 13.55 | 12.70 | 13.35 | 14.40 |
What is the design matrix for this model?
Find the least squares estimates for $\beta_0$ and $\beta_1$.
Interpret your estimates.
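For a model with a single explanatory variable, the least squares estimates have a simple closed form. The sketch below applies it to the Durham temperatures, assuming the explanatory variable is the year minus 1879, as described in the text. The names `s_xx` and `s_xy` are illustrative labels for the usual corrected sums of squares and products.

```python
years = [1885, 1911, 1936, 1950, 1951, 1961, 1963, 1990, 1994, 2004]
temps = [12.70, 12.95, 12.85, 15.20, 12.10, 13.50, 13.55, 12.70, 13.35, 14.40]

# Centre the years at 1879 (assumed form of the explanatory variable)
x = [yr - 1879 for yr in years]
n = len(x)

# Design matrix: a column of ones and a column of centred years
design_matrix = [[1, xi] for xi in x]

x_bar = sum(x) / n
y_bar = sum(temps) / n

# Corrected sums of squares and cross-products
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, temps))

slope = s_xy / s_xx                # estimated change in temperature per year
intercept = y_bar - slope * x_bar  # estimated mean temperature at year 1879

print(f"intercept = {intercept:.4f}")
print(f"slope     = {slope:.5f}")
```

Because the years are centred at 1879, the intercept refers to the fitted mean June temperature in that year rather than in year zero, which is what makes it interpretable.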