Submission is due on Tuesday in Week 4.
For the Olympics data in Table 0.12, a linear regression model in which long jump distance is used to predict high jump height is proposed,
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \dots, 10,$$
where $y_i$ is high jump height and $x_i$ is long jump distance.
Observation | Year | High Jump | Long Jump | Discus Throw |
---|---|---|---|---|
1 | 1896 | 71.25 | 249.750 | 1147.50 |
2 | 1900 | 74.80 | 282.875 | 1418.90 |
3 | 1904 | 71.00 | 289.000 | 1546.50 |
4 | 1952 | 80.32 | 298.000 | 2166.85 |
5 | 1960 | 85.00 | 319.750 | 2330.00 |
6 | 1964 | 85.75 | 317.750 | 2401.50 |
7 | 1972 | 87.75 | 324.500 | 2535.00 |
8 | 1976 | 88.50 | 328.500 | 2657.40 |
9 | 1980 | 92.75 | 336.250 | 2624.00 |
10 | 1984 | 92.50 | 336.250 | 2622.00 |
What assumptions are made about the residuals $\epsilon_i$?
[marks: 2]
Using the data in Table 0.12, write out the design matrix for this model.
[marks: 1]
Obtain the least squares estimates for the regression coefficients $\beta_0$ and $\beta_1$.
[marks: 2]
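As a rough numerical check on the design matrix and least squares parts, the Python sketch below builds the design matrix from the Table 0.12 columns and solves the 2×2 normal equations $(X^T X)\hat{\beta} = X^T y$ directly. It assumes a simple linear regression with an intercept. The names `high_jump`, `long_jump` and `solve_normal_equations` are illustrative and do not come from the problem sheet.

```python
# Olympics data from Table 0.12 (High Jump and Long Jump columns).
high_jump = [71.25, 74.80, 71.00, 80.32, 85.00, 85.75, 87.75, 88.50, 92.75, 92.50]
long_jump = [249.750, 282.875, 289.000, 298.000, 319.750,
             317.750, 324.500, 328.500, 336.250, 336.250]

# Design matrix: a column of ones for the intercept, then the predictor.
X = [[1.0, x] for x in long_jump]


def solve_normal_equations(X, y):
    """Solve (X^T X) beta = X^T y for a two-column design matrix."""
    a = sum(row[0] * row[0] for row in X)   # entry (1, 1) of X^T X, equal to n
    b = sum(row[0] * row[1] for row in X)   # off-diagonal entry, sum of x
    d = sum(row[1] * row[1] for row in X)   # entry (2, 2), sum of x^2
    p = sum(row[0] * yi for row, yi in zip(X, y))  # first entry of X^T y
    q = sum(row[1] * yi for row, yi in zip(X, y))  # second entry of X^T y
    det = a * d - b * b
    # Explicit inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    beta0 = (d * p - b * q) / det
    beta1 = (a * q - b * p) / det
    return beta0, beta1


beta0_hat, beta1_hat = solve_normal_equations(X, high_jump)

# Fitted residuals; with an intercept in the model they sum to zero.
residuals = [y - (beta0_hat + beta1_hat * x) for x, y in zip(long_jump, high_jump)]

print("intercept estimate:", round(beta0_hat, 4))
print("slope estimate:", round(beta1_hat, 4))
```

Solving the normal equations by hand keeps the link between the design matrix and the estimates visible. For larger models a linear algebra library would be the more usual choice.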
A survey of five ant colonies was conducted to assess whether the mean ant length varied between colonies. The results are shown in Table 0.13, and can also be found in the dataframe antLengths. Lengths are in . Let $\mu_i$ denote the population mean length for colony $i$, $i = 1, \dots, 5$.
Ant length | Group |
---|---|
8.1 | 1 |
7.7 | 1 |
8.1 | 1 |
7.8 | 1 |
7.6 | 1 |
8.2 | 1 |
8.0 | 1 |
7.6 | 1 |
9.9 | 2 |
10.0 | 2 |
9.9 | 2 |
9.7 | 2 |
10.0 | 2 |
9.5 | 2 |
9.1 | 2 |
11.4 | 2 |
9.9 | 3 |
10.0 | 3 |
10.3 | 3 |
9.6 | 3 |
10.6 | 3 |
10.6 | 3 |
14.4 | 3 |
13.6 | 3 |
14.3 | 4 |
13.5 | 4 |
13.0 | 4 |
13.7 | 4 |
13.9 | 4 |
14.8 | 4 |
16.7 | 4 |
16.5 | 4 |
15.3 | 5 |
16.3 | 5 |
15.3 | 5 |
15.9 | 5 |
16.0 | 5 |
19.2 | 5 |
18.1 | 5 |
17.1 | 5 |
Give appropriate null and alternative hypotheses to test whether the mean ant length varies between colonies.
[marks: 1]
Calculate the sample group means for all five groups.
[marks: 1]
Given the overall mean length of the samples, $\bar{y}$, and the total sum of squares, $SS_T$, carry out a one-way ANOVA to test your hypotheses in part (a). You should test at the 5% significance level, and state your conclusions clearly.
[marks: 3]
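The group means and one-way ANOVA calculation can be sketched in Python as below, using only the values in Table 0.13. This is a minimal illustration, not a model answer. The dictionary `ant_lengths` and the function `one_way_anova` are hypothetical names, the data are typed in by hand in place of the antLengths dataframe, and the 5% critical value of roughly 2.64 for an $F_{4,35}$ distribution is an assumption read from standard F tables.

```python
# Ant lengths from Table 0.13, keyed by colony (group) number.
ant_lengths = {
    1: [8.1, 7.7, 8.1, 7.8, 7.6, 8.2, 8.0, 7.6],
    2: [9.9, 10.0, 9.9, 9.7, 10.0, 9.5, 9.1, 11.4],
    3: [9.9, 10.0, 10.3, 9.6, 10.6, 10.6, 14.4, 13.6],
    4: [14.3, 13.5, 13.0, 13.7, 13.9, 14.8, 16.7, 16.5],
    5: [15.3, 16.3, 15.3, 15.9, 16.0, 19.2, 18.1, 17.1],
}


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a dict of samples."""
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)   # total number of observations
    k = len(groups)       # number of groups
    grand_mean = sum(all_values) / n
    group_means = {g: sum(vals) / len(vals) for g, vals in groups.items()}

    # Total variation about the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation of the group means about the grand mean.
    ss_between = sum(len(vals) * (group_means[g] - grand_mean) ** 2
                     for g, vals in groups.items())
    # Remaining variation within groups.
    ss_within = ss_total - ss_between

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "group_means": group_means,
        "grand_mean": grand_mean,
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "f_stat": f_stat,
    }


result = one_way_anova(ant_lengths)

# Assumed 5% critical value for F with (4, 35) degrees of freedom, from tables.
F_CRITICAL_5PCT = 2.64
reject_null = result["f_stat"] > F_CRITICAL_5PCT

for group, mean in result["group_means"].items():
    print("group", group, "mean:", round(mean, 4))
print("F statistic:", round(result["f_stat"], 2), "on", result["df"], "df")
print("reject null at 5% level:", reject_null)
```

The within-group sum of squares is obtained by subtraction from the total, which mirrors how the question supplies the total sum of squares directly.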
Challenge
Consider the following extension to the simple linear regression model,
where are independent but not identically distributed Normal random variables, where .
Standardise $\epsilon_i$ to create a random variable which has a standard Normal distribution.
[marks: 1]
By summing the squares of , create a sum of squares function for .
[marks: 2]
Using the first-order partial derivatives of the sum of squares function derived in part (b), show that the least squares estimator of is
[marks: 2]