If time runs out before all of the problems below have been discussed in the workshop, please attempt the remaining problems on your own.
The linear regression model is defined as
Define the predicted value for individual $i$.
Define the estimated residual for individual $i$.
Give the formula for the estimated residual variance $\hat{\sigma}^2$. You should define any notation that you use.
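As a numerical companion to the definitions asked for above, here is a minimal sketch of how predicted values, estimated residuals and the estimated residual variance can be computed. The data are hypothetical toy values, not taken from any question sheet, and the sketch assumes the usual unbiased divisor $n - p$, where $p$ counts the regression coefficients including the intercept; check this against the notation used in your lecture notes.

```python
import numpy as np

# Hypothetical toy data (NOT from the question sheets): n = 6 individuals, one covariate.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.2])

n = len(y)
X = np.column_stack([np.ones(n), x])  # design matrix: intercept column plus covariate
p = X.shape[1]                        # number of regression coefficients

# Least squares estimate of the coefficient vector.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta_hat      # predicted value for each individual
residuals = y - fitted     # estimated residual for each individual

# Estimated residual variance, assuming the divisor n - p.
sigma2_hat = residuals @ residuals / (n - p)

print("beta_hat:", beta_hat)
print("sigma2_hat:", sigma2_hat)
```

Because the model contains an intercept, the estimated residuals sum to zero, which is a quick check on hand calculations.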
Consider the following model for the Olympics data in Question Sheet 3, which uses long jump distances () and discus throws () as covariates for high jump height (),
where and are the sample means of the long jumps and discus throws respectively.
What is the coefficient known as?
Write down the design matrix for this model.
For this model the estimated regression coefficients are .
Obtain the vector of estimated residuals, and hence give an estimate of the residual variance $\sigma^2$.
What is the predicted value for the high jump in 1948, when the winning long jump was 308 inches and the winning discus throw was 2078 inches?
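The steps in this question can be rehearsed numerically. The sketch below assumes the model has an intercept plus the two covariates centred at their sample means, which is what the reference to sample means suggests. All data values are hypothetical and are not the Olympics data from Question Sheet 3; only the 1948 inputs (308 and 2078 inches) come from the question.

```python
import numpy as np

# Hypothetical values in inches, NOT the Olympics data from Question Sheet 3.
long_jump = np.array([290.0, 295.0, 300.0, 305.0, 310.0])
discus = np.array([1700.0, 1850.0, 1800.0, 1950.0, 2000.0])
high_jump = np.array([74.0, 75.5, 76.0, 77.5, 78.0])

n = len(high_jump)
lj_bar, d_bar = long_jump.mean(), discus.mean()

# Assumed design matrix: a column of ones, then each covariate minus its sample mean.
X = np.column_stack([np.ones(n), long_jump - lj_bar, discus - d_bar])

beta_hat, *_ = np.linalg.lstsq(X, high_jump, rcond=None)
residuals = high_jump - X @ beta_hat

# A new observation must be centred with the SAME sample means.
x_new = np.array([1.0, 308.0 - lj_bar, 2078.0 - d_bar])
prediction = x_new @ beta_hat

print("beta_hat:", beta_hat)
print("prediction:", prediction)
```

With centred covariates the first estimated coefficient equals the sample mean of the response, so the prediction is that mean plus adjustments for how far the new covariate values lie from their averages.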
This question focuses on the sampling distribution of the least squares estimator $\hat{\beta}$.
Derive the expected value of the least squares estimator $\hat{\beta}$. Is this estimator unbiased?
What is the sampling distribution of $\hat{\beta}$? You should state the name of this distribution and any parameters it depends on.
Now consider the following model for the log FEV data introduced in Question Sheet 2,
where is FEV and is age.
Using the data in Table 2 of Question Sheet 2,
Find the least squares estimates $\hat{\beta}$.
Given that , find the variance matrix for .
Using your answers to parts (i) and (ii) or otherwise, test at the 5% level whether or not there is evidence of a linear relationship between age and log FEV. You should state clearly your hypotheses and conclusions.
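The three parts above follow a standard pattern: estimate the coefficients, form the variance matrix, then test the slope. The sketch below illustrates that pattern on hypothetical ages and log FEV values, not the data in Table 2 of Question Sheet 2. It assumes a simple linear model with an intercept and a slope for age, estimates the residual variance from the data rather than taking it as given, and uses a two-sided t-test whose critical value is hard-coded from standard t tables.

```python
import numpy as np

# Hypothetical data, NOT Table 2 of Question Sheet 2.
age = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0])
log_fev = np.array([0.10, 0.35, 0.60, 0.80, 1.00, 1.20, 1.30, 1.45])

n = len(age)
X = np.column_stack([np.ones(n), age])  # assumed model: intercept plus slope for age
p = X.shape[1]

# (i) Least squares estimates via the normal equations.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ log_fev

# (ii) Estimated variance matrix: sigma2_hat * (X^T X)^{-1}.
residuals = log_fev - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - p)
var_beta = sigma2_hat * XtX_inv

# (iii) Test H0: slope = 0 against H1: slope != 0 at the 5% level.
t_stat = beta_hat[1] / np.sqrt(var_beta[1, 1])
t_crit = 2.447  # upper 2.5% point of the t distribution with n - p = 6 df, from tables
reject_h0 = abs(t_stat) > t_crit

print("beta_hat:", beta_hat)
print("t statistic:", t_stat, "reject H0:", reject_h0)
```

If the residual variance is supplied in the question rather than estimated, replace `sigma2_hat` with the given value and use the reference distribution that matches.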