If there is not enough time to discuss all of the problems below in the workshop, please attempt the remaining problems on your own.
We return to the Childhood Respiratory Disease Study, first introduced in Question Sheet 2. Recall that the response variable is FEV (forced expiratory volume, litres) and the explanatory variables are age (years), height (inches), gender (male/female) and smoker (yes/no). Consider three models for log FEV,
where the indicator function for gender takes the value 1 if individual i is male, and 0 otherwise.
Which of models FEV1–FEV3 are nested? Which are not nested? Explain your answer.
Now focus on model FEV3. Using the data in Table 2 to fit this model,
What is the variance matrix of the least squares estimator? What is the variance of the intercept term?
Given that , find a 95% confidence interval for .
Use your confidence interval from part (iii) to test at the 5% level whether there is evidence for a significant gender effect. As usual, you should state your hypotheses and conclusions.
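As a reminder of the standard results these parts rely on, the following sketch states the variance matrix of the least squares estimator and the form of a 95% confidence interval for a single coefficient. The notation is assumed here, since the sheet's own symbols are not reproduced: $X$ is the $n \times p$ design matrix, $\hat{\beta}$ the least squares estimator, $\sigma^{2}$ the error variance with estimate $\hat{\sigma}^{2}$, and $t_{n-p,\,0.975}$ the 97.5% point of the $t$ distribution on $n-p$ degrees of freedom.

```latex
\operatorname{Var}(\hat{\beta}) = \sigma^{2}\,(X^{\mathsf{T}} X)^{-1},
\qquad
\hat{\beta}_j \;\pm\; t_{n-p,\,0.975}\,
\sqrt{\hat{\sigma}^{2}\,\bigl[(X^{\mathsf{T}} X)^{-1}\bigr]_{jj}} .
```

The variance of any single estimated coefficient is the corresponding diagonal entry of the variance matrix. A two-sided test of $H_0\colon \beta_j = 0$ at the 5% level rejects $H_0$ exactly when 0 lies outside the 95% interval.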
Consider the model
where the two covariates are indicator functions for a two-level factor, i.e. the first takes the value 1 if individual i has level 1, and 0 otherwise, whereas the second takes the value 1 if individual i has level 2, and 0 otherwise. Out of individuals, the first take level 1 of the factor and the remaining take level 2.
Write down the design matrix for this model.
Calculate $X^{\mathsf{T}}X$, where $X$ is the design matrix from the previous part, and explain why this matrix is singular (non-invertible).
Explain what this result tells us about how factors should be included in a linear model.
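The singularity can also be checked numerically. The sketch below assumes the model contains an intercept together with both indicator columns, and uses hypothetical group sizes (3 individuals at level 1, 2 at level 2) that are not taken from the sheet. The names `n1`, `n2`, `X_full` and `X_reduced` are illustrative only.

```python
import numpy as np

# Hypothetical group sizes, chosen only for illustration.
n1, n2 = 3, 2
n = n1 + n2

# Columns of the assumed design matrix: intercept, level-1 indicator, level-2 indicator.
intercept = np.ones(n)
level1 = np.concatenate([np.ones(n1), np.zeros(n2)])
level2 = 1.0 - level1  # the two indicators sum to the intercept column

X_full = np.column_stack([intercept, level1, level2])
XtX_full = X_full.T @ X_full
rank_full = np.linalg.matrix_rank(XtX_full)

# Dropping one indicator removes the linear dependence between columns.
X_reduced = np.column_stack([intercept, level2])
XtX_reduced = X_reduced.T @ X_reduced
rank_reduced = np.linalg.matrix_rank(XtX_reduced)

print(XtX_full)
print("rank with both indicators:", rank_full, "out of", XtX_full.shape[0])
print("rank with one indicator:", rank_reduced, "out of", XtX_reduced.shape[0])
```

With both indicators included, the rank is one less than the number of columns, so the matrix cannot be inverted. With a single indicator, the matrix has full rank.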